When you need to run sudo from monit, check that the requiretty flag is off in your sudoers file. Monit usually runs as a daemon, so the commands it executes are not attached to a real tty.
requiretty: If set, sudo will only run when the user is logged in to a real tty. When this flag is set, sudo can only be run from a login session and not via other means such as cron(8) or cgi-bin scripts. This flag is off by default.
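Here is a minimal Python sketch of that check. requiretty_enabled() is a hypothetical helper, not part of sudo or monit. It only understands a simplified sudoers file: generic Defaults lines and Defaults:user lines, with no #include directives followed, no line continuations, and no %group entries or aliases. It assumes that sudo applies per-user Defaults after generic ones, and that the flag is off when nothing sets it. For anything authoritative, edit the file with visudo and try sudo as the user monit runs the command as.

```python
import re

# Matches "Defaults <options>" and "Defaults:<user,list> <options>".
# Host (@), runas (>) and command (!) scoped Defaults are deliberately not matched.
DEFAULTS_RE = re.compile(r"^Defaults(?::(\S+))?\s+(.+)$")


def requiretty_enabled(sudoers_text: str, user: str = "root") -> bool:
    """Return True if requiretty would apply to `user` (hypothetical helper).

    Assumes a simplified sudoers file: no include following, no line
    continuations, no %group or alias resolution, no "#uid" user specs.
    """
    global_value = None  # last generic "Defaults" setting seen
    user_value = None    # last "Defaults:<user>" setting seen for this user
    for raw in sudoers_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (and #include lines)
        match = DEFAULTS_RE.match(line)
        if match is None:
            continue
        scope, options = match.group(1), match.group(2)
        if scope is not None and user not in scope.split(","):
            continue  # a per-user entry for somebody else
        for option in re.split(r"[,\s]+", options):
            if option in ("requiretty", "!requiretty"):
                value = option == "requiretty"
                if scope is None:
                    global_value = value
                else:
                    user_value = value
    # Per-user Defaults are applied after generic ones, so they take precedence.
    if user_value is not None:
        return user_value
    if global_value is not None:
        return global_value
    return False  # sudo's documented default: requiretty is off


if __name__ == "__main__":
    sample = "Defaults    requiretty\nDefaults:monit !requiretty\n"
    for name in ("monit", "root"):
        print(name, requiretty_enabled(sample, name))
```

If the check reports True for the relevant user, a line such as Defaults:monituser !requiretty (monituser being a placeholder for that account) turns the flag off for that user only, leaving it on for everyone else.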
“Complex systems tend to oppose their own proper function. As systems grow in complexity, they tend to oppose their stated function.” – John Gall
In case anyone still wonders why large “useless” organizations (public or not) do not work: I remember a public organization whose revenue went practically entirely to its staff payroll and not to the work it was supposed to carry out. Because regardless of the stated purpose, payroll is the core function.
As for why they do not shut down easily, and instead become sluggish and too big to fail:
“Systems tend to grow, and as they grow, they encroach†.”
The man has said it all.
Update:
Let me expand a little on my platitude. The truth is that this post was prompted by the ERT problem (ERT being the Greek public broadcaster) and by how a government bureaucracy is dealing with it. It does not matter who is in government and who is not, nor how big or small ERT was. We usually call platitudes things that are obvious, but they do not stop being observations that hold. In our case we have a complex system, ERT, which is large and does not work well. Not only that, but in the end it is so large that, through repeated strikes on key dates, it manages not to serve the purpose for which it exists. I am not saying they should not have gone on strike; I wanted ERT to continue to exist. I am saying that this is how things look to an outside observer who has no connection to anyone involved in the problem.
As for the second law I quote from Systemantics, it is a classic description of legacy systems. They go into production, they deliver, they grow in scale and they get entrenched. If you do not like the observation about ERT, you can make it about Gmail, Twitter or Facebook. But what am I saying? Those have a nice web interface; they cannot be as bad as ERT.
I do not find these particular phrases to be platitudes, mainly because when I use them I watch the reaction in the eyes of whoever hears them. To smart people who understand systems and scaling they are obvious. But they are not obvious to everyone.
† Advance gradually and in a way that causes damage.
Nonprofit institutions need a healthy atmosphere for dissent if they wish to foster innovation and commitment. Nonprofits must encourage honest and constructive disagreement precisely because everybody is committed to a good cause: Your opinion versus mine can easily be taken as your good faith versus mine. Without proper encouragement, people have a tendency to avoid such difficult, but vital, discussions or turn them into underground feuds.
And thus we observe one of the major deficiencies of most governments, where no one dares oppose the Leader. Leaders who need followers more than opponents do not foster dissent. The result is an inner circle that does not dare to disagree, giving the Leader a distorted view of reality. Because, as a basic law of Systemantics says:
Information rarely leaks up.
Always have someone question your decisions. Engineering or otherwise.
Back in the days before remotely controlled power, we had a server set face-to-face with another computer, so that one machine’s CD tray hit the other’s reset button when it opened. Unfortunately, this one has the power button on top, and you have to hold it down for the box to shut down.
This is something System Administrators acquire along the way (as the homeostasis providers that they are). This is something developers always ignore, for they do not operate the systems they build, either at scale or for long enough, to understand how what they built actually works. This is something every DevOps practitioner and their managers should be prepared for:
“Complex systems possess potential for catastrophic failure. Human practitioners are nearly always in close physical and temporal proximity to these potential failures – disaster can occur at any time and in nearly any place. The potential for catastrophic outcome is a hallmark of complex systems. It is impossible to eliminate the potential for such catastrophic failure; the potential for such failure is always present by the system’s own nature.” — How Complex Systems Fail
If people expect that the software intensive systems that they use are like bridges, they should be prepared for Tacoma Narrows.
“Human interaction is a game, a dance, a playful thing that is deeply satisfying in itself” – John Gall
I got to read John Gall’s “Dancing with Elves” after reading his well-known “Systems Bible” (which I’ll blog about another time). The book deals with strategies one can use to influence kids in a positive way, so that they achieve what the parent wants them to achieve. By that we do not mean pre-planning the child’s life and then watching the plan get executed. That is not the plan. The plan is to overcome frustration (and disobedience) and find strategies that help the child arm itself before being released into the world as a responsible adult who does not require parental supervision.
I have to admit that the fifteen strategies presented in the book are interesting. They all strive to keep the parent from saying “no” or using any other negative, derogatory or yelling argument to get a point across. As the author says, “don’t oppose forces- utilize them”. The strategies may seem conflicting, but Gall, as an accomplished paediatrician, understands that no single strategy fits all children, or even one child all the time. So one of the first things parents need to realise is that they have to use the strategy that works for the given time and situation, and be prepared for it to stop working some time afterwards. I think the message of the book is: every time you want to yell to make a point, can you do it without yelling? Here’s how.
a million ways *
A book about (systems) management
I do not know how well I am going to use the book’s advice as a parent, but this book is more than a parenting book. It is a management one, at least within the IT business, where childish, erratic or other BOFH-style behavior is common. This occurred to me when reading that
“although every picture tells a story, the story it tells may not be the same for everyone. The meaning of communication is what the other person makes of it, and that’s not necessarily the same as what you intended. It’s up to you to notice that. That’s your feedback.”
Compare the above to the “everything is a DNS problem” mantra. But then again, there is other management insight that most people overlook:
“But what does it mean when you say a person is “just lazy” or “just stubborn”? It really means that you have tried out some of your repertoire of behavioral interventions in order to elicit desired piece of behavior from the other person and you have failed, because your repertoire was too limited.”
Yes, dear manager of weird IT people, sometimes you have to admit that your repertoire is limited. You too have to change your approach to get the job done.
I loved the book. How could I not love a management book presenting itself as a parenting one which in the last pages includes the definition of the law of requisite variety?
[*] – image and phrase came from my twitter timeline, not from the book
However fun (and close to my heart) it is, the analogy is flawed because it describes a sequence. We do not wait to complete all the steps of the ladder before getting to the next level. Nor do we restart only when a barrel hits us.
Fast transients is what we do. Fast transients is a term conceived by John Boyd, who first used it for air combat: “the ability to change altitude, airspeed and direction in any combination”. This is, after all, the essence of the Release Early, Release Often mantra: push your system out into the wild to get a grasp of where its audience wants to take it. Or plan for organized abandonment. According to Boyd, what matters most is the tempo of change: “fast transients suggests that, in order to win or gain superiority, we should operate at a faster tempo than our adversaries, or inside our adversaries’ time scales” [1,2]
So it is no wonder that I believe that, although not as funny, the OODA loop describes how we work:
OODA Loop from CTOVision’s “I’ve got the OODA Blues”
Because as Boyd wrote:
“Orientation isn’t just a state you are in; it’s a process. You’re always orienting […] A nice tight little world where there’s no change – dinosaurs; they’re going to die. The name of the game is not to become a dinosaur […] If you are in an equilibrium position, you’re dead”
Now think of that in terms of what you do just to keep current with the tools of the trade and what you do in order to monitor, manage and evolve your infrastructure.
[1] – A vision so noble, Daniel Ford
[2] – Which reminds me of the Nyquist sampling theorem
“Physics: there was the key. Record your observations. Apply physical principles. Speculate, but only trust proven conclusions. If I were to make any progress, I’d have to treat the task as a freshman physics problem. Time to update my notebook.” – Cliff Stoll, The Cuckoo’s Egg.
Recording observations. Updating notebooks. Something we computer people frequently forget regardless of the big data hype and logging infrastructures that we build.
We all know the system administration rule of thumb about when a task will be done:
Estimate the time it will take you and double it. Then double it again and add some more.
This is not a baseless rule. You know how long it will take you to finish the task. What you cannot predict, in the interrupt-driven line of work that we do, is how much noise you will have to deal with while working on the task at hand, or what unexpected circumstances will be unearthed because of misinformation, poor documentation or simply bad luck. So you need breathing space in order to complete the task. Because when you give an estimate, users take it as written in stone. When you are off your estimate, they view it as a broken promise, regardless of what caused the extra delay. They just do not care. All they care about is that you “promised” it would be ready at some time and it is not. And because they do not care, you always need more time than you think.
Sometimes this also has the added advantage that when you are lucky enough to work uninterrupted and finish within the promised time frame, polite users will thank you for your efforts to finish as fast as you could. They see this as proof that you care about their pain and do your best. Oh yes, the rest, again, do not care.
Why was I reminded of this? Well, because our DBA hung the following formula on his wall:
Where Tc is time to complete, b is the best time, w is the worst time and m is the most likely time. You can read more about the formula and its history here [pdf].
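The formula itself is not reproduced here. Given its variables (best, worst and most likely time), it is most probably the classic PERT three-point estimate, Tc = (b + 4m + w) / 6. The Python sketch below assumes exactly that; the function names are illustrative, and the spread (w - b) / 6 is the conventional PERT companion figure, not something taken from the DBA's wall.

```python
def pert_estimate(best: float, likely: float, worst: float) -> float:
    """Three-point (PERT) estimate of the time to complete, Tc.

    best = b, likely = m, worst = w in the notation above.
    """
    if not best <= likely <= worst:
        raise ValueError("expected best <= likely <= worst")
    # Weighted mean: the most likely time counts four times, each extreme once.
    return (best + 4 * likely + worst) / 6


def pert_std_dev(best: float, worst: float) -> float:
    """Conventional PERT spread: one sixth of the best-to-worst range."""
    return (worst - best) / 6


if __name__ == "__main__":
    # Hypothetical task: 2 hours at best, 4 hours most likely, 12 hours at worst.
    print(pert_estimate(2, 4, 12))
    print(round(pert_std_dev(2, 12), 2))
```

For that hypothetical task the estimate is (2 + 16 + 12) / 6 = 5 hours, with a spread of about 1.67 hours. The double-it-twice rule of thumb above would have turned the same 4-hour guess into 16 hours and then some; the weighting lets a long worst case pull the estimate up without letting it dominate.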
After reading this (in Greek), a friend who works as a junior DBA at a bank sent me this exchange between him and a very expensive consultant:
Friend: I think that rebuilding the indices once a week just because this works is not the proper method to deal with the problem. Shouldn’t we be looking at other stuff like table fragmentation for example?
Consultant: Look, rebuilding and reorganizing the indices once a week is good practice because you know, it works!
Two years pass and:
Consultant (the same): In order to be sure when to rebuild the index, you should look at the table fragmentation level.
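The consultant's second piece of advice amounts to a threshold rule: measure fragmentation first, then decide. Below is a minimal Python sketch of that decision. It assumes SQL Server, which the rebuild/reorganize wording suggests although the story never names the database; there the figure would come from the avg_fragmentation_in_percent column of sys.dm_db_index_physical_stats. The 5% and 30% defaults are the long-quoted rule of thumb from Microsoft's documentation, used here only as illustrative parameters, and index_action() and the index names are hypothetical.

```python
def index_action(fragmentation_pct: float,
                 reorganize_above: float = 5.0,
                 rebuild_above: float = 30.0) -> str:
    """Choose an index maintenance action from measured fragmentation.

    Hypothetical helper; the thresholds are illustrative defaults, not a standard.
    """
    if not 0.0 <= fragmentation_pct <= 100.0:
        raise ValueError("fragmentation must be a percentage between 0 and 100")
    if fragmentation_pct > rebuild_above:
        return "rebuild"      # heavily fragmented: rebuild the index
    if fragmentation_pct > reorganize_above:
        return "reorganize"   # moderately fragmented: a lighter reorganize
    return "leave alone"      # not worth touching this week


if __name__ == "__main__":
    # Made-up measurements for three hypothetical indexes.
    measured = {"ix_orders_date": 3.2, "ix_orders_customer": 18.0, "ix_audit_ts": 42.5}
    for name, pct in measured.items():
        print(f"{name}: {pct}% -> {index_action(pct)}")
```

Run weekly, this turns the blanket rebuild into a per-index decision driven by a measurement, which is what the friend had suggested two years earlier.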
By the way, if someone is interested in a junior DBA who can put in the hours needed to solve problems and is not afraid of studying in depth in order to do so, drop me a note so that I can put you in touch.