systemd and I are not very compatible yet. But it is here to stay, and one can hold back their machines for only so long. No matter how much you delay it, you will eventually migrate to operating system versions that support it. So here is the path that I figured out for smooth coexistence until systemd grows on you:
Do most, if not all, of what you would try to do with systemd through something like Puppet or Ansible, by applying playbooks locally. Configuration management and orchestration software already knows how to handle systemd and does it well. Take advantage of that.
[ This text was written shortly after 2006/11/26, following the first elections at TEE (the Technical Chamber of Greece) held with electronic assistance. I had never pressed publish until today, but today’s ND fiasco, the distance in time and the fact that I am not at TEE give me the freedom to do so. For anyone interested, I did not have a central role in the project. The date of the elections coincided with the arrival of the first stork. ]
We made history, since the count was done electronically, the first time this has happened in the country and at such a scale. It may not look that successful in the eyes of many (in no way does it meet my own expectations), but it is still a difficult and heroic system administration story for those of us who were inside it. The problems were very many. This text does not deal with any of the problems unrelated to the electronic process (e.g. political analysis of the TEE elections, objections etc.), only with the problems we found during the electronic process, as I lived it from my office:
The task was titanic for the size of the support mechanism. There are 159 polling stations. That means directly managing 2×159 people, plus the technicians that the equipment supplier had on stand-by. All these people had varying levels of technical skill and willingness. ~318 degrees of freedom.
There were 180 PCs available. All identical and all built from the same mold. One would expect them to behave the same way too. Not a chance! The scanner drivers at some polling centers were inexplicably lost during the process and a restart was needed.
For some PCs to work, the PSTN modem had to either a) be moved to a different PCI slot or b) be removed entirely. Otherwise, BSOD.
Although all of them were installed from the same image, some started up with arbitrary firewall settings.
For OpenVPN to work, it installs a virtual Ethernet driver. Again inexplicably, on some PCs this was disabled (while in the previous day’s tests it had worked fine!). It had to be enabled by hand.
And the final blow: while there was no apparent problem with OpenVPN, we had no connection. Not even after a reboot. After a shutdown, though, we did.
No matter how compact and short the documentation is, not everyone will read it.
Even if there is a designated contact phone number, all of Greece will discover mine.
Worst of all: besides questions about the electronic process, we were also getting questions about the electoral process itself. So imagine someone who asks three things at once and you have to tell them “You know, these are matters for the Central Election Committee, you need to talk to them”. Joy, eh? Especially if solving them takes three (3) phone calls and not one.
I am not counting pulled cables or other handling mistakes in the above, because they are to be expected. Even these, though, were enough to overflow our telephone exchange (it is time for Asterisk by now) during the first hours, before the scanning had even started, where:
Our DBA gave a recital of restores (some over PSTN, so not that comfortable). We saw database errors we had never encountered before. The next step is to do forensic restores.
We saw unbelievable hangs during scanning. We did not see them in software development, they did not show up in the training sessions, they did not even show up in the test event, and it was the same people (and yes, we know how to set up scenarios) who ran all of them.
Deciphering the same problem from 30 different descriptions (e.g. a scanner problem would reach me as a network connection problem) given to 5 different people.
I will take all the rest of my regular leave and devote myself to my little champ (== adamo version 2.0).
A BlinkStick is a USB-powered LED that can be driven via a simple API from a programming language like Python. Since the status of an Elasticsearch cluster is expressed with three colors (green, yellow, red), a BlinkStick can be a nice visual aid in your monitoring infrastructure. An easy proof of concept is to use the BlinkStick Website API, which, given a token, lets your monitoring system (for example) set the color of your BlinkStick remotely. You can find such a proof of concept here: https://github.com/a-yiorgos/elastic-blink
BlinkStick, soldered
Of course if you want a more elaborate and secure setup, it is possible. I had my 15 minutes of fun. YMMV.
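For illustration only, here is a minimal Python sketch of the status-to-color idea. It is not the code from the repository above, and it drives a locally attached BlinkStick over USB rather than going through the BlinkStick Website API. The function names, the default URL and the choice to show red for an unknown status are assumptions of this sketch. The `blinkstick` Python package is assumed to be installed wherever the LED is attached.

```python
import json
import urllib.request

# Elasticsearch reports cluster health as green, yellow or red.
STATUS_COLORS = {"green": "green", "yellow": "yellow", "red": "red"}


def color_for_health(health):
    """Map a parsed /_cluster/health response (a dict) to a color name.

    A missing or unknown status is shown as red. That fallback is an
    assumption of this sketch, not something Elasticsearch defines.
    """
    return STATUS_COLORS.get(health.get("status"), "red")


def fetch_health(base_url="http://localhost:9200", timeout=5):
    """Fetch cluster health as a dict; base_url is a placeholder."""
    url = base_url + "/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def show_on_blinkstick(color):
    """Set the first BlinkStick found on USB to the given color name."""
    # Imported lazily so the functions above work without the hardware library.
    from blinkstick import blinkstick

    stick = blinkstick.find_first()
    if stick is None:
        raise RuntimeError("no BlinkStick found")
    stick.set_color(name=color)
```

Running `show_on_blinkstick(color_for_health(fetch_health()))` from cron or from a monitoring hook would keep the LED in step with the cluster.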
If you’ve read The Phoenix Project, then you already know that of the four types of work, unplanned work is the plague for DevOps. It is just that sometimes this unplanned work for DevOps is planned work for everybody else. Conway’s Law at its best.
I remember I got The Phoenix Project for the Kindle for free on a promotional day sometime back in 2013. I’ve been meaning to read it ever since. But back in those days I was suffering from the same problems that the main character suffers from for at least the first 35% of the book. And, well, when you can simply rename the characters of a book and relive the experience, reading it does not help.
I finally made it this week and got through it with some late-night reading, just as I am now writing this post minutes after reading the last page. I am not going to ramble about the three ways or even the four types of work. By now this is common stuff, and even some years ago, if you tried to read about Systems Thinking or even Cybernetics, you would have reached those conclusions. But hey, a story always imprints a lesson better than a textbook, and this is so much better than The Deadline. You want to revisit The Deadline in order to copy the notes of Mr. Tompkins. You do not need to revisit The Phoenix Project.
Interestingly, the book lays out a career path for people interested in following one. It kind of reminded me of Putt’s Law and how you cannot postpone your promotions forever. I find it kind of optimistic career-wise, depending on the location of the reader, and there is still the question of the top floor.
While this is a novel about DevOps, DevOps still means different things to different people. Luckily this is a novel for all people for whom DevOps at least means something.
Well it is SysAdmin Day today and there will be lots of cheerful happy posts of appreciation. But me, I will be a fun spoiler today and will point you to the most powerful posts that I’ve read this year:
If you are running Ganglia in a non-multicast environment, you may find a gmond process occupying 100% of the system’s CPU. In such cases it helps to check whether the node should be deaf (that is, whether it is a node that should not receive any messages from others). If so, try setting the following in the globals section of your gmond.conf:
deaf = yes
and see what happens. This has happened to me on CentOS 6.x systems.
Conway was working on six problems at a time as a means to battle the depression that comes from failing to conquer a specific problem. Six problems are enough to fit your daily mood. It is more than one per day anyway. In DevOps the software stack you need to coordinate just to make your app go live is so varied that there is always something you can work on daily, even if your primary focus seems stalled. Beware though, since this is also a way to not get any work done at all. You can eat an elephant, but somehow we seem to attract hordes of them. And then you can just stare at the monitor, hopeless.
A good strategy to combat this is something I once heard: “type A people procrastinate by doing other things”. Your stack has more than six components, so you can easily have more than six fallbacks should you get stuck.
If anything, please try never to break your rhythm. Interrupts cost. Context switching does too.
Pick your problems with specific goals in mind
This is one thing that I suffer from a lot. Everything is cool. And I want / need to know everything. In acceptable depth. Today. But this is not feasible even if we had 25 hours per day and no sleep. So, IMHO (and I try hard to do so), the plan is to pick problems with two goals in mind: your current job and “the dream job” you want to land.
Big problem
The difficult and important problem for Conway. The Architecture problem in the DevOps case. Is your running Architecture OK? Does it need improvements? Do you need something completely new? Are you to invent the next lambda architecture? How are you going to make your dent in the circle?
Your contribution to the world.
(Yes, you have to try to make a dent in the circle even if you’re not pursuing a PhD.)
Workable problem
Big problems mean big delays most of the time. And in trying to solve big problems you need practice. So you need to have an arsenal of problems you can work on that you can solve. They may be boring or even need repeated tedious tasks performed by you. Automate yourself out of them if you can. Flex your brain muscle so that you can work on the big stuff properly warmed up. Your current setup always has clear steps that you can walk forward. Nothing fancy that you could give a speech about, but something that you can complete during the day and feel good about it.
Book problem
I do not have a book project of my own. But over the last 20 years there have been two books that I wish I could find time to revise. I have a friend, though, who just finished writing a book and he seems pretty happy with the result.
If you’re writing a book, consider this as yet another problem you’re working on. If you’re not writing a book, well write something. There is always less documentation than needed.
Read a book by the way. Make reading books your book project. That’s mine too.
Fun problem
You should always have at least one problem that you do for fun, said Conway. Well, I guess we have GitHub for that, don’t we? I think my current fun project will be cryptopals. Let’s see for how long.
Enjoy your life
“The trick in life is to find out what you think is play that the fools think is work so that they will pay you to do it.”
Happiness is the single productivity booster that one can think of. Grief and depression are the best demoralisers. This post about Karojisatsu really shook me. And it came at a time when I was seriously thinking about DevOps-inflicted depression.
I still have no generally applicable tricks about that.
ansible_managed is a string that can be inserted into files written by Ansible’s config templating system. You put the macro string # {{ ansible_managed }} in your Jinja2 template and it gets expanded to something meaningful like:
# Ansible managed: /path/to/file/template/hosts.j2 modified on 2014-09-24 10:52:51 by username on hostname
You get a good idea of where the file came from. Unfortunately, templates work only with Ansible playbooks and not with the direct ansible command. But even when you use the copy module outside a playbook, it is good practice to put a comment that includes {{ ansible_managed }} at the beginning of the file. It serves as a handy reminder of how this file got installed in the first place. And in the future, if you make a template and a playbook work with it, you’re already set.
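To make use of that reminder later, you could scan a file for the marker before editing it by hand. The sketch below is a hypothetical helper, not part of Ansible. The function name, the five-line window and the two marker strings are assumptions: the first string is the expanded form shown above, the second is the unexpanded macro name.

```python
from pathlib import Path

# Expanded text written by templates, and the raw macro name that a
# plainly copied file would still carry. Both are assumptions here.
MARKERS = ("Ansible managed", "ansible_managed")


def is_ansible_managed(path, max_lines=5):
    """Return True if one of the first few lines is a '#' comment with a marker."""
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        # zip() stops after max_lines lines or at end of file.
        for _, line in zip(range(max_lines), fh):
            if line.lstrip().startswith("#") and any(m in line for m in MARKERS):
                return True
    return False
```

A call such as `is_ansible_managed("/etc/hosts")` before a manual edit is a cheap way to notice that the change belongs in the template instead.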
The old sysadmin had resigned and the new one was taking things over. After finishing the orientation on the systems and the important tasks that had to be performed from day one, the old sysadmin gave the new one three envelopes.
– Open these if you run up against a problem you don’t think you can solve, he said.
After a few months had passed, a severe problem arose. Downtime was long, management and customers were furious and all over our sysadmin. Desperate and without any other help, he opened the first envelope. The message read, “Blame your predecessor.”
So he did and everybody got off his back. In a calmer environment and with less pressure he was able to get things working again.
About a year and a half later, havoc broke out again. Deadlocked, our sysadmin opened the second envelope. “Reorganise”, said the message.
He decided to switch automation software, documented why this was beneficial for the company’s business and indeed the systems behaved better and everyone was happy again.
Time passed and it just so happened that the system was inexplicably inoperable again. So the helpless sysadmin decided to open the third envelope. The message read, “Prepare three envelopes”.
You know people at work appreciate you when they do not let you open the second envelope.