Happy SysAdmin Day

One more SysAdmin Day. I don’t have any war story to share, so I will copy-paste a comment I posted on LinkedIn:


Let me tell you a story that involves a real (brick & mortar) Architect:

– Sir, I need to fill you in on the latest details of the construction of the mansion and to ask for a favor.
– What is it?
– Sir, can you please ask your wife not to change her mind every two days? We tear down walls and raise them again, and we will never finish that way.
– Look, I pay you a ton of money to deal with my wife’s expectations. Otherwise I’d be doing it

Forget the crudeness of the client. The moral of the story is that you are being paid to operate in a volatile environment with changing requirements outside your control. You need to accept the fact and learn to navigate it.

“Management Science Fiction”

In this episode of ArrayCast, Turing Award winner Ken E. Iverson talks about Ian P. Sharp (founder of I.P. Sharp Associates) and shares this:

He was one of the early people in operations research, which came to be called management science. So he knows what this stuff is, but he also likes to speak of management science fiction, which I think reflects the correct thing that those techniques were very much overblown and oversold, at least for a period.

This of course reminded me of Gene Woolsey (again someone well known in operations research) who at the beginning of the book is seen saying:

:
5. Does it work?
6. If yes, is there a measurable, verifiable reduction in cost over what was done before, or a measurable, verifiable increase in readiness?
7. If yes, show it to me NOW.

If you think I am trying to take a dig at some current trend of overblown and oversold techniques using lessons and parallels from the past, you are correct :) I have expressed the very same opinion in private conversations about what is happening with GenAI today as I.P. Sharp had years ago about Management Science.

That made me smile for the rest of the weekend.

OpenVPN, LDAP and group membership

While the need for integrating LDAP with OpenVPN seems straightforward, it seems to me that the documentation for the auth-ldap plugin is not very easy to find. Take for example the following auth-ldap.conf configuration file:

<LDAP>
URL ldap://ldap.example.com
Timeout 15
</LDAP>
<Authorization>
BaseDN "ou=users,dc=example,dc=com"
SearchFilter "(uid=%u)" # (or choose your own LDAP filter for users)
RequireGroup false
</Authorization>

This is a very handy starter that allows any user with a working password under the ou=users part of your tree to be granted access. But what if you want to restrict access based on group membership? According to fragments of documentation scattered across forums and StackOverflow / ServerFault, you’d need to set RequireGroup true and then use the BaseDN of the group and the memberUid attribute within a <Group> ... </Group> subsection of Authorization. This never worked for me. What worked was changing the SearchFilter to include group membership:

<LDAP>
URL ldaps://ldap.example.com
Timeout 15
</LDAP>
<Authorization>
BaseDN "ou=users,dc=example,dc=com"
SearchFilter "(&(uid=%u)(memberOf=cn=openvpn,ou=groups,dc=example,dc=com))"
RequireGroup false
</Authorization>

Voila!

I did not come up with this. I found it via random Googling somewhere in SO (I cannot remember and cite that answer anymore).
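
To check such a filter before touching the OpenVPN configuration, something like the following sketch may help. The helper name build_filter, the user alice and the commented-out ldapsearch call are assumptions for illustration only. Note also that the memberOf attribute exists only when the directory maintains it (for example through the memberOf overlay in OpenLDAP).

```shell
#!/bin/sh
# Hypothetical helper (not part of auth-ldap): builds the same filter
# as the SearchFilter above, with %u replaced by a concrete user name.
build_filter() {
    # $1 = uid to test, $2 = DN of the group that grants VPN access
    printf '(&(uid=%s)(memberOf=%s))\n' "$1" "$2"
}

build_filter alice 'cn=openvpn,ou=groups,dc=example,dc=com'

# Assuming the OpenLDAP client tools are installed, the filter can be
# tried against the directory. An empty result suggests that the
# filter matches nobody (or that the bind may not read memberOf).
# ldapsearch -x -H ldaps://ldap.example.com \
#     -b 'ou=users,dc=example,dc=com' \
#     "$(build_filter alice 'cn=openvpn,ou=groups,dc=example,dc=com')" dn
```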

I’d rather deploy on a Sunday

Originally posted on LinkedIn, but also saved here in hopes of longer posterity:

When you deploy you change the behavior of the system. You wouldn’t be deploying if you didn’t want to change it in the first place.

I don’t deploy on Fridays, not because I’m afraid of the technical implications of a deployment gone wrong (we have known for years how to manage these), but because of the business implications and the external system dependencies that operate in degraded mode during the weekend. It is not whether the Ops or Dev on call can handle things. It is whether the other side can, and how much faith you have that they can. Or whether you (or they) have a business person on call.

I’d rather deploy on a Sunday

The “oil lamp” law

[ Originally a Facebook post, copied here for posterity. ]

Some thirty years ago I was told the story of a server with an oil lamp on the side (the kind that Greek Orthodox people light to honor God and the Saints). It was put there to make light of the situation: the server must not break under any circumstance.

Well, it has been my experience of many years, sectors and shops of different sizes, that no matter what, there is always at least one key system that “needs” an oil lamp by its side in the organization. A system that is critical enough to warrant all the attention it gets, yet so critical that nobody risks upgrading / changing / phasing it out during their tenure (the system is guaranteed to outlive them; I count three such systems that have outlived me). Untouchable systems that get replaced only when they physically die.

Seek out who needs an oil lamp. Plan accordingly.

[ There’s another “law” that follows as a result of the oil-lamp, but maybe for another time. ]


Achilles Voliotis writes:

The candle was placed (without oil) for fun on the VAX-750 by Spyros Potamianos, while I was admin in softlab. Since VAX was running (initially, then we installed 4.2 BSD) the Unix version of the DEC called ULTRIX, he had also put an empty bottle of the well-known shampoo ULTREX with changed E to I with a marker.
The VAX 750 consisted of 2 "boxes" (~ 30U). The lamp was on the disk device (170 MB, "removable") and became famous for a strange story that makes you become superstitious or believe in Murphy's laws.
During a service visit by the DEC technician, he saw the lamp on the "disk drive", and started joking ("you are not serious", what has the lamp to do with computers, etc.). And in a theatrical move he throws a smack at the candle that falls down.
In a satanic coincidence, the "disk drive" (actually it was a closet weighing more than several hundred Kg), almost at the same time made some strange sounds and stopped working !!!
The sequel was quite adventurous, due to a painful history between DEC and its representative office in Greece and Cyprus (DCC). In short, for ~ 2 months, a team that never included 2 Greeks (French+German, Greek+German, Italian+Greek, etc.) came to the softlab every 1-2 days. They disassembled the device into parts and replaced all of them (some more than once), always with the same result: the "disc" spun, sometimes it worked for 4-5 minutes, and in the end it failed with the same strange noise.
At some point, after two months of ineffective repairs, a team came with two Greeks (no foreigners). They opened the lid, modified the settings of a DIP switch, closed the lid and ... the disk started working normally!
After this adventure the candle came back on the disk pack and stayed there for a very long time.

on brilliant assholes

[ yet another meaningless micropost ]

From time to time people get to read the autobiography of a highly successful individual, or memoirs of the specific time when that individual reached their peak. Fascinated by their success and seeking to improve their own status, these followers* copy the behavior they read about. Interestingly, if this behavior is assholic and abusive, copying it comes even easier. Someone with psychology studies would have more to say here, I’m sure. In my “communication radius” this is very easy to observe with people who want to copy successful sports coaches, and you can see this pattern crossing over to other work domains too.

It is not easy to understand that someone can be an asshole whose brilliance may make them somewhat tolerable to their immediate workplace, while the other way round does not hold: assholic behavior does not generate brilliance. Solutions do.

If you think you’re brilliant, just pick a hard problem and solve it. I know, it’s …hard.

[*] Leadership programs and books create followers, not leaders.

Work expands to fill the time available for its completion.

This is also known as Parkinson’s Law. But earlier today I dug up from the Internet Archive this gem of a post (you should read it all, really):

Parkinson inferred this effect from two central principles governing the behavior of bureaucrats:

1. Officials want to multiply subordinates, not rivals.
2. Officials make work for one another.

Just a note for when I seek to understand the behavior of big organizations.

Sometimes n2n is good enough

You are not always in a position to use one of the big five cloud providers (BTW, I think Watson’s prediction was kind of true; serverless is making sure of that).

So when you’re working in less common public cloud environments, sometimes you miss features that are a given elsewhere, like a VPC. A VPC is a pretty cool abstraction that allows you to have an isolated network of machines (and sometimes services) within your cloud provider and allows for easier management of things like security groups, routing traffic between machines and the like.

So what do you do when you do not have a VPC available? You need some kind of overlay networking. When deploying Kubernetes for example, you need to deploy an overlay network (there are many solutions to choose from) and you let it deal with iptables and routing hell. But you may need to temporarily scale services that are not container orchestrated, for whatever reason (I, for example, abhor running databases with Kubernetes). Still, you may need an autoscaling solution like the one EC2 offers. IPSec would be a cool solution, but deploying it in my current workplace would be too complex. Something simpler was needed. And I found it here, despite the shortcomings reported: N2N from ntop.org.

N2N was in a development hiatus, but is now back in active development. It utilizes TUN/TAP and allows you to build a VPC over the interface that the client running on each machine creates. It comprises two components: a supernode, which is essentially a directory server that informs the members of the (let’s call it) VPC of the actual IP addresses of the other members, and a client program (called edge) that creates the interface on each VM and contacts the supernode to register with it and query for the routing information it needs. The supernode itself is not routing any packets. It used to be a single point of failure, but current versions of N2N/edge support two supernodes, so your network is in a better position.

In my case I needed to autoscale a certain service for a limited amount of time, but did not have any prior knowledge of the IP addresses of the VMs that were to be created. So I had them spun up from an image that started edge, registered with the supernode and then routed a certain kind of traffic through a node that was also registered in the same network and acted as a NAT server. Hence, I simplified some of the iptables hell that I was dealing with until we deploy a better solution.

N2N supports encrypted traffic, and requires the equivalent of a username / password combination (common to all machines that are members of the VPC, but not known to the supernode a priori).

So where else might you want to use N2N? Maybe you need a common IP address space between two cloud providers? You may be in a cloud provider that allows for VPCs but does not make it easy to route traffic from a VPC in one region to a VPC in another region? Or in cases when solutions like DMVPN are expensive and your own BGP solution an overkill? Stuff like that.

So how do the machines acquire an address in that VPC? You have two choices: (a) DHCP and, if that is not working, (b) a static address. In the second case you need to implement a poor man’s DHCP by having the machine assign itself an IP address with a low probability of collision. To this end, let’s assign a /16 to that VPC and put an entry like the following in /etc/rc.local (yes, I am still a fan of rc.local for limited usage):

edge -l supernode:port -c community-string -k password-string -s 255.255.0.0 -a static:172.31.$(shuf -i 1-251 -n 1).$(shuf -i 1-251 -n 1)

The probability that two given machines pick the same address is 1/63001, so you get to decide at how many temporary instances you need a better hack than this. Plus, since neither random octet ever comes out as 0, you get to use 172.31.0.0-255 for static machinery within the VPC (like a NAT gateway for example).
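
The 1/63001 figure covers a single pair of machines; with more instances running at once, the birthday problem applies. A minimal sketch of that calculation, assuming awk is available (the function name collision_probability is made up for this example):

```shell
#!/bin/sh
# collision_probability N
# Chance that at least two of N instances pick the same address when
# each one draws uniformly from 251 * 251 = 63001 candidates.
collision_probability() {
    awk -v n="$1" 'BEGIN {
        slots = 251 * 251
        p_unique = 1
        # Instance i must avoid the i addresses already taken.
        for (i = 0; i < n; i++) p_unique *= (slots - i) / slots
        printf "%.4f\n", 1 - p_unique
    }'
}

for n in 10 50 100 300; do
    printf '%s instances: %s\n' "$n" "$(collision_probability "$n")"
done
```

The risk grows roughly with the square of the number of instances, so the hack is comfortable for a handful of temporary VMs and much less so for hundreds.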

Not a perfect solution, but definitely an easy, fast one.

In sed matching \d might not be what you would expect

A friend asked me the other day whether a certain “search and replace” operation over a credit card number could be done with sed: Given a number like 5105 1051 0510 5100, replace the first three components with something and leave the last one intact.

So my first take on this was:

# echo 5105 1051 0510 5100 | sed -e 's/^\([0-9]\{4\} \)\{3\}/lala /'
lala 5100

which works, but is not very legible. So here is a version that takes advantage of the -r flag (extended regular expressions), if your sed supports it:

# echo 5105 1051 0510 5100 | sed -re 's/^([[:digit:]]{4} ){3}/lala /' 
lala 5100

So my friend asked, why not use \d instead of [[:digit:]] (or even [0-9])?

# echo 5105 1051 0510 5100 | sed -re 's/^(\d{4} ){3}/lala /' 
5105 1051 0510 5100

Why does this not work? Because, as pointed out in the manual:

In addition, this version of sed supports several escape characters (some of which are multi-character) to insert non-printable characters in scripts (\a, \c, \d, \o, \r, \t, \v, \x). These can cause similar problems with scripts written for other seds.

There. I guess that is why I still do not make much use of the -r flag and prefer to escape parentheses when doing matches in sed.
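
For reference, a small sketch of both points, assuming GNU sed (other seds may behave differently). The function name mask_card is hypothetical, and -E is used as the equivalent spelling of -r.

```shell
#!/bin/sh
# Hypothetical wrapper around the substitution discussed above:
# replace the first three 4-digit groups and keep the last one.
mask_card() {
    sed -E 's/^([0-9]{4} ){3}/lala /'
}

echo '5105 1051 0510 5100' | mask_card

# In GNU sed, \dNNN denotes the character whose decimal code is NNN,
# so \d065 matches the letter "A" and not a digit.
echo 'A1' | sed 's/\d065/X/'
```

Sticking to [0-9] or [[:digit:]] keeps the script readable by any sed that supports extended regular expressions.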