A poster over at ServerFault complained about the attitude some sysadmins show towards their users, even when the task seems simple and takes 30 minutes at most. Many users share similar complaints:
“Every time I ask a simple request like [simple request], these guys act like i’m asking them to build the great wall of china overnight. I’ve had to do this myself many times, it takes under 30 minutes, and maybe 30 seconds of user interaction.”
Or so the poster thinks. There are enough answers there that show why what you do on a single system cannot be compared with inserting a new system into an already working web of systems, with provisioning and established procedures in place. But even when it is only a matter of 30 minutes, it is also a matter of when these thirty minutes will be devoted. Users do not know about RMS (rate-monotonic scheduling) or EDF (earliest deadline first) and do not understand that in an interrupt-driven line of work sysadmins use intuitive variants of them. I want to expand, however, on a comment I posted there which links Metcalfe’s Law to the problem. Metcalfe himself has written about the law:
“[Nobody] has attempted to estimate what I hereby call A, network value’s constant of proportionality in my law, V=A*N^2. Nor has anyone tried to fit any resulting curve to actual network sizes and values.”
For simplicity, most refer to the law as V ~ N^2. Note, though, that in the same blog post Metcalfe points out that the constant A (which we conveniently omit most of the time) may change as N increases and may even be a more complicated function of N. He urges people to look into that.
What Metcalfe defines as value is what we, system administrators, lift for a living. So when a service is down and your sysadmins work like crazy to bring it back, rest assured that they already know what is at stake. Metcalfe made sure of that. And that is why it does not really help to ask them every ten minutes “When is it going to be up again? We are losing money!” Not only do we know, we do not even need a napkin for our guesstimate.
And that is why what is “just another server” or “just one more service” for the user, and therefore a step from N to N+1, actually means that the load to be lifted increases by 2N+1 (taking A as 1). No, it is not just another server or service, for it is not independent. It is inserted into an already complex system, and this must be done in a way that does not affect the stability of the (new) whole. Rolling back, if things fail, is a myth. This is a lot more complicated than your testbed setup which, no matter how complex, is simple enough. Consultants and other “out of town experts” routinely make this mistake.
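The 2N+1 figure follows from (N+1)^2 - N^2 = 2N+1. A minimal sketch of that arithmetic, assuming A stays constant (which, as noted, Metcalfe himself does not take for granted); the function names `network_value` and `marginal_load` are made up for illustration:

```python
def network_value(n: int, a: float = 1.0) -> float:
    """Metcalfe's law, V = A * N^2, with A assumed constant (an assumption)."""
    return a * n ** 2


def marginal_load(n: int, a: float = 1.0) -> float:
    """Extra value (the load to be lifted) when going from N to N+1 nodes.

    Algebraically this is A * ((N+1)^2 - N^2) = A * (2N + 1).
    """
    return network_value(n + 1, a) - network_value(n, a)


if __name__ == "__main__":
    # The "one more server" costs more the larger the existing web already is.
    for n in (1, 10, 100):
        print(n, marginal_load(n))
```

With A = 1, adding one system to a web of 10 adds 21 units, while adding one to a web of 100 adds 201. The request is the same size each time; the weight behind it is not.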
A schematic may make it easier to understand. We all know the corporate pyramid, where “the top” is the target (or the result of the Peter Principle in action) of workers within an organization. But within organizations, a second (inverse) pyramid forms, a pyramid that explains your sysadmin’s day:
It’s no wonder that, even putting personality and character deficiencies aside, your sysadmin looks grumpy at times. Like the Last Electrical Engineer, his work is of infinite weight and importance, but invisible to the known (organizational) universe.
Remember, pressure brings tension.