The Unwritten Laws of Engineering

MEMagazine is running a three-part series entitled “The Unwritten Laws of Engineering” by W. J. King and James G. Skakoon. W. J. King first published the three articles of the series in Mechanical Engineering magazine in 1944. Briefly, the laws are:

  1. However menial and trivial your early assignments may appear, give them your best efforts.
  2. Demonstrate the ability to get things done.
  3. Develop a “Let’s go see!” attitude.
  4. Don’t be timid—speak up—express yourself and promote your ideas.
  5. Strive for conciseness and clarity in oral or written reports; be extremely careful of the accuracy of your statements.
  6. One of the first things you owe your supervisor is to keep him or her informed of all significant developments.
  7. Do not overlook the steadfast truth that your direct supervisor is your “boss.”
  8. Be as particular as you can in the selection of your supervisor.
  9. Whenever you are asked by your manager to do something, you are expected to do exactly that.
  10. Cultivate the habit of seeking other peoples’ opinions and recommendations.
  11. Promises, schedules, and estimates are necessary and important instruments in a well‑ordered business.
  12. In dealing with customers and outsiders, remember that you represent the company, ostensibly with full responsibility and authority.
  13. Do not try to do it all yourself.
  14. Every manager must know what goes on in his or her domain.
  15. Cultivate the habit of “boiling matters down” to their simplest terms.
  16. Cultivate the habit of making brisk, clean‑cut decisions.
  17. Learn project management skills and techniques, then apply them to the activities that you manage.
  18. Make sure that everyone, managers and subordinates, has been assigned definite positions and responsibilities within the organization.
  19. Make sure that all activities and all individuals are supervised by someone competent in the subject matter involved.
  20. Never misrepresent a subordinate’s performance during performance appraisals.
  21. Make it unquestionably clear what is expected of employees.
  22. You owe it to your subordinates to keep them properly informed.
  23. Never miss a chance to commend or reward subordinates for a job well done.
  24. Always accept full responsibility for your group and the individuals in it.
  25. One of the most valuable personal traits is the ability to get along with all kinds of people.
  26. Never underestimate the extent of your professional responsibility and personal liability.
  27. Let ethical behavior govern your actions and those of your company.
  28. Be aware of the effect that your personal appearance and behavior have on others and, in turn, on you.
  29. Beware of what you commit to writing and of who will read it.
  30. Analyze yourself and your subordinates.
  31. Maintain your employability as well as that of your subordinates.

ASME has published the expanded version of these laws as a book. From the introduction of the book we learn that these laws are the result of direct observation for 17 years in four engineering departments. Also “many of these laws are generalizations to which exceptions occur in special circumstances. There is no thought of urging a servile adherence to rules and red-tape, for there is no substitute for judgement; vigorous individual initiative is needed to cut through formalities in emergencies. But in many respects these laws are like the basic laws of society; they cannot be violated too often with impunity, notwithstanding striking exceptions in individual cases”.

[via]

downsizing

* Initially this post was started because of Sun’s layoffs (18%). Now Sun is no more, and so, it seems, is work for a large percentage of the Greek workforce.

Downsizing is to be expected in great numbers. At the scale at which it seems likely to happen, these will not really be informed layoffs. The criterion is simple: you have a high salary (for some definition of high), and your choices may include retirement, layoff, a substantial pay cut (which amounts to downsizing morale and motivation) and/or transfer to another organization (in a take-it-or-bye-bye offer).

The short-term result of such a massive, violent move will of course be proof of elimination of the so-called “cost centers”. The mid- and long-term results will be far different: the information flow within the organizations will be severely disrupted. There exists the organizational structure and then there exists the informal structure that gets built over time. The departure of a key person can be dealt with by an “unconscious” team auto-configuration. But what about more than one? While small teams communicate more effectively, small teams cannot be made smaller. Time is a limited resource and there is not enough to perform the required analysis before downsizing.

That is the price to be paid for not trying to be lean when you had the chance. And new “cost centers” will emerge.

Appendix: Chalk one up for math (or How even retirement disrupts information flow)

Steinmetz’ most gratifying moment may have occurred after his retirement. An emergency brought him back to GE’s Schenectady plant to troubleshoot a malfunctioning generator. For days, the hobbled genius pored over drawings with paper and pencil in hand. Finally, he placed a chalk mark on the side of the generator, instructing the repairmen to cut through the casing and remove a number of turns from the stator. It worked.

When asked to submit an invoice, Steinmetz delivered a slip of paper with nothing on it but the surprisingly large figure of $10,000. The accountants, in shock, said they couldn’t process the paperwork without a more detailed breakdown. Steinmetz then forwarded another note on which was typed:

One chalk mark $1. Knowing where to put it $9,999.

A short time later, Steinmetz received his pay in full.

To those who believe that this was not because of disrupted information flow, but because of Steinmetz’ genius, I can only say that I know of cases where engineers who had been retired for decades were called back for consulting due to lack of documentation. And today’s knowledge workers are no better at keeping it either.

Lean Behaviors

I’ve briefly mentioned Emiliani’s “Lean Behaviors” before, but lately I am finding myself coming back to it on a number of occasions. This time it was Al Iverson’s amazement in “What You Suggest Will Kill Email for Everyone”:

It’s amazing to me that some people are so blind to that outcome. A savvy marketer ought to already know that it’s not all that smart to burn up the medium in a way that arrests your future ability to make money from it?

Oh but people are not blind. They just have a different agenda. They aim for short-term profits (and bonuses) and results that last as long as they are part of an organization. Whether their actions set in motion the demise of the organization (which might occur after they have left) is not something to bother them. After all, the organization failed after they left, so it is not their fault, right? Wrong! I quote from “Lean Behaviors”:

Behaving poorly in the workplace makes everyone, including management, ignorant of how well people can actually behave, and results in the evolution of new types of undesirable behavior patterns. Poor behaviors allow people to avoid co-operation, gain personal advantage, and protect personal or departmental interests. These self-serving habits become well-developed over time, resulting in highly skilled but unproductive gamesmanship that no customer would want to pay for. All too often the most highly skilled gameplayers become unwholesome ego-driven role models for future generations. Survival of the fittest, in this context, means the lowest forms of behavior win – but only on a personal level, which is good enough for many people. However, the corporate culture, which mirrors the aggregate of individual behavior of managers, will likely fail to serve the larger community. The result is a deterioration of trust between workers, management, suppliers (Sheridan, 1997), and investors, which can further erode a company’s competitive position. Competitors may also suffer from this, as they now often work together in joint ventures or other co-operative business arrangements. A lack of trust and differences in corporate culture have been cited as primary reasons why collaborative business arrangements often fail or at least fall well below expectations (Kanter, 1994).

(I could quote the whole of the paper, but it is freely available, so go download it)

So it is not blindness. It is about the “take the money and run” attitude. Use whatever half-baked idea seems to bring money to the table, regardless of whether it will slay the goose in the long run. With such people switching jobs every three to five years, by that time they will already be aiming at another goose.

On vendor lock-in

(and sometimes open-source vendor lock-in)

Thanks to @nzaharioudakis (whom I had asked whether Debian stable is an adequate platform to run Zimbra on) I remembered the following quote from “Conquest in Cyberspace”:

“The seducer, for instance, could have an information system attractive enough to entice other individuals or institutions to interact with it by, for instance, exchanging information or being granted access. This exchange would be considered valuable; the value would be worth keeping. Over time, one side, typically the dominant system owner, would enjoy more discretion and influence over the relationship, with the other side becoming increasingly dependent. Sometimes the victim has cause to regret entering the relationship; sometimes all victim regrets is not receiving its fair share of the joint benefits. But if the “friendly” conquest is successful, the conqueror is clearly even better off.”

Even though the above is written in cyberwarfare (political) language, the point is very clear and the IBM executive’s phrase becomes well understood:

“Because you don’t want to get locked into an open system”

(One has to keep in mind that the phrase is taken somewhat out of context. Some 20 years ago when he spoke of “open systems” he meant OSI).

I do not want to get locked in any system.


Dear consultant

Dear (billable by the hour) consultant-

You are brought in to help us find a solution. You are not to bring the one solution that you know and try to fit us in there. You are to find a solution that fits the client, not a client that fits the solution. So next time please present at least two different solutions (see here why), otherwise we are going to bill you for our time instead.

[ Inspired by discussions with colleagues from both the private and public sector ]

How Metcalfe’s Law explains the attitude of your sysadmin (or what you perceive as negative behavior)

A poster over at ServerFault complained about the attitude some sysadmins show towards their users, even when the task seems simple and should take 30 minutes at most. Many users share similar concerns and complaints:

“Every time I ask a simple request like [simple request], these guys act like i’m asking them to build the great wall of china overnight. I’ve had to do this myself many times, it takes under 30 minutes, and maybe 30 seconds of user interaction.”

Or so the poster thinks. There are enough answers that show why what you do on a single system cannot be compared with what you do when inserting a new system into an already working web of systems with provisioning and established procedures in place. But even when it is only a matter of 30 minutes, it is also a matter of when these thirty minutes will be devoted. Users do not know about RMS (rate-monotonic scheduling) or EDF (earliest deadline first) and do not understand that in an interrupt-driven line of work sysadmins use intuitive variants of them. I want to expand, however, on a comment I posted there which links Metcalfe’s Law to the problem. Metcalfe himself has written about the law:

“[Nobody] has attempted to estimate what I hereby call A, network value’s constant of proportionality in my law, V=A*N^2. Nor has anyone tried to fit any resulting curve to actual network sizes and values.”

For simplicity most refer to the law by using V ~ N^2. Note though that in the same blog post Metcalfe points out that the constant A (which we conveniently omit most of the time) may change while N increases and may even be a more complicated function of N. He urges people to look into that.

Metcalfe's original slide presenting the Law, circa 1980

What Metcalfe defines as value is what we, system administrators, lift for a living. So when a service is down and your sysadmins work like crazy to bring it back, rest assured that they already know what is at stake. Metcalfe made sure of that. And that is why it does not really help asking them every ten minutes “When is it going to be up again? We are losing money!” Not only do we know, we do not even need a napkin for our guesstimate.

And that is why what for the user is “just another server” or “just one more service”, and therefore going from N to N+1, actually means that the load to be lifted increases in proportion to 2N+1. No, it is not just another server or service, for it is not independent. It is inserted into an already complex system, and this must be done in a way that does not affect the stability of the (new) whole. Rolling back, if things fail, is a myth. This is a lot more complicated than your testbed setup which, no matter how complex, is simple enough. Consultants and other “out of town experts” routinely make this mistake.
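The 2N+1 figure follows directly from V = A*N^2, since (N+1)^2 - N^2 = 2N+1. A minimal Python sketch of that arithmetic, assuming A stays constant (Metcalfe himself warns it may not); the function names `network_value` and `added_load` are made up for this illustration:

```python
def network_value(n: int, a: float = 1.0) -> float:
    """Metcalfe's Law, V = A * N^2, with A assumed constant."""
    return a * n * n


def added_load(n: int, a: float = 1.0) -> float:
    """Value added when the network grows from n to n + 1 nodes: A * (2n + 1)."""
    return network_value(n + 1, a) - network_value(n, a)


# The cost of "just one more server" grows with the size of the existing network.
for n in (1, 10, 100):
    print(n, added_load(n))
```

With A = 1, the hundred-and-first server adds 201 units of load, not one.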

A schematic may make it easier to understand. We all know the corporate pyramid, where “the top” is the target (or the result of the Peter Principle in action) of workers within an organization. But within organizations, a second (inverse) pyramid forms, a pyramid that explains your sysadmin’s day:

A day in the life of your sysadmin

It’s no wonder that, even putting personality and character deficiencies aside, your sysadmin looks grumpy at times. Like the Last Electrical Engineer, his work is of infinite weight and importance, but invisible to the known (organizational) universe.

Remember, pressure brings tension.

serverfault

New assignment for apprentice: Try to answer one question per day from www.serverfault.com

(Note: Asking questions also counts.)

System Administration requires a diverse set of skills that (still) most pick up on the job in a reactive way: Problem occurs, learn what is needed to solve it; if we like the subject dig deeper too. Serverfault is one of those places where people in the profession go for help. Reading questions and answers helps, but answering something helps more. Actually writing an answer (or a question) includes that extra effort that differentiates between “it may be solved this way” and “it is solved this way”. Plus there is a whole community that can correct in no time any errors in your answers. You do not even have to know the answer. Just pick up any question you find interesting enough and try to find an answer. The diversity of the questions asked on serverfault makes it virtually impossible to not find at least one (even remotely) interesting every day.

Just pick one. Any. Failure is an option. You do not have to be sysadmin1138 to answer a question, but you can surely become one.

Upgrades: Friday or Sunday?

I think I’ve read about this on sage-members (it is also quite possible that I’ve blogged about it, but a quick search did not reveal anything):

You’ve got a major upgrade ahead of you, one that might take too long to complete and on top of that, the company (your employer) cannot halt while you are at the task. So do you schedule to start the upgrade on Friday evening, or on Sunday morning?

For years I used to opt for Friday evening. But it seems that I was lucky. For as I read in sage-members, what if the upgrade does not complete and you need support? Do you have (verified) support 24×7 for everything involved in the process, including hardware, software, personnel (if shifts are needed)? Even if you do, have you ever tested them? Are the support people you contact on weekends of the quality you expect or simply note takers so that you get an open ticket and a checklist while an actual solution may arrive on Monday evening? What if a simple fan fails and you need to replace it?

Start on Sunday mornings. As a bonus you get a full day for rest and mental preparation.

Unintended consequences

The recent sport related (but unsporting) events bring to mind the point I was trying to make in my previous post: That an organization must rely on its people following rules and processes and not on their display of filotimo (which must be saved for extreme circumstances only).

A decision was made to have the grass surface of the field in the Olympic Stadium of Athens replaced. The work began and reports in the press showed progress. However, a pump broke down while the person responsible for it was on leave. This event went unnoticed until a friendly match was played between Panathinaikos and Genoa C.F.C. In this match, Djibril Cisse, Panathinaikos’ main firepower, was injured. So were two players of Genoa. The pitch was declared unusable and will be replaced after the U2 concert in the Stadium (September 3).

Tomorrow AEK, who also use the Stadium as home, are supposed to play against Dundee United for the Europa League competition. Only now they face the problem of having to find a home stadium for the match. In an agreement that some fans found controversial, they decided to use the Nea Smyrni Stadium, home of Panionios FC. Angered by the agreement, Panionios’ fans entered the pitch and made it virtually unusable, not only for AEK, but for Panionios’ home game on Saturday too! Now AEK is supposed to defend its win at Karaiskakis Stadium, Olympiacos’ home, with no fans on its side; only the fans of Dundee United who traveled from Scotland will watch the game.

A broken pump while a single person was on leave has led to two damaged soccer fields and a team not having the support of its fans while playing an international competition game at home. This displays the complexity and interconnectivity between systems in this world in weird and unforeseen ways, where the law of unintended consequences strikes, with a seemingly low-priority glitch creating so much havoc because “the system” could not deal with (or even detect) it.

Learn to say “No”

Users consider their needs top priority. Not only that, but when they pick up the phone or press the send button of their email client, they demand immediate service. System Administrators on the other hand are trained (over time) to objectively distinguish between real emergencies (threats to the organization’s business operation if not dealt with) and the rest.

So whenever an urgent situation arises, step back and ask yourself:

– Urgent for whom?
– Why is this urgent?
– Is there a process missing here?

These are important questions, especially if there exists no process covering the situation. Organizations have written workflows that define processes, but operate on the evolution of those rules which are mostly undocumented. If you identify a missing process, your reaction to the matter will create a process, no matter what. Solve the problem as a fireman and you have just created a process with your name hardcoded in it. Not your team, your name. People will look for you.

Point out that this is a missing-process problem that needs to be fixed and everybody will promise you that it will be dealt with. Only it will not. The next time it arises, they will come back to you, because you did it the first time and now you are (informally) in charge of “those things”.

This is good neither for you nor for your employer. So the need to say “No, I will not fix that this way. Create a process and I will” arises. It is to the benefit of your employer to do so. It makes certain that for this particular situation they are not dependent on a single person (you). It also protects your time, weekends and vacation. Explain this to upper management. Learn to say “No” in a productive way. Make sure this is not misunderstood as BOFHiness on your part. Put a price tag on what it means not to do it the formal way.

As a System Administrator you do not only manage the computers in your organization. You manage the people using them too. You manage human-computer systems. And whenever there is a void in the workflow, you need to do your best to create a process. Otherwise it will be created without your intervention. You do not want that. You are the System manager.

Eliminating unnecessary processes is part of the job too, but this is maybe for another blog post.