Benford’s Law and email subjects

The first book I ever bought from ISACA’s bookstore was Nigrini’s book on Benford’s Law. Briefly stated, the law says that in a series of numbers that occur while observing a phenomenon, numbers starting with 1 are more likely to occur than those starting with 2, which in turn are more likely to appear than those starting with 3, and so on up to numbers starting with 9.

P(n) = \log_{10}\left(1 + \frac{1}{n}\right), \quad n = 1, \ldots, 9

The law holds for other bases too: in base b the formula becomes P(n) = \log_{b}\left(1 + \frac{1}{n}\right), n = 1, \ldots, b - 1.
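As a quick sanity check of the formula, here is a minimal Python sketch that computes the expected share of each leading digit. The base is a parameter added for illustration: log_10 becomes log_b and n runs from 1 to b - 1. The function name benford_probabilities is hypothetical.

```python
import math


def benford_probabilities(base: int = 10) -> dict[int, float]:
    """Expected share of each leading digit d = 1..base-1 under Benford's Law,
    P(d) = log_base(1 + 1/d)."""
    if base < 2:
        raise ValueError("base must be at least 2")
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}


if __name__ == "__main__":
    # Print the expected base-10 distribution, rounded to three decimals.
    for digit, p in benford_probabilities().items():
        print(f"{digit}: {p:.3f}")
```

For base 10 this prints 0.301 for the digit 1 down to 0.046 for the digit 9. The shares always add up to 1, because the sum of log_b((n + 1)/n) telescopes to log_b(b) = 1.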

I’ve discussed the applicability of Benford’s Law to email data with Martijn Grooten over at Twitter, but never ran any tests. A few hours back I had an interesting discussion with Theodore which reminded me of the law, so I decided to see whether it holds for a number series related to email. The easiest test I could run was on the length of the Subject: lines. Below is a graph of Benford’s distribution and the actual data from 376916 mails that passed through a certain mail server during the last week:

[Figure: Benford's Law vs. length of Subject: lines]

It seems that the length of subject lines does follow the pattern. For the sake of speed I omitted non-Latin subject lines from the computation, which means I will have to rerun it whenever I find a time slot longer than 15 minutes. Then again, if I do find such a slot, I think I will try to see whether the message body size also follows a Benfordian distribution. That may be more difficult to verify, though, because different mail servers impose different limits on the size of the messages they send and receive. Oh wait, Sotiris just did that! The rest of the tests mentioned in Nigrini’s book are also worth a try.
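If you want to try this on your own logs, here is a minimal sketch, assuming you have already extracted the Subject: header values (without the "Subject:" prefix) into a list of strings; how to pull them out of your mail server's logs is not covered, and the function names are hypothetical. It counts the leading digit of each subject's length, compares the shares with Benford's expected ones, and computes a Pearson chi-square statistic.

```python
import math
from collections import Counter
from typing import Iterable


def leading_digit(n: int) -> int:
    """First decimal digit of a positive integer."""
    return int(str(n)[0])


def subject_length_digits(subjects: Iterable[str]) -> Counter:
    """Count the leading digit of each subject line's length (in characters).

    Empty subjects are skipped because 0 has no leading digit in 1..9.
    """
    counts: Counter = Counter()
    for subject in subjects:
        length = len(subject)
        if length > 0:
            counts[leading_digit(length)] += 1
    return counts


def compare_with_benford(counts: Counter) -> list[tuple[int, float, float]]:
    """(digit, observed share, expected Benford share) for digits 1..9."""
    total = sum(counts.values())
    return [
        (d, counts[d] / total if total else 0.0, math.log10(1 + 1 / d))
        for d in range(1, 10)
    ]


def chi_square(counts: Counter) -> float:
    """Pearson chi-square statistic of the digit counts against Benford's Law."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no subject lines to test")
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Hypothetical sample; in practice, feed the Subject: values from your logs.
    sample = ["Re: meeting", "Invoice", "Your weekly newsletter", "Hi"]
    digits = subject_length_digits(sample)
    for digit, observed, expected in compare_with_benford(digits):
        print(f"{digit}: observed {observed:.3f}, Benford {expected:.3f}")
    print(f"chi-square (8 degrees of freedom): {chi_square(digits):.2f}")
```

With nine digits the statistic has 8 degrees of freedom, and the 5% critical value is about 15.51. With hundreds of thousands of messages, though, the chi-square test flags even tiny deviations, so the observed-versus-expected table is often the more telling output. If your subjects are still MIME-encoded (RFC 2047 encoded words), decode them first or leave them out, since the encoded form has a different length.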

So what do your logs say about the length of subject lines and Benford’s Law? Do they follow the pattern? I’d be glad to see your answers in the comments section.

PS: I see that there is now a second edition of Nigrini’s book about to be published!

Game Theory and the Cyber domain

According to this leak:

Russia alleged that an arms control race was unfolding in cyberspace and that constraints on state capabilities were necessary

Now where had I heard that before? It was in 2009 while watching a presentation given by iDefense’s Eli Jellenc. In it he presented the following variation of the Prisoner’s Dilemma:

[Figure: The Security Dilemma]

The basic premise of the model is that efforts to increase your own security make others insecure. In cyber warfare it is easier to attack a complex system than to defend it (or at least it feels that way, since time is on the side of the persistent, patient attacker). It is also at times very difficult to distinguish between offense and defense, and the fact of the matter is that both the digital underground and the private sector have well-established offensive capabilities for hire. As a result, everybody is forced to deploy offensive capabilities, and a spiral of mistrust is built up at the same time as a side effect.
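The mechanics of the dilemma can be checked in a few lines. The sketch below is a minimal illustration, assuming the classic Prisoner's Dilemma payoff ordering; the strategy names restrain/arm and the ordinal payoffs are hypothetical and are not taken from Jellenc's slide. It finds the pure-strategy Nash equilibria by checking which strategy profiles are mutual best responses.

```python
from itertools import product

# Hypothetical ordinal payoffs (row player, column player): 1 = worst, 4 = best.
# Only the classic Prisoner's Dilemma ordering matters, not the exact numbers.
STRATEGIES = ("restrain", "arm")
PAYOFFS = {
    ("restrain", "restrain"): (3, 3),  # mutual restraint
    ("restrain", "arm"): (1, 4),       # I restrain, the other side exploits me
    ("arm", "restrain"): (4, 1),       # I exploit the other side
    ("arm", "arm"): (2, 2),            # arms race
}


def best_responses(player: int, other_choice: str) -> set[str]:
    """Strategies maximising `player`'s payoff given the other player's choice."""
    def payoff(mine: str) -> int:
        profile = (mine, other_choice) if player == 0 else (other_choice, mine)
        return PAYOFFS[profile][player]

    best = max(payoff(s) for s in STRATEGIES)
    return {s for s in STRATEGIES if payoff(s) == best}


def pure_nash_equilibria() -> list[tuple[str, str]]:
    """Profiles in which each player's choice is a best response to the other's."""
    return [
        (a, b)
        for a, b in product(STRATEGIES, repeat=2)
        if a in best_responses(0, b) and b in best_responses(1, a)
    ]


if __name__ == "__main__":
    print(pure_nash_equilibria())  # [('arm', 'arm')]
```

Arming is a best response whatever the other side does, so (arm, arm) is the only equilibrium, even though both players would prefer mutual restraint, (3, 3), over the arms race, (2, 2). That gap is the spiral of mistrust described above.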

Indeed, an example of how such a death spiral forms is given in “Strategy and the Revolution in Military Affairs: From Theory to Policy”:

“Why, foreign leaders ask, would the world’s only superpower seek radical improvement of its armed forces in the absence of a clear threat? Given the expense of accumulating national power, some may assume it is meant to be used and conclude that the United States is improving its military capabilities in order to impose its will on others. The United States can either accept such suspicions or find a new, less intimidating method of pursuing the revolution in military affairs, perhaps through greater cooperation with potential allies. The problem is that such cooperation could speed the dissemination of new technology, techniques, and ideas, and thus contribute to the emergence of challengers. But if the United States unilaterally pursues the RMA, other states will respond, whether symmetrically or asymmetrically. In turn, knowing the benign intentions of the United States, American leaders and planners will consider this threatening. Why, they will ask, would other states seek to improve their military capability unless contemplating aggression? Vigorous American pursuit of the RMA may make other nations feel less secure and their response will make the United States feel less secure. The result may be a spiral of mutual misperception and a new arms race, albeit a qualitative rather than quantitative one.”

It is ironic that I was scolded in a meeting a couple of months ago for mentioning Game Theory as a tool to study strategies (“Theory is one thing, reality is another”), when in fact we see how well such simple models are suited to studying reality.

But what do I know, dear officer? In his TEDxParis talk “How cyberattacks threaten real-world peace” (a quick summary of which you can read here), Guy-Philippe Goldstein presented the following 1978 model by Robert Jervis from “Cooperation under the Security Dilemma”:

[Figure: Cyberwar Game]

As Jervis puts it:

“The fear of being exploited is what drives the security dilemma”

Game Theory and the Cyber Domain? What do I know? I simply read about stuff.

Further reading:

Now I am off to read “Security and Game Theory”, thanks to Sakis.

“Naturally”

In the fourth edition of the bat book I read:

Naturally, such a recovery should never be necessary if your machine is properly backed up, and if you keep your source files under some form of revision control, such as rcs(1).

Reading the passage triggered my memory and brought cvi to my attention once again: a handy little tool by Sotiris Tsimbonis written for exactly this purpose.

Naturally.

sendmail sender queue groups

Sendmail supports queue groups: messages that stay in the queue can be placed in separate queues, each treated differently according to the rules described in the queuegroup ruleset. FEATURE(queuegroup) helps manage such queues via the access database, but unfortunately it deals only with recipient addresses. But what if one wants to place messages in a separate (slower) queue based on the sender’s address?

QUEUE_GROUP(`newsletter', `N=10, I=31m, P=/storage/queues/n.*')dnl

LOCAL_RULESETS
Squeuegroup
R$*	$: $>canonify $&f
R$* < @ $* > $*	$: $1
Rowner-newsletter	$# newsletter

The above trick does not use the access database; in fact, you must not use FEATURE(queuegroup) in your sendmail.mc together with it. The queuegroup ruleset is called with the recipient address as its argument. The first rule replaces that with the canonified sender address ($&f). In this particular newsletter case, we are only interested in the local part, i.e. the left-hand side of the email address ($1); others may be interested in the sender’s domain ($2). The third rule checks whether the local part matches what we expect (owner-newsletter) and, if so, selects the corresponding queue. Otherwise the default queue, named mqueue, is selected. Keep in mind that in sendmail rules the left-hand side, the right-hand side and any comment must be separated by tab characters, not spaces.
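The routing decision the ruleset makes can also be written out in ordinary code. Below is a simplified Python model, for illustration only: it is not how sendmail tokenizes or canonifies addresses (the real canonify ruleset does much more), and the names local_and_domain, QUEUE_BY_SENDER_LOCAL_PART and select_queue are hypothetical.

```python
# Hypothetical mapping from sender local part to queue group name,
# mirroring the single "Rowner-newsletter  $# newsletter" rule.
QUEUE_BY_SENDER_LOCAL_PART = {"owner-newsletter": "newsletter"}
DEFAULT_QUEUE = "mqueue"


def local_and_domain(sender: str) -> tuple[str, str]:
    """Rough stand-in for canonify: drop <>, split at the last '@'.

    Returns (local part, domain); the domain is empty if there is no '@'.
    """
    address = sender.strip().strip("<>")
    local, at, domain = address.rpartition("@")
    if not at:  # no '@' at all: rpartition puts everything in `domain`
        return address, ""
    return local, domain


def select_queue(sender: str) -> str:
    """Pick a queue group from the sender's local part ($1 in the ruleset)."""
    local, _domain = local_and_domain(sender)
    return QUEUE_BY_SENDER_LOCAL_PART.get(local, DEFAULT_QUEUE)


if __name__ == "__main__":
    for sender in ("owner-newsletter@example.com", "alice@example.com"):
        print(sender, "->", select_queue(sender))
```

Routing on the sender's domain instead would mean keying the table on the second element returned by local_and_domain(), the counterpart of $2 in the ruleset.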

For a more complete ruleset that handles combinations of senders and recipients via the access database, see “Sendmail Extended Queue Groups”.

The sysadmin paradox

The sysadmin paradox, n.:
The fact that a system administrator who is constantly running behind problems is perceived as working and being productive, while one who manages a working infrastructure is perceived as idle.

Our aim is to eliminate ourselves from the management of the system, to be considered “not needed” because the system has no problems and, therefore, as not working enough. Luckily, whenever (if ever) this happens, new and more complex requirements emerge and the cycle continues.

The five most important questions

It was thanks to this post by John D. Cook on abandoning projects that I got interested in Peter Drucker. So I went to ebooks.com to check whether any of his works were available as ebooks. I came across “The Five Most Important Questions You Will Ever Ask About Your Organization”, which focuses on non-profit and social organizations. As a public sector worker, I found the book a natural candidate.

The book expands on an earlier 1992 version written by Drucker and contains essays by him and by other experts in the field of management. All essays center on five basic questions which, as Drucker writes, are important to ask:

“The most important aspect of the Self-Assessment Tool is the questions it poses. Answers are important; you need answers because you need action. But the most important thing is to ask these questions.”

The five questions are:

  1. What is Our Mission?
  2. Who is Our Customer?
  3. What Does the Customer Value?
  4. What Are Our Results?
  5. What Is Our Plan?

Non-profit organizations are about changing lives, and these questions are a tool to achieve this. Even without reading the explanatory essays, their importance is evident (as is the importance of answering them sincerely). And while the book itself is not a self-assessment tool for an individual, the questions themselves are a good start.

It will come as no surprise to people who know me that the concept of organized abandonment is what I liked most in the book. I’ve been (unsuccessfully) advocating a similar stance within my employer’s organization for years, but I had never seen it so clearly articulated until now. Plus, this time it is not only me saying it; Drucker said it too, see? IMVHO, organized abandonment is the basic evolutionary mechanism for organizations, in both the public and the private sector.

This is definitely a book I will revisit in six months’ time, to evaluate its impact on my way of thinking within my own organization and to see whether I managed to pass anything along.

PS: I bought the PDF version of the book by mistake. Normally I try to read ePub versions on my BeBook Mini, but luckily in this case the BeBook rendered the PDF adequately.

staying up late

[Figure: engineering student]

When the image popped up in my timeline, I was immediately reminded of Bob Lucky’s May 1998 essay about Electrical Engineering:

“Electrical engineering will be in danger of shrinking into a neutron star of infinite weight and importance, but invisible to the known universe.”

Others fear that CS might not be far behind. Nor is systems administration, even in its DevOps incarnation. So while the artist (does anybody know who the artist is?) drew this with engineering students in mind, the image reflects the situation for many more.

Happy 2012 to you all.