Russ Cox on regular expressions

Thanks to Ozan S. Yigit I found out about a three-article series by Russ Cox on regular expressions:

I knew about Russ Cox and his interest in regular expressions because of this link to a PDF copy of “Programming Techniques: Regular expression search algorithm” that I had found at his site. Somehow I had missed the articles. In Ozan’s words: “russ cox, like other top-notch cs people, takes a topic and nails it shut. these three papers are more valuable to me than any RE book”.

Yes, the articles are that good. And the good news does not stop there. Russ Cox has also implemented RE2, a fast, safe, thread-friendly alternative to backtracking regular expression engines (like those used in PCRE, Perl, and Python), written in C++. It even comes with a POSIX (egrep) mode.
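To make the difference with backtracking engines concrete, here is a minimal sketch in Python. It is not RE2's implementation; it is a toy that assumes a very restricted pattern form (a sequence of single characters, each either required or optional) and simply counts work. The names `backtrack_match`, `stateset_match` and `pathological` are made up for this illustration. On the pattern "n optional a's followed by n required a's" matched against n a's, the backtracking matcher explores an exponential number of choices, while the matcher that tracks a set of possible states does a bounded amount of work per input character.

```python
def backtrack_match(items, text):
    """Naive recursive backtracking. Returns (matched, steps)."""
    steps = 0

    def go(i, j):
        nonlocal steps
        steps += 1
        if i == len(items):
            return j == len(text)
        ch, optional = items[i]
        # Greedy choice: try to consume a character first...
        if j < len(text) and text[j] == ch and go(i + 1, j + 1):
            return True
        # ...and only then try skipping an optional item.
        return optional and go(i + 1, j)

    return go(0, 0), steps


def stateset_match(items, text):
    """Track every pattern position reachable so far. Returns (matched, steps)."""
    steps = 0

    def closure(states):
        # Add the positions reachable by skipping optional items.
        result, stack = set(), list(states)
        while stack:
            i = stack.pop()
            if i in result:
                continue
            result.add(i)
            if i < len(items) and items[i][1]:
                stack.append(i + 1)
        return result

    current = closure({0})
    for c in text:
        advanced = set()
        for i in current:
            steps += 1
            if i < len(items) and items[i][0] == c:
                advanced.add(i + 1)
        current = closure(advanced)
    return len(items) in current, steps


def pathological(n):
    """n optional 'a' items followed by n required 'a' items."""
    return [("a", True)] * n + [("a", False)] * n


if __name__ == "__main__":
    n = 16
    print("backtracking:", backtrack_match(pathological(n), "a" * n))
    print("state set:   ", stateset_match(pathological(n), "a" * n))
```

The state-set version never holds more than one entry per pattern position, so its work per input character is bounded by the pattern length. That bound is the kind of guarantee a non-backtracking engine can offer, whatever its actual internals look like.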

The postmaster in me quickly thought of implementing a milter that makes use of RE2, just as milter-regex uses traditional regex(3), but my time is so limited by other, more pressing projects that I can only hope that someone else undertakes such a task.

Algorithms on Strings

I was first exposed to string matching back in 1990, when I naively asked Prof. Stathis Zachos something like “How does grep work?” and was given “Algorithms for Finding Patterns in Strings” to read.

Time passed, I became a system administrator, and most of my exposure to string matching came through scripts and the automation of sysadmin tasks. Automata are nice, but Perl and shell put food on the table.

These memories surfaced because I got to read “Algorithms on Strings” in January, thanks to Bill Gasarch. Complete, self-contained and written in plain, easily understood English, the book covers the subject while simultaneously meeting the needs of those who just want to read the theory, those who want to see the proofs, and those who just want to write code.

The pseudocode in the book can be understood by anyone who has ever written a single program in C or Java. Each piece either introduces new functions or makes use of ones previously defined. This may make things a little difficult at first for people who need to implement something described in, for example, chapter six, since they may find themselves reading from chapter one through six. In the process, the book manages to teach even the programmer who does not care about theory not only how certain things are done, but also why they are done the way they are. As a plus, references to relevant Unix tools (e.g. diff) are given where appropriate.

A really impressive book, definitely worth your time! A book that you can use both to learn about stuff and as a reference.

The purpose of SMTP-HELO

Years ago D. J. Bernstein wrote “I recommend that server implementors let clients skip HELO, to support a future transition to a world without HELO”. I suppose that anyone who has spent enough time “speaking” SMTP as part of debugging mail systems must have wondered why HELO needs to exist in SMTP at all.

Well, it was not always there. RFCs 722 (Sep 1980) and 780 (May 1981) do not include it. It first appears in RFC 788 (Nov 1981). But why?

Back in 2005 in comp.mail.imap Mark Crispin explained why:

The purpose of HELO (and the Received: header line) was to fix a problem that went away with the NCP->TCP transition.

He goes on to explain that in the NCP days the IMPs that relayed messages knew only the destinations of those messages, and that this could lead to loops that delivered messages to the sender’s machine instead of the recipient’s. HELO solved the loop problem. The transition from NCP to TCP/IP took place on January 1, 1983, in what is known as the Internet Flag Day. That should have effectively ended the life of HELO. But no, “people felt strongly about making this never happen again” and with the introduction of SMTP:

the SMTP client identified itself (HELO), and you were allowed to barf if the HELO claimed to be yourself since that meant that the network was in loopback.

HELO not only survived; a trend also emerged in which it was used as a weak authentication mechanism. People started checking whether the IP address of the connecting machine and the argument supplied with HELO had matching A and PTR RRs. This led to the RFC 1123 prohibition:

However, the receiver MUST NOT refuse to accept a message, even if the sender’s HELO command fails verification.

This prohibition stands even with the current SMTP specification (RFC 5321):

Information captured in the verification attempt is for logging and tracing purposes. Note that this prohibition applies to the matching of the parameter to its IP address only

This should not be interpreted to mean that no connection can be rejected based on the argument supplied with HELO. This thread over at RFC Ignorant discusses valid cases where such rejection is possible.
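The distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular MTA does it: `helo_verifies`, `handle_helo` and the two injected lookup callables are hypothetical names. In a real deployment the lookups might be backed by something like `socket.gethostbyname_ex` and `socket.gethostbyaddr`, but here they are passed in so the logic can be exercised without touching the network. A failed A/PTR match is only logged, while a HELO that claims to be the receiving host itself (the loopback case Crispin describes) is treated as one example of a permissible rejection.

```python
def _norm(name):
    """Compare host names case-insensitively and without a trailing dot."""
    return name.rstrip(".").lower()


def helo_verifies(client_ip, helo_name, lookup_a, lookup_ptr):
    """True if the HELO name and the client IP have matching A and PTR records.

    lookup_a(name) -> iterable of IP address strings   (hypothetical resolver)
    lookup_ptr(ip) -> iterable of host name strings    (hypothetical resolver)
    """
    name = _norm(helo_name)
    forward_ok = client_ip in set(lookup_a(name))
    reverse_ok = name in {_norm(n) for n in lookup_ptr(client_ip)}
    return forward_ok and reverse_ok


def handle_helo(client_ip, helo_name, my_names, lookup_a, lookup_ptr):
    """Return (accept, note) for a HELO argument."""
    if _norm(helo_name) in {_norm(n) for n in my_names}:
        # The client claims to be us: the loop case HELO was invented for.
        return False, "HELO claims to be this host"
    if not helo_verifies(client_ip, helo_name, lookup_a, lookup_ptr):
        # Failed verification must not cause a refusal; record it for tracing.
        return True, "HELO failed A/PTR verification (logged only)"
    return True, "HELO verified"


if __name__ == "__main__":
    # Made-up records using documentation-only names and addresses.
    a_records = {"mail.example.org": ["192.0.2.10"]}
    ptr_records = {"192.0.2.10": ["mail.example.org."]}
    lookup_a = lambda name: a_records.get(name, [])
    lookup_ptr = lambda ip: ptr_records.get(ip, [])
    mine = ["mx.example.net"]

    print(handle_helo("192.0.2.10", "mail.example.org", mine, lookup_a, lookup_ptr))
    print(handle_helo("192.0.2.99", "mail.example.org", mine, lookup_a, lookup_ptr))
    print(handle_helo("192.0.2.99", "MX.example.net", mine, lookup_a, lookup_ptr))
```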

So there: now you not only know the history of HELO and why it was invented, you also know that it has not been needed since 1983.

SMTP servers should not require, or ascribe meaning to, HELO or EHLO.

Fire Service Museum (Πυροσβεστικό Μουσείο)

Since Θ. shows a particular interest in all things “firefighting”, today the whole family went to the Fire Service Museum. Because we were the only visitors at the time, we were lucky enough to get a full guided tour from the firefighter who was on shift at the museum.

The museum is impressive, especially when you consider that it tries to cover the work of the Fire Service from the founding of the Greek State to the present day. It has vehicles and equipment dating from before 1900, and for some of them it is even documented which fires they were used in. Because Θ. showed more interest in the more modern (and, to him, more familiar) equipment, we will have to go again, since there are pumps and other mechanisms that deserve more attention.

The Fire Service Museum is open to the public on Wednesdays and Sundays, so it is a good idea to call before you go.

Whither Software Engineering

The July 2009 issue of the IEEE/Computer magazine in its “32 and 16 Years Ago” section remembers that 16 years ago:

Software Engineering (p. 68) “The IEEE Computer Society Board of Governors has approved a motion to establish an ad hoc committee to serve as a steering group for evaluation, planning, coordination, and action related to establishing software engineering as a profession. The action came during the board’s May 21 meeting in Baltimore, Maryland, in conjunction with the International Conference on Software Engineering.”

In the same issue Neville Holmes writes:

“Now software engineering aims to be a branch of engineering, but is finding it difficult to be accepted as such. The problem is that other branches sensibly use the skills and talents of technicians to ensure the success of their professional work. Software engineering doesn’t; it won’t let go of programming”

It took a lot of people and effort to design programming languages and models (procedural, functional, etc.) that tried to define how people should practice programming. It took only two pieces of software to make anyone think that they are a quality programmer: Access and Visual Basic. So Holmes is right: Let go of programming; it is a lost cause.

After reading one of my posts, John Allen (author of “Anatomy of Lisp”) sent me his unpublished manuscript “Whither Software Engineering” [pdf] and the corresponding presentation entitled “More Ballast!”, which also deal with the question of whether Software Engineering is actually a branch of Engineering. You can freely download the pdf slides and audio of an older version of the presentation (Title: History, Mystery and Ballast). In them Allen deals with the transition of traditional Engineering from an experience-based craft to a science-based discipline. Much of the historical data he uses comes from “Engineering education in Europe and the USA, 1750-1930”. I have also read “Education, technology, and industrial performance in Europe, 1850-1939” (also translated into Greek) on the subject.

Basically Engineering training followed the path of:

  • Apprenticeship for a long period of time under the supervision of an Engineer
  • Study (and get certified for) the equipment of a specific manufacturer paying a considerable amount of money, and
  • University studies

Does this ring a bell regarding today’s IT arena? This is exactly what motivated Allen. Mathematics and Physics transformed traditional Engineering. Can the same be done with computation and mathematical logic? His presentation closes with:

It is this kind of education, not Java vocational training, that will bring McCarthy’s 40+ year old quote to life:

“It is reasonable to hope that the relationship between computation and mathematical logic will be as fruitful in the next century as that between analysis and physics in the past. The development of this relationship demands a concern for both applications and for mathematical elegance.”

At least for programmers, we are not there yet. The link between their work and mathematical logic is not obvious to all.

In the closing discussion of HDMS 2009 there was a debate on whether “their stuff” could be considered a branch of Engineering, regardless of liability issues. Alex Labrinidis said “Give us 2000 years to perfect bridge building and then come back asking for liability”. Labrinidis is wrong. Hammurabi solved the issue of Engineering liability back in 1790 BC:

If a builder builds a house for someone, and does not construct it properly, and the house which he built falls in and kills its owner, then the builder shall be put to death.

After the discussion ended, Panos Vassiliadis pointed me to Peter J. Denning’s “Is Software Engineering Engineering?” article, which concludes:

“We have not arrived at that point in software engineering practice where we can satisfy all the engineering criteria described in this column. We still need more effective tools, better software engineering education, and wider adoption of the most effective practices. Even more, we need to encourage system thinking that embraces hardware and user environment as well as software.

By understanding the fundamental ideas that link all engineering disciplines, we can recognize how those ideas can contribute to better software production. This will help us construct the engineering reference discipline that Glass tells us is missing from our profession. Let us put this controversy to rest.”

Bertrand Meyer adds that the one sure way to advance software engineering is to “pass a law that requires extensive professional analysis of any large software failure”. Meyer is not alone. “Where are the dead bodies?” asks Derek M. Jones who also writes: “The lack of dead bodies attributed to a software root cause suggests that it is very still early days for the field of high integrity software development.”

There you have it: No dead bodies, no Engineering. Hammurabi knew that long before Engineers did.

You may now want to read “Cargo-cult Engineering” and “It’s not Engineering, Jim”.

#include <std/disclaimer.h>