Russ Cox on regular expressions

Thanks to Ozan S. Yigit I found out about a three-article series by Russ Cox on regular expressions:

I knew about Russ Cox and his interest in regular expressions because of this link to a PDF copy of “Programming Techniques: Regular expression search algorithm” that I had found on his site. Somehow I had missed the articles. In Ozan’s words: “russ cox, like other top-notch cs people, takes a topic and nails it shut. these three papers are more valuable to me than any RE book”.

Yes, the articles are that good. And the good news does not stop there: Russ Cox has implemented RE2, a fast, safe, thread-friendly alternative to backtracking regular expression engines (like those used in PCRE, Perl, and Python), written in C++. It even comes with a POSIX (egrep) mode.
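To make the contrast with backtracking concrete, here is a minimal sketch of the automaton idea: track the set of all states the matcher could be in and advance that set one input character at a time, so the work stays proportional to pattern length times text length. It is not RE2's implementation, which is far more elaborate. The function names (`tokenize`, `closure`, `full_match`) are hypothetical, and the supported syntax is deliberately tiny: literal characters, `.`, and the postfix operators `?` and `*`, with no grouping, alternation, or escaping.

```python
def tokenize(pattern):
    """Split a pattern into (char, quantifier) pairs; quantifier is '', '?' or '*'."""
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        quant = ""
        if i + 1 < len(pattern) and pattern[i + 1] in "?*":
            quant = pattern[i + 1]
            i += 1
        tokens.append((ch, quant))
        i += 1
    return tokens


def closure(tokens, states):
    """Add every state reachable by skipping optional ('?' or '*') tokens."""
    result = set()
    stack = list(states)
    while stack:
        s = stack.pop()
        if s in result:
            continue
        result.add(s)
        if s < len(tokens) and tokens[s][1] in ("?", "*"):
            stack.append(s + 1)
    return result


def full_match(pattern, text):
    """Return True if the whole text matches, using state-set simulation."""
    tokens = tokenize(pattern)
    # State s means "about to match token s"; len(tokens) is the accept state.
    states = closure(tokens, {0})
    for c in text:
        nxt = set()
        for s in states:
            if s == len(tokens):
                continue  # accept state cannot consume more input
            ch, quant = tokens[s]
            if ch == "." or ch == c:
                # '*' may repeat the same token; everything else moves on.
                nxt.add(s if quant == "*" else s + 1)
        states = closure(tokens, nxt)
        if not states:
            return False
    return len(tokens) in states


if __name__ == "__main__":
    n = 25
    # A pattern of this shape is a classic stress case for backtracking matchers.
    print(full_match("a?" * n + "a" * n, "a" * n))
```

Because each input character touches every live state at most once, a pattern of the form `a?` repeated n times followed by `a` repeated n times is handled without trying every combination of optional matches.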

The postmaster in me quickly thought of implementing a milter that uses RE2, just as milter-regex uses traditional regex(3). My time, however, is so limited by other, more pressing projects that I can only wish someone else undertakes the task.

Universal Systems Language

While clearing my IEEE/Computer stack I read about the Universal Systems Language (December 2008 issue). Mind-blowing stuff! USL and its Development-Before-the-Fact methodology have their roots in the Apollo space program:

“We were the luckiest people in the world. There was no choice but to be pioneers. What would later become foundations for USL enabled the Apollo team to create the software for the trip to the moon.”

As the article highlights, “Correct use of USL eliminates the majority of errors, including all interface errors within a system model and its derivatives”.

It is a pity that the 001 Tool Suite seems to cost $9950 :( That way we can only read (and dream) about it.

Algorithms on Strings

I was first exposed to string matching back in 1990, when I naively asked Prof. Stathis Zachos something like “How does grep work?” and was given “Algorithms for Finding Patterns in Strings” to read.

Time passed, I became a system administrator, and most of my exposure to string matching came through scripts and sysadmin automation. Automata are nice, but Perl and shell put food on the table.

These memories surfaced because I got to read “Algorithms on Strings” in January, thanks to Bill Gasarch. Complete, self-contained, and written in plain, easy-to-follow English, the book covers the subject while simultaneously meeting the needs of those who just want to read the theory, those who want to see the proofs, and those who just want to write code.

The pseudocode in the book can be understood by anyone who has ever written a program in C or Java. Each piece either introduces new functions or makes use of ones defined earlier. This may make things a little difficult at first for people who need to implement something described in, say, chapter six, and find themselves reading from chapter one through six. In the process, the book manages to teach even the programmer who does not care about theory not only how certain things are done, but why they are done that way. As a plus, references to the relevant Unix tools (e.g. diff) are given where appropriate.

A really impressive book, definitely worth your time! A book that you can use both to learn about stuff and as a reference.

The Kirsch postulate

In “An undetected error” [†], Russell A. Kirsch states “the Kirsch postulate”:

All computers are always, in some sense, “broken.”

How he arrived at that assertion is an interesting story that involves moving the SEAC, a logic (wiring) error discovered during the move, and a lot of earlier debugging that had completely missed the error.


[†] “Letters,” Computer, Vol. 42, No. 4, pp. 6-7, April 2009.

log

I have been meaning to comment on this article for a while now:

“Among other reasons, because people carry the knowledge of the organization, which is valuable, rare, and therefore irreplaceable. In fact, knowledge often exists in a business in such an abstract form that it cannot be codified and stored in a database, and so cannot be reused in the future.”

And yet there are fields (e.g. archaeology) where this problem has a solution, even if only a partial one. It is called a log. Such a solution is indeed partial, because one person may write 10 pages a month and another 10 a week, and the quality may vary from one entry to the next, but the log ensures that at least some information stays behind. And if the organization has taken care to build the right culture (colleagues reading the logs, consulting them when there is a problem to solve, etc.), these are manageable problems.

The real problem, however, is that information workers have not learned to work this way:

  • We will write the documentation later
  • We will write the code comments later
  • My code is self-documented, it does not need comments

Later. Everything later. When we have time. Except that we never have time, because the next project is knocking at the door (or we are slamming the door on our way out).

blank Subject:

[Graph: Subject: line occurrences]

The graph above displays the occurrences of Subject: lines within a week, for emails that went through an auxiliary outgoing / filtering mail server, set up specifically to deal with a spam outbreak that affected our customers. The spam subjects were all about pain-killers and other “enhancements”, making it easy to write quick filters on the spot. The one exception among the frequent subjects was #66, the blank Subject: line.
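A tally like the one behind that graph can be sketched in a few lines of Python. This is only an illustration, not the tooling actually used on that server: the function name `subject_counts` and the sample messages are made up, and the sketch assumes the messages are available as raw RFC 822 strings.

```python
from collections import Counter
from email import message_from_string


def subject_counts(raw_messages):
    """Count how often each Subject: value occurs in a list of raw messages."""
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # Assumption: a missing Subject: header is counted together with an
        # empty one, under the empty string.
        subject = (msg.get("Subject") or "").strip()
        counts[subject] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [
        "From: a@example.com\nSubject: hello\n\nbody\n",
        "From: b@example.com\nSubject:\n\nbody\n",
        "From: c@example.com\n\nbody\n",
        "From: d@example.com\nSubject: hello\n\nbody\n",
    ]
    for subject, count in subject_counts(sample).most_common():
        print(repr(subject), count)
```

Sorting with `most_common()` gives the ranking by frequency, which is how a blank subject can end up with a position such as #66.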

[ Update: The above means that for that week the most common non-spam outgoing email subject was the blank subject. Some more numbers: 196493 messages passed through that system. The most common spam subject appeared 3588 times. 679 messages had blank subjects. 5 had “(no subject)” as subject. 6707 messages had subjects that appeared fewer than 100 times each. 5628 of them I would not classify as spam based on the subject line. So 10.7% of the messages that could not be considered spam based on their subject’s content had empty subject lines. ]
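The percentage in the update can be reproduced with a short calculation. The post does not spell out the denominator, so this sketch assumes it is the 5628 rare non-spam subjects plus the 679 blank ones (a subject seen 679 times cannot be in the “fewer than 100 times” group). The variable names are illustrative.

```python
blank = 679            # messages with a blank Subject: line (from the update)
other_non_spam = 5628  # rare subjects not classified as spam (from the update)

# Assumption: the blank-subject messages are counted on top of the 5628.
share = 100 * blank / (blank + other_non_spam)
print(f"{share:.2f}%")
```

Under that assumption the share comes to roughly 10.77%, which matches the 10.7% quoted in the update if the figure was truncated rather than rounded.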

It seems that people are still sending emails without a Subject: line. Although this seems weird to me, to a lot of others it is perfectly natural. Am I alone in believing that one should not send subject-less emails?

1981/02/09

Yesterday was the sad anniversary. Thanks to the digital archive of Αθλητική Ηχώ, it is easy to see how the tragedy was covered from the very first moments. “Η μεγαλύτερη αθλητική τραγωδία του τόπου μας” (“The greatest sporting tragedy of our country”), Αθλητική Ηχώ, 1981/02/09, pages 1 and 8. Requires DjVu.