Thanks to Ozan S. Yigit I found out about a three-article series by Russ Cox on regular expressions:
- Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, …)
- Regular Expression Matching: the Virtual Machine Approach
- Regular Expression Matching in the Wild
I already knew of Russ Cox and his interest in regular expressions from a link to a PDF copy of “Programming Techniques: Regular expression search algorithm” that I had found on his site, but somehow I had missed the articles. In Ozan’s words: “russ cox, like other top-notch cs people, takes a topic and nails it shut. these three papers are more valuable to me than any RE book”.
Yes, the articles are that good. However, the good news does not stop there. Russ Cox has also implemented RE2, a fast, safe, thread-friendly alternative to backtracking regular expression engines (like those used in PCRE, Perl, and Python), written in C++. It even comes with a POSIX (egrep) mode.
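To give a feel for what "alternative to backtracking" means, here is a toy sketch in Python. It is not RE2's code and does not use RE2's API; it is a hand-built automaton for the single pattern `a?{n}a{n}` (n optional a's followed by n required a's), the pathological case discussed in the first article. The function names (`build_nfa`, `closure`, `nfa_fullmatch`) are my own, made up for illustration. The idea it tries to show is tracking the set of all states the automaton could be in, one input character at a time, instead of trying one path and backing up on failure.

```python
def build_nfa(n):
    """Hand-built NFA for a?{n}a{n}: n optional 'a's, then n required 'a's.

    States are numbered 0..2n and state 2n is the accepting state.
    (Hypothetical helper for illustration only, not part of any library.)
    """
    on_a = {i: i + 1 for i in range(2 * n)}  # consuming an 'a' advances one state
    skip = {i: i + 1 for i in range(n)}      # each optional 'a' may be skipped
    return on_a, skip, 2 * n


def closure(states, skip):
    """Return states plus everything reachable without consuming input."""
    seen = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        t = skip.get(s)
        if t is not None and t not in seen:
            seen.add(t)
            stack.append(t)
    return seen


def nfa_fullmatch(n, text):
    """True if the whole of text matches a?{n}a{n}."""
    on_a, skip, accept = build_nfa(n)
    current = closure({0}, skip)
    for ch in text:
        if ch != "a":
            return False
        # Advance every live state at once; no state is ever revisited
        # for the same input position, so there is nothing to back up to.
        current = closure({on_a[s] for s in current if s in on_a}, skip)
        if not current:
            return False
    return accept in current


if __name__ == "__main__":
    n = 30
    print(nfa_fullmatch(n, "a" * n))        # n a's: every optional 'a' is skipped
    print(nfa_fullmatch(n, "a" * (n - 1)))  # one 'a' short of the required n
```

Each input character costs at most one pass over the 2n+1 states, so matching a string of n a's takes on the order of n² steps, whereas a backtracking matcher may explore up to 2ⁿ ways of choosing which optional a's to use before it finds the match.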
The postmaster in me quickly thought of implementing a milter that uses RE2, just as milter-regex uses the traditional regex(3), but my time is so limited by other, more pressing projects that I can only hope someone else undertakes the task.