memcached – Managing organized complexity

memcached and MIMEDefang – a cool combination

I like milter-ahead a lot. But in our particular deployment it is not the best fit, because it assumes that all the information needed to decide whether to accept or reject an email resides not on the server it runs on, but on the servers it queries. This is not milter-ahead’s fault. Milters have no way of expanding aliases while checking the recipient address, so the programmer has to resort to tricks like parsing the output of sendmail -bv user@address, which means running a second sendmail process for the same delivery. The alternative would be to hack milter-ahead to look up recipient addresses in the alias database, but reading that database the way sendmail does is overly complex. One could also write an external daemon that monitors the alias database and injects entries into the (Berkeley DB) database maintained by milter-ahead, but that database is locked exclusively. And yes, exceptions could be entered in the access database, but that would mean maintaining two files for a single (and not so frequent) change in the alias files.

As I’ve blogged before, one of the reasons I like MIMEDefang is that it gives the Postmaster a full programming language to filter stuff. By simply using md_check_against_smtp_server(), a poor man’s non-caching version of milter-ahead is possible. Adding support for reading the alias database (be it the text file or the hash table) is also trivial.

But what about busy mail systems? You do not want to hammer your mail servers all the time with queries whose answers stay the same for long periods. You need a caching mechanism. At first I thought of implementing one the way milter-ahead does: by using a Berkeley DB database and some expiration mechanism, either from within MIMEDefang (retrieve the key and, if it should have expired by now, delete it; otherwise proceed as expected) or by an external “garbage collecting” daemon. But a store with expiration built in and a clean way to enter keys and values already exists and performs well: memcached. So by using Cache::Memcached within the mimedefang-filter, I got basic milter-ahead behavior (with caching).
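The caching idea can be sketched roughly as follows. This is not the filter I actually run: the helper name `cached_recipient_check`, the `rcpt:` key prefix and the one-hour default lifetime are assumptions made for illustration. `$cache` is any object with `get(key)` and `set(key, value, expiry)` methods, which is the shape of the Cache::Memcached interface, and `$lookup` is a code ref that would wrap md_check_against_smtp_server() in a real mimedefang-filter.

```perl
use strict;
use warnings;

# Hypothetical helper: answer from the cache when possible, otherwise ask
# the backend once and remember the verdict for $ttl seconds.
sub cached_recipient_check {
    my ($cache, $recipient, $lookup, $ttl) = @_;
    $ttl = 3600 unless defined $ttl;        # assumed default: one hour

    my $key     = 'rcpt:' . lc $recipient;  # assumed key naming scheme
    my $verdict = $cache->get($key);
    return $verdict if defined $verdict;    # cache hit, no SMTP query

    $verdict = $lookup->($recipient);       # e.g. 'CONTINUE' or 'REJECT'
    # Do not cache temporary failures; the backend may recover shortly.
    $cache->set($key, $verdict, $ttl) unless $verdict eq 'TEMPFAIL';
    return $verdict;
}

# Possible wiring inside mimedefang-filter (the server name is made up):
#   my $cache = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });
#   sub filter_recipient {
#       my ($recipient, $sender, $ip, $hostname, $first, $helo) = @_;
#       my $verdict = cached_recipient_check($cache, $recipient, sub {
#           my ($ret) = md_check_against_smtp_server(
#               $sender, $_[0], $helo, 'mail.example.com');
#           return $ret;
#       });
#       return ($verdict, $verdict eq 'CONTINUE' ? 'OK' : 'User unknown');
#   }
```

Letting memcached expire the entries means no garbage-collecting code is needed on the filter side; the price is that a cached verdict can be stale for up to the chosen lifetime.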

But what about the local aliases on the mail server? After all, they were what prompted the switch in the first place. I wrote a Perl script that opens the alias database using the BerkeleyDB package. Two details need caution here:

The first one is ignoring the special @:@ entry in the alias database. You do not see it in the alias text file, but you will see it when you run praliases. Sendmail uses this entry to know whether the database is up-to-date or not. See the bat book for a longer discussion of this.
The second detail is that since the alias database is written by a C program, all strings are NUL-terminated. This is not the case with strings used as keys and values in Perl with the BerkeleyDB package. However, the Perl BerkeleyDB package provides filters to deal with this case. You need something like:
```
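$db->filter_fetch_key( sub { s/\0$// } );    # strip the trailing NUL from keys
$db->filter_fetch_value( sub { s/\0$// } );  # values are NUL-terminated too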
```
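Both details can be handled in one small routine. The sketch below is an illustration rather than the script I wrote: `clean_aliases` is a hypothetical helper name, and it takes a plain hash ref of raw key/value pairs so that it does not depend on how the database was opened. In practice the hash could be tied to the database file with something like `tie my %raw, 'BerkeleyDB::Hash', -Filename => '/etc/mail/aliases.db', -Flags => DB_RDONLY` (the path is an assumption). If the fetch filters are already installed, the substitution here simply finds nothing to strip.

```perl
use strict;
use warnings;

# Hypothetical helper: turn raw sendmail alias-database pairs into a clean
# Perl hash. $raw is a hash ref; it may be an ordinary hash or one tied to
# the database file.
sub clean_aliases {
    my ($raw) = @_;
    my %aliases;
    while (my ($key, $value) = each %$raw) {
        # The C side stores a trailing NUL on both keys and values.
        s/\0\z// for $key, $value;
        next if $key eq '@';    # the @:@ marker entry, not a real alias
        $aliases{$key} = $value;
    }
    return \%aliases;
}
```

The result is an ordinary hash of alias name to expansion, which is easy to consult from a recipient check.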

And then there’s the issue of making such a script a daemon. One can go the traditional way, use a daemonizer on steroids or simply use Proc::Daemon::Init and be done with it.

memcached comes in handy for storing key-value pairs in many system administration tasks, and I think I’m going to use it a lot more in mail filtering stuff.

memcached.pl – An (incomplete) implementation in Perl with persistence

From the memcached FAQ:

How can you list all keys?

With memcached, you can’t list all keys. There is a debug interface, but that is not an advisable usage.

I was working on some stuff with MIMEDefang, Cache::Memcached and memcached at $work and stumbled upon just that. I wanted to check what exactly was going on while developing. About two hours after reading the informal text protocol specification for memcached, I had a crude working implementation of set and get in Perl and keys stored in a BerkeleyDB hash so that they could be inspected by external tools like makemap and postmap.
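To give an idea of how little the text protocol needs for these two commands, here is a minimal sketch of a request handler. It is not the code in the repository: `handle_request` is a hypothetical name, the store is an in-memory hash standing in for the BerkeleyDB one, expiry times are ignored, and `get` handles a single key only.

```perl
use strict;
use warnings;

my %store;    # in-memory stand-in for the BerkeleyDB hash

# Handle one request. $line is the command line without the trailing \r\n;
# $data is the payload that follows a "set" line (also without \r\n).
sub handle_request {
    my ($line, $data) = @_;
    my ($cmd, @args) = split ' ', $line;
    $cmd = '' unless defined $cmd;

    if ($cmd eq 'set' && @args == 4) {
        # set <key> <flags> <exptime> <bytes>
        my ($key, $flags, $exptime, $bytes) = @args;
        return "CLIENT_ERROR bad data chunk\r\n"
            unless defined $data
                && $bytes =~ /^\d+$/
                && length($data) == $bytes;
        $store{$key} = [ $flags, $data ];    # expiry is ignored in this sketch
        return "STORED\r\n";
    }
    if ($cmd eq 'get' && @args == 1) {
        my $entry = $store{ $args[0] } or return "END\r\n";    # miss
        my ($flags, $value) = @$entry;
        return sprintf("VALUE %s %s %d\r\n%s\r\nEND\r\n",
                       $args[0], $flags, length $value, $value);
    }
    return "ERROR\r\n";    # unknown or malformed command
}
```

Because the store is just a hash, swapping in a tied BerkeleyDB hash would make every stored key visible to external tools while the server is running.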

I’ve cut a lot of corners in this implementation, like:

- the delete queues are not implemented (yet)
- no check is done whether the inserted value is of the declared length in bytes
- an inserted value cannot contain a \n
- it does not daemonize yet
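The two value-related corners could be addressed together by reading the payload by byte count instead of by line. The following is a sketch of that approach, not what memcached.pl currently does; `read_value` is a hypothetical helper, and on a real socket the handle would also need binmode so that lengths are counted in bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: read the payload of a "set" command from $fh.
# Returns the value, or undef if the stream does not contain exactly
# $bytes bytes followed by \r\n.
sub read_value {
    my ($fh, $bytes) = @_;
    my $want = $bytes + 2;    # payload plus the trailing \r\n
    my $buf  = '';
    while (length($buf) < $want) {
        # Append to $buf until the declared amount has arrived.
        my $n = read($fh, $buf, $want - length($buf), length($buf));
        return unless $n;     # EOF or read error before the full payload
    }
    return unless substr($buf, -2) eq "\r\n";    # declared length was wrong
    return substr($buf, 0, $bytes);
}
```

Since the terminator is checked at a fixed offset, a value may contain \n (or even \r\n) without confusing the parser, and a client that declares the wrong length is caught.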

Give it a few nights and enough interest and I think I may fix those too. So anyway, here is the project page and code:

→ https://github.com/a-yiorgos/memcached.pl/

I hope it is useful to at least one more person.

While writing these lines I came up with: Sysadmins do it in Perl, Devops in Python. I do not know how true people may consider this, but indeed Python would have been a much better choice. Oh well, next time.
