tl;dr Finally the format of the Thread-Index: header is documented!
Recently I had to reconstruct a thread of email messages using the Thread-Index: header, which Microsoft’s products use instead of the standard way of threading with Message-Id:, References: and In-Reply-To:.
The truth is that I was really frustrated, thinking that Microsoft was breaking the standards by using custom headers that do not begin with X-, but as Dan Bernstein points out:
“822 promised that the IETF would never define field names beginning with X-. It did not prohibit use of non-X names by other organizations.”
Which means that Microsoft is allowed to add Thread-Index: (and Thread-Topic:) without breaking any standards. On the other hand, Microsoft does not document anywhere (at least anywhere I looked, and I looked plenty) how Thread-Index: is calculated or how it can be decoded so that applications other than Outlook can make use of it.
After some experimenting and a little bit of reverse engineering, I’ve reached the following results:
- Thread-Topic: preserves the original subject of the thread, that is, the Subject: stripped of any Fw: or Re: prefixes.
- Thread-Index: is used in a way similar to the In-Reply-To: and References: headers. Assuming that the first message in a thread has:
Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QA==
and the next in thread:
Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbA
while a third one:
Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fw=
and a fourth one:
Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fwAABGXGw==
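the pattern that decides the threading seems obvious: each value starts with the previous one (minus any trailing equal signs) and appends a few more characters. I have not yet found out what the single or double equal sign suffix means.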
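To make the pattern concrete, here is a minimal sketch in Python that rebuilds the parent/child relationships from the four example values above. It is purely textual: it strips the trailing equal signs and treats the longest other value that is a prefix as the parent. The function name find_parents is my own invention, and the assumption that a plain prefix comparison holds for messages other than these four examples is untested.

```python
def find_parents(thread_indexes):
    """Map each Thread-Index value to its closest ancestor, or None for a root."""
    # Drop the trailing '=' signs before comparing (their meaning is unknown here).
    stripped = {value: value.rstrip("=") for value in thread_indexes}
    parents = {}
    for value, key in stripped.items():
        # Candidates: every other value that is a proper textual prefix of this one.
        candidates = [
            other for other, other_key in stripped.items()
            if other_key != key and key.startswith(other_key)
        ]
        # Assumption: the longest matching prefix is the direct parent.
        parents[value] = max(
            candidates, key=lambda c: len(stripped[c]), default=None
        )
    return parents


# The four example headers from this post, in deliberately shuffled order.
headers = [
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fw=",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QA==",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fwAABGXGw==",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbA",
]

for child, parent in find_parents(headers).items():
    print(child, "<-", parent)
```

Because only prefixes are compared, the order in which the messages arrive does not matter.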
If only Microsoft could make such simple information available! Think of all the lost work hours! Only after I had resolved my problem did I find out about these guys, who had arrived at similar conclusions about how Thread-Index: is used.
Update #1: You may be interested to read the next episode.
Update #2: Yes, I keep rejecting the BASE64 explanation. This is because what the BASE64 value decodes to is either meaningless or without known semantics.
Update #3: From the GNOME documentation: “The value is apparently unique but has no meaning we know of.” That is why I reject the BASE64 explanation. It looks like a BASE64 string, and it can be decoded into a string of bytes that one can represent as a number. But the questions remain unanswered: How is the first 22-byte value chosen? Why is every “next” value in a thread 5 bytes longer than the previous one? How are these 5 bytes chosen? The decoded value of an undocumented BASE64 string remains undocumented, hence it may not even be a BASE64 string at all (and may only coincidentally look like one).
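For anyone who wants to check the lengths themselves, here is a minimal Python sketch. It assumes the values really are BASE64 (which, as said, is only an assumption) and reports the decoded length of each example and how much it grows from one message to the next. The helper name decoded_length is mine, and the sketch says nothing about what the bytes mean.

```python
import base64

# The four example headers from this post, oldest message first.
examples = [
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QA==",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbA",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fw=",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fwAABGXGw==",
]


def decoded_length(value):
    """Number of bytes obtained when the value is treated as BASE64."""
    return len(base64.b64decode(value))


lengths = [decoded_length(value) for value in examples]
# Growth in bytes between each message and the next one in the thread.
steps = [later - earlier for earlier, later in zip(lengths, lengths[1:])]

print("decoded lengths:", lengths)
print("growth per reply:", steps)
```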
The example Thread-Index: headers are taken from the MediaDefender Defenders site