Trying to make use of Outlook’s Thread-Index: header

tl;dr Finally the format of the Thread-Index: header is documented!

Recently I had to reconstruct a thread of email messages using the Thread-Index: header, which Microsoft’s products use instead of the standard threading headers Message-Id:, References: and In-Reply-To:.

The truth is that I was really frustrated, thinking that Microsoft was breaking the standards using custom headers that do not begin with X- but as Dan Bernstein points out:

“822 promised that the IETF would never define field names beginning with X-. It did not prohibit use of non-X names by other organizations.”

Which means that Microsoft is allowed to add Thread-Index: (and Thread-Topic:) without breaking any standards. On the other hand, Microsoft does not document anywhere (at least anywhere I looked, and I looked plenty) how Thread-Index: is calculated or how it can be decoded so that applications other than Outlook can make use of it.

After some experimenting and a little bit of reverse engineering, I’ve arrived at the following results:

  • Thread-Topic: preserves the original subject of the thread, that is, the Subject: stripped of any Fw: or Re: prefixes.
  • Thread-Index: is used in a way similar to In-Reply-To: and References: Assuming that the first message in a thread has a:
    Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QA==

    and the next in thread:

    Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbA

    while a third one:

    Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fw=

    and a fourth one:

    Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fwAABGXGw==

    the pattern that decides the threading seems obvious; I have not yet found out what the single or double equal sign suffix means.
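The two observations above can be put into a short Python sketch. It assumes that the value can be base64-decoded and that a reply's decoded value begins with its parent's decoded value. The names `normalize_subject` and `find_parents` are made up for illustration, and nothing here is Microsoft's documented algorithm.

```python
import base64
import re

# The four example headers from the list above.
EXAMPLES = [
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QA==",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbA",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fw=",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fwAABGXGw==",
]


def normalize_subject(subject):
    """Strip leading Re:/Fw: prefixes, roughly what Thread-Topic: seems to hold."""
    pattern = r"^(?:\s*(?:re|fw)\s*:\s*)+"
    return re.sub(pattern, "", subject, flags=re.IGNORECASE).strip()


def find_parents(values):
    """Map each Thread-Index value to its closest ancestor in `values` (or None).

    Assumption: the value is base64 and a reply's bytes start with its
    parent's bytes.
    """
    raw = {v: base64.b64decode(v) for v in values}
    parents = {}
    for value in values:
        # Candidates are strictly shorter values that are a prefix of this one.
        ancestors = [
            other
            for other in values
            if len(raw[other]) < len(raw[value])
            and raw[value].startswith(raw[other])
        ]
        # The longest such prefix is taken to be the direct parent.
        parents[value] = max(ancestors, key=lambda v: len(raw[v]), default=None)
    return parents


if __name__ == "__main__":
    for value, parent in find_parents(EXAMPLES).items():
        print(value, "<-", parent)
    print(normalize_subject("Re: Fw: RE: Budget"))
```

Comparing the decoded bytes rather than the raw header text avoids tripping over the trailing equal signs, which differ from one message to the next.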

If only Microsoft could make such simple information available! Think of all the lost work hours! Only after I had resolved my problem did I find out about these guys, who had arrived at similar conclusions about the usage of Thread-Index:

Update #1: You may be interested to read the next episode.

Update #2: Yes, I keep refusing the BASE64 explanation. This is because what the BASE64 value decodes to is something either meaningless, or without known semantics.

Update #3: From the GNOME documentation: The value is apparently unique but has no meaning we know of. That is why I refuse the BASE64 explanation. It looks like a BASE64 string and it can get decoded into a string of bytes that one can represent as a number. But the questions remain unanswered: How is the first 22-byte-long value chosen? Why is every “next” value in a thread 5 bytes longer than the previous one? How are these 5 bytes chosen? The decoded value of an undocumented BASE64 string remains undocumented, hence it may not even be a BASE64 string at all (and may only coincidentally look like one).
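For anyone who wants to check the lengths for themselves, here is a small Python sketch that treats the four example headers from this post as base64 (an assumption, as discussed above) and reports how many bytes each one decodes to and which bytes each message adds to the previous one. The helper names `decoded_lengths` and `appended_blocks` are made up for illustration. It says nothing about what the bytes mean.

```python
import base64

# The four example Thread-Index: values used in this post.
EXAMPLES = [
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QA==",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbA",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fw=",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fwAABGXGw==",
]


def decoded_lengths(values):
    """Return the number of bytes each value decodes to."""
    return [len(base64.b64decode(v)) for v in values]


def appended_blocks(values):
    """Return, as hex strings, the bytes each value adds to the one before it."""
    raws = [base64.b64decode(v) for v in values]
    blocks = []
    for prev, cur in zip(raws, raws[1:]):
        if not cur.startswith(prev):
            raise ValueError("value does not extend the previous one")
        blocks.append(cur[len(prev):].hex())
    return blocks


if __name__ == "__main__":
    print(decoded_lengths(EXAMPLES))
    for block in appended_blocks(EXAMPLES):
        print(block)
```

For these four headers the decoded lengths come out as 22, 27, 32 and 37 bytes, so each message adds exactly 5 bytes to the one before it.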

The example Thread-Index: headers are taken from the MediaDefender Defenders site

16 thoughts on “Trying to make use of Outlook’s Thread-Index: header”

  1. To the next person that will insist that they are base64 encoded:

    1- Please try and decode the string. Then come up with a meaningful explanation of the result.

    2- Still not convinced? Read this paper [pdf].

    3- Still not convinced? Read my next post on the subject.

  2. The above people ARE (sort of) correct.

    Base64 is NOT for encoding strings. It is for encoding BINARY data in an ASCII string, to make it safe for ASCII-mode data transfer (as is required by SMTP).

    The result still looks meaningless when decoded, because you’re trying to read it as a string, when it is in fact binary data.

    1. It is OK for you to believe that I did not try to see it as binary data because you do not know me. The above people (including you) are NOT correct, or are leaving guesswork to the reader. One can give any x as input to an f(x) and get some output. This is the case here with base64. However, as long as someone is not telling what the resulting binary data stands for, it still is undocumented garbage.

      So please provide an explanation of the binary data we are all looking at in order to provide a correct (and complete) answer. What is it that you are seeing there? Is it a number? A random byte sequence? Something else? But do not tell me that it is binary data without telling me what it stands for. Everything can be viewed as binary data if it suits the purpose. So the question remains.

  3. It’s an OLE timestamp (22 bytes), followed by time diffs (5 bytes each), which sucks, because the timestamp is not guaranteed to be unique.

  4. LOL – nice one Mrten.

    @adamo- When it looks like Base64, smells like Base64, tastes like Base64, and *everyone* else tells you, time and again, it’s Base64, you look pretty foolish screaming over and over that it’s not Base64, just because you didn’t comprehend what’s inside.

    Heck – it’s a timestamp – you can’t get much easier than that to figure out. Send. Wait 60 seconds. Send again. Oh look – the base-64-decoded number increased by 60. I wonder what it could mean :-)

    1. You are making the assumption that I was using Outlook when I wrote the post. I am not an Outlook user. I had a mail archive though that I wanted to work on which had several messages with Thread-Index: headers.

      You are trying to make a fuss over my BASE64 complaint. Yes I do understand that it is BASE64 encoded data. So what? You tell me that these 22 bytes are a timestamp. Is it an integer? From what epoch? Does it matter if an epoch exists? Why are five more bytes needed to describe the difference from the previous timestamp?

      You are trying to scold me for my BASE64 complaints yet you provide no useful information to me or the readers, do you?

      Thank you for your time.

      1. The very fact that it’s base64 implies that it’s a binary format. The equal signs just mean that the number of encoded bytes is not always a multiple of 3. I think that if you are trying to reverse engineer the meaning and you find that all the example values are valid base64 (they wouldn’t be if the equal signs were missing for non-multiple-of-3-bytes samples, for example), it is very likely that you should accept that it is a string of base64-encoded bytes.

        The fact that you don’t know what the byte string means doesn’t mean the data isn’t base64-encoded. In fact, you can resume trying to reverse engineer with the byte strings as a starting point.

        @Mrten’s information sounds plausible to me. Even without an Outlook client, it would be possible to correlate and get that information by looking at Date headers, but I would be unlikely to stumble across that correlation myself.
