I have analysed a couple of malware samples in the past that arrived via e-mail, and the X-Mailer header of these e-mails always struck me as unusual.
The X-Mailer header is set by the sending program and describes the mail client (mail program) that was used to send the message. Note that spammers can insert whatever value they like; nothing prevents them from inserting bogus data. Also note that some applications do not use X-Mailer but use User-Agent instead (a header similar to the one used by web browsers).
This made me curious about which values are most often used for the X-Mailer header in spam messages. Since I was already extracting one header (X-Mailer), I might as well extract all the other e-mail headers from the spam messages and see whether they yield useful information or statistics.
So I started by analysing spam e-mail headers. The e-mails analysed were samples found in a Gmail spam folder or sent to a number of .be (Belgium) mail addresses.
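As a small illustration of reading these two headers, here is a minimal sketch using Python's standard `email` package. The sample message and the helper name `mail_client` are made up for this example and do not come from the analysed data.

```python
from email import message_from_string

# A made-up message for illustration; not taken from the analysed data set.
RAW = (
    "From: sender@example.com\n"
    "To: rcpt@example.com\n"
    "Subject: test\n"
    "X-Mailer: Microsoft Outlook Express 6.00.2900.2180\n"
    "\n"
    "body\n"
)

def mail_client(msg):
    """Return the self-reported mail client, or None if neither header is set."""
    # Header lookup in the email package is case-insensitive.
    return msg.get("X-Mailer") or msg.get("User-Agent")

msg = message_from_string(RAW)
print(mail_client(msg))
```

Whatever this returns is only what the sender claims; it cannot be verified from the message alone.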
Download IMAP mail via Python
I wrote a Python script that collects the e-mails from an IMAP server, extracts useful information and stores the header information. The script is on GitHub.
It downloads the available e-mails and then extracts:
- Multi part or not
- The number of headers per e-mail
- The content type
- The full length of the message
- An MD5 hash of the message
- The subject, the length of the subject and an MD5 hash of the subject
- The headers and the values of the headers
- A stripped (no trailing spaces and converted to lowercase) version of the headers and values
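The steps above could be sketched roughly as follows. This is not the script from GitHub, only a minimal illustration built on the standard `imaplib`, `email` and `hashlib` modules. The function names, the dictionary keys and the default folder are assumptions, and the subject hash is assumed to be taken over the subject text.

```python
import hashlib
import imaplib
from email import message_from_bytes

def fetch_raw_messages(host, user, password, folder="INBOX"):
    """Yield the raw bytes of every message in an IMAP folder (opened read-only)."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        conn.select(folder, readonly=True)
        status, data = conn.search(None, "ALL")
        for num in data[0].split():
            status, parts = conn.fetch(num, "(RFC822)")
            # parts[0] is a tuple of (response info, raw message bytes).
            yield parts[0][1]
    finally:
        conn.logout()

def extract_info(raw):
    """Extract the per-message fields listed above from raw message bytes."""
    msg = message_from_bytes(raw)
    subject = str(msg.get("Subject", ""))
    headers = [(name, str(value)) for name, value in msg.items()]
    return {
        "multipart": msg.is_multipart(),
        "header_count": len(headers),
        "content_type": msg.get_content_type(),
        "length": len(raw),
        "md5": hashlib.md5(raw).hexdigest(),
        "subject": subject,
        "subject_length": len(subject),
        # Assumption: the subject hash covers the subject text only.
        "subject_md5": hashlib.md5(subject.encode("utf-8", "replace")).hexdigest(),
        "headers": headers,
        "headers_stripped": [(n.strip().lower(), v.strip().lower()) for n, v in headers],
    }
```

A call such as `for raw in fetch_raw_messages(host, user, password): info = extract_info(raw)` would then produce one record per e-mail; the folder name for a spam folder depends on the mail provider.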
In total 34059 e-mails were imported, resulting in 1193600 e-mail headers. On average the e-mails had 35 headers. The maximum number of headers found was 538, the minimum was 16. There were 1005 distinct headers.
The subject length was on average 127 characters, the maximum subject length was 1632 characters and some e-mails had an empty subject.
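Numbers like these can be derived from the stored headers with a few lines of Python. The sketch below assumes each e-mail is represented as a list of (name, value) pairs; the function name `summarize` and the toy data are made up for illustration.

```python
from collections import Counter
from statistics import mean

def summarize(header_lists):
    """Summarize header counts; header_lists holds one (name, value) list per e-mail."""
    counts = [len(headers) for headers in header_lists]
    # Header names are compared case-insensitively.
    names = Counter(name.lower() for headers in header_lists for name, _ in headers)
    return {
        "emails": len(counts),
        "headers": sum(counts),
        "avg": mean(counts),
        "max": max(counts),
        "min": min(counts),
        "distinct": len(names),
        "top": names.most_common(1),
    }

# Toy data for illustration only.
toy = [
    [("Received", "a"), ("Received", "b"), ("Received", "c"), ("Subject", "x")],
    [("Received", "d"), ("Subject", "y"), ("X-Mailer", "OEM")],
]
stats = summarize(toy)
print(stats)
```

As a sanity check on the figures above, 1193600 headers divided by 34059 e-mails is indeed about 35 headers per e-mail.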
The most frequently inserted header is the Received header. This is no surprise because one e-mail often contains multiple Received headers. These headers describe the e-mail routing path. They can be altered by the sender (spammer); only the headers inserted by your own equipment can be trusted. Note that the x-virus-scanned header was inserted on the receiving side.
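To illustrate why one e-mail carries several Received headers: each mail server that handles the message prepends its own Received line, so the first entry is the most recent hop. A minimal sketch with a made-up message (the host names are placeholders):

```python
from email import message_from_string

# Made-up routing path; host names are placeholders.
RAW_HOPS = (
    "Received: from mx.example.net by mail.example.be; Mon, 1 Jan 2018 10:00:02 +0100\n"
    "Received: from unknown (HELO sender) by mx.example.net; Mon, 1 Jan 2018 10:00:01 +0100\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

hops = message_from_string(RAW_HOPS).get_all("Received", [])
# hops[0] was added last, by the server closest to the recipient.
for position, hop in enumerate(hops):
    print(position, hop)
```

Only the entries added by servers you control (normally the first ones in this list) are reliable; anything further down could have been forged by the sender.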
Not every e-mail had the X-Priority header set. Those that had it set most often used 3.
There were 17884 List-* headers, 92 Authenticated-* headers and 1968 X-PHP-* headers.
Out of the 34059 e-mails, 6342 had a header set that contained the string “abuse”, that is 19%.
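Counts of this kind could be computed as sketched below. The function names and the toy data are hypothetical, and the sketch assumes that "abuse" is matched against the header name rather than the header value.

```python
def count_headers_with_prefix(header_lists, prefix):
    """Count individual headers whose name starts with prefix (case-insensitive)."""
    prefix = prefix.lower()
    return sum(
        1
        for headers in header_lists
        for name, _ in headers
        if name.lower().startswith(prefix)
    )

def share_with_header_containing(header_lists, needle):
    """Return (number of e-mails, percentage) having a header name containing needle."""
    needle = needle.lower()
    hits = sum(
        1
        for headers in header_lists
        if any(needle in name.lower() for name, _ in headers)
    )
    return hits, 100.0 * hits / len(header_lists)

# Toy data for illustration only.
toy_mails = [
    [("List-Unsubscribe", "<mailto:u@example.com>"), ("X-Abuse-Reports-To", "abuse@example.com")],
    [("List-Id", "news.example.com"), ("Subject", "hi")],
    [("X-PHP-Script", "example.com/mail.php"), ("Subject", "yo")],
    [("Subject", "nothing special")],
]
print(count_headers_with_prefix(toy_mails, "List-"))
print(share_with_header_containing(toy_mails, "abuse"))
```

The same arithmetic applied to the real numbers gives 6342 / 34059, which rounds to 19%.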
I started this post because of my interest in the use of different X-Mailer headers. From the 34059 e-mails, 9945 had an X-Mailer. That is 29%.
There were 737 e-mails that had the User-Agent header set and 1285 with the X-User-Agent header set.
The top X-Mailer was OEM.
Many different versions of Outlook Express 6 were used as X-Mailer. In total there were 1198 X-Mailer headers that started with Microsoft Outlook Express 6. There were two X-Mailers that consisted only of a version number: 4.2.8* (160 entries) and 1.0.0* (217 entries).
I also wanted to check whether the headers used in e-mails that delivered malware recurred in other e-mails. It's important to note that the e-mails that delivered the malware were initially not marked as spam. For this post I only used the e-mails that were tagged as spam. The spam detection engine is either the one used by Google (Gmail) or Can-It.
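Grouping the many X-Mailer variants into families can be done with a simple prefix check and a regular expression. The sketch below is only one possible grouping; the function name `family`, the label "version number only" and the sample values are made up for illustration.

```python
import re
from collections import Counter

# A value made up solely of dot-separated numbers, e.g. "4.2.8.1".
VERSION_ONLY = re.compile(r"\d+(\.\d+)+")

def family(x_mailer):
    """Collapse an X-Mailer value into a coarse family name (hypothetical grouping)."""
    value = x_mailer.strip()
    if value.startswith("Microsoft Outlook Express 6"):
        return "Microsoft Outlook Express 6"
    if VERSION_ONLY.fullmatch(value):
        return "version number only"
    return value

# Illustrative values, not taken from the data set.
samples = [
    "Microsoft Outlook Express 6.00.2900.2180",
    "Microsoft Outlook Express 6.00.2800.1106",
    "4.2.8.1",
    "OEM",
]
families = Counter(family(value) for value in samples)
print(families.most_common())
```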
Unfortunately, none of the e-mails had an X-Mailer that resembled the mailers found in the malware delivery e-mails.
One unusual pattern did stand out though. A lot of e-mails had four headers whose names were 10 characters long and looked like random strings.
cwmlfdjefh|visuh.info$
pzodxmsufj|3909$
papclkjzjd|57$
avnksotlqd|iOL4UdSUMG++P7nTeKL6/RdS4axVAG+9Nl9iHzyEbLk=$
This pattern recurred in about 100 spam e-mails. Neither the other headers, the subject nor the content showed anything unusual, besides the usual spam content.
The strings used as header names were, for example, chfnrexsxc, pnvgftrsqd, snbcgrslgh or xbcgrephjt.
I also observed a similar pattern with longer header strings of 15 or 16 characters (pkgygghncsmprhg, otrjgnaschptjsf, vnfaproerfdnvlkg). So far I cannot explain the source of or the reason for these headers.
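One way to flag such e-mails automatically is to look for header names made up of exactly 10 lowercase letters and to require several of them in one message. This is a rough heuristic sketch, not the method used for this post; the function names, the threshold of four and the (incomplete) list of legitimate 10-letter header names are assumptions.

```python
import re

# Legitimate header names that also happen to be 10 letters long (incomplete list).
KNOWN_NAMES = {"references", "precedence", "importance"}

def random_looking_names(headers, length=10):
    """Return header names consisting of exactly `length` letters and not known."""
    pattern = re.compile(r"[a-z]{%d}" % length)
    return [
        name
        for name, _ in headers
        if pattern.fullmatch(name.lower()) and name.lower() not in KNOWN_NAMES
    ]

def is_suspicious(headers, minimum=4):
    """Flag an e-mail that carries at least `minimum` random-looking header names."""
    return len(random_looking_names(headers)) >= minimum

# The four headers from the example above, plus two ordinary ones.
example = [
    ("Subject", "hello"),
    ("References", "<1@example.com>"),
    ("cwmlfdjefh", "visuh.info"),
    ("pzodxmsufj", "3909"),
    ("papclkjzjd", "57"),
    ("avnksotlqd", "iOL4UdSUMG++P7nTeKL6/RdS4axVAG+9Nl9iHzyEbLk="),
]
print(random_looking_names(example), is_suspicious(example))
```

A heuristic like this will produce false positives for any legitimate header of the same shape that is missing from the known list, so the results need a manual check.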
I had hoped to see X-Mailer header data that resembled the values used in the malware delivery mails. This was not the case.
Spam e-mails contain a lot of headers, 35 on average.
Out of 34059 e-mails only 29% had the X-Mailer header set.
Most of the e-mails did not have an “abuse” header; only 19% did.
I can’t explain the use of the 10-character headers with the random strings.