June 19, 2009

Faulty character decoding as the last line of anti-spam defense

I receive spam every day. Filtering is in place and everything, but occasionally some garbage gets through. Then I may glance at it briefly, perhaps for less than a second before I hit "Delete", but the eye is fast enough to read and understand more than I'd like. You might say such a kamikaze message has still succeeded.

Much of the spam I receive is in Russian. As a side note, Russian text has multiple character encodings: WIN1251, KOI8-R, CP866, ISO-8859-5 and the universal UTF-8 come to mind. The same letters map to different bytes in each of them, so the mail client has to correctly identify the encoding and decode the message with it for the text to display properly.
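To illustrate why the encoding matters, here is a minimal Python sketch (not anything Thunderbird actually runs). It encodes one Russian word under each of the charsets listed above and then decodes KOI8-R bytes as if they were WIN1251, which is the kind of mismatch a confused mail client can make. The sample word and the variable names are purely illustrative.

```python
# Illustrative sketch: the same Russian word in several Cyrillic encodings,
# and what happens when bytes are decoded with the wrong one.
text = "Привет"  # "Hello" in Russian (sample word chosen for illustration)

# Python codec names for the encodings mentioned in the post.
charsets = ("cp1251", "koi8_r", "cp866", "iso8859_5", "utf-8")

# Each charset produces a different byte sequence for the same letters.
encoded = {enc: text.encode(enc) for enc in charsets}
for enc, raw in encoded.items():
    print(f"{enc:10} {raw.hex(' ')}")

# Decode KOI8-R bytes as if they were Windows-1251: every byte is still a
# valid Cyrillic letter in cp1251, so no error is raised, but the result is
# unreadable gibberish ("mojibake").
garbled = encoded["koi8_r"].decode("cp1251")
print("wrong charset:", garbled)

# Decoding with the charset that was actually used restores the original.
restored = encoded["koi8_r"].decode("koi8_r")
print("right charset:", restored)
```

Note that the wrong decode fails silently: the client gets plausible-looking Cyrillic letters back, just not the ones the sender wrote, which is why the message appears garbled rather than triggering an error.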

I use Thunderbird, and it is just awful at decoding Russian messages. I have no idea why that is, but I have to manually specify the encoding for every last message, because they always appear garbled.

But then, the bug becomes an unexpected feature: the spam messages look just as undecipherable as the legitimate ones, and even though I look at them, nothing is imprinted in my mind, and I just hit "Delete".