spamassassin-users December 2011 archive
spamassassin-users: Re: Problems with Cyrillic spam

From: Karsten Bräckelmann <guenther_at_nospam>
Date: Thu Dec 15 2011 - 01:02:34 GMT
To: users@spamassassin.apache.org

On Thu, 2011-12-15 at 00:09 +0000, Martin Gregorie wrote:
> I'm running SA 3.3.2 and would appreciate knowing how it recognises that
> a message contains a language that is not listed as belonging to an OK
> locale.

It's based on the charset the message declares, e.g. KOI8-R or
Windows-1251 for Cyrillic text, not on the text itself.
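
A minimal sketch of such a charset-based check, in C. It assumes a
hypothetical, heavily abbreviated charset-to-locale table (charset_map),
locale codes in the style of the ok_locales setting (e.g. ru for
Cyrillic), and a made-up function name, charset_ok_for_locales. It is not
SpamAssassin's code (SpamAssassin is written in Perl), only an
illustration of the idea. UTF-8 and charsets missing from the table get
no verdict:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive ASCII comparison; charset names are case-insensitive. */
static int name_eq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* HYPOTHETICAL, abbreviated charset -> locale table, for illustration only.
   It is not SpamAssassin's real mapping. */
static const struct {
    const char *charset;
    const char *locale;
} charset_map[] = {
    { "koi8-r", "ru" },       { "koi8-u", "ru" },
    { "windows-1251", "ru" }, { "iso-8859-5", "ru" },
    { "gb2312", "zh" },       { "big5", "zh" },
    { "shift_jis", "ja" },    { "iso-2022-jp", "ja" },
    { "euc-kr", "ko" },       { "tis-620", "th" },
};

/* Hypothetical helper: returns 0 if charset implies a locale missing from
   ok[0..n_ok-1], and 1 otherwise (including the "no verdict" cases). */
int charset_ok_for_locales(const char *charset,
                           const char *const ok[], size_t n_ok)
{
    assert(charset != NULL);
    /* UTF-8 can encode any script, so its label says nothing about language. */
    if (name_eq(charset, "utf-8"))
        return 1;
    for (size_t i = 0; i < sizeof charset_map / sizeof charset_map[0]; i++) {
        if (!name_eq(charset, charset_map[i].charset))
            continue;
        for (size_t j = 0; j < n_ok; j++)
            if (name_eq(charset_map[i].locale, ok[j]))
                return 1;
        return 0;   /* known charset whose locale is not on the OK list */
    }
    return 1;       /* unknown charset (e.g. us-ascii): no verdict */
}
```

With only en on the OK list, a part labeled KOI8-R is flagged, while the
very same Cyrillic text labeled UTF-8 passes unnoticed.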

UTF-8 is excluded here, for the obvious reason that it can encode any
script, so the charset alone says nothing about the language. For a
plugin like this to work with UTF-8, it would have to inspect the
decoded text itself. I once had a quick look at that: it seems rather
straightforward for Cyrillic, but much harder for, e.g., Chinese
characters.
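
A minimal sketch of that kind of content inspection, in C: decode UTF-8
and measure what fraction of letter-like characters fall in the basic
Cyrillic block, U+0400 to U+04FF. The names utf8_decode and
cyrillic_ratio are hypothetical, the decoder is deliberately lax (it does
not reject overlong forms or surrogates), and counting every non-ASCII
code point as letter-like is a crude simplifying assumption. A compact,
dedicated block is one likely reason Cyrillic looks easy; Han ideographs
are shared by Chinese and Japanese text, so a block test alone cannot
tell those two apart.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence at s into *cp and return the bytes consumed.
   On a malformed sequence (including one cut short by the terminating NUL)
   it stores U+FFFD and consumes a single byte. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    size_t len;
    unsigned long c;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else { *cp = 0xFFFD; return 1; }
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return len;
}

/* Fraction of letter-like code points (ASCII letters plus every non-ASCII
   code point) that lie in U+0400..U+04FF. Returns 0.0 if there are none. */
double cyrillic_ratio(const char *text)
{
    assert(text != NULL);
    const unsigned char *p = (const unsigned char *)text;
    size_t cyr = 0, total = 0;
    while (*p) {
        unsigned long cp;
        p += utf8_decode(p, &cp);
        int ascii_letter = (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
        if (cp < 0x80 && !ascii_letter)
            continue;            /* skip digits, punctuation, whitespace */
        total++;
        if (cp >= 0x0400 && cp <= 0x04FF)
            cyr++;
    }
    return total ? (double)cyr / (double)total : 0.0;
}
```

A caller might then treat a UTF-8 part as Cyrillic when the ratio exceeds
some threshold such as 0.5; that cut-off is an assumption, not a tested
value.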

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}