spamassassin-users December 2011 archive
spamassassin-users: Re: Problems with Cyrillic spam

From: Martin Gregorie <martin_at_nospam>
Date: Thu Dec 15 2011 - 01:17:19 GMT
To: users@spamassassin.apache.org

On Wed, 2011-12-14 at 19:38 -0500, darxus@chaosreigns.com wrote:
> On 12/15, Martin Gregorie wrote:
> > I'm getting spam with the Subject, Sender personal name and body all
> > written in Cyrillic, but, despite having "ok_locales en fr de" defined
> > in local.cf, no rules are fired to mark the message as being in an
> > unwanted language.
>
> Probably related to this:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4078
>
> There's also TextCat, which is also broken:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6364
>
> Basically, spamassassin's detection of languages is broken.
>
I agree: language detection seems to be broken for UTF-8 in the same way
that bug 4078 describes for Windows code pages.

Could somebody with access to the SA Bugzilla kindly add a comment to
bug 4078 saying that this is also an issue with Cyrillic encoded in
UTF-8? At present #4078 only mentions Windows code pages and KOI8, so
nothing there indicates that UTF-8 is affected too.

Martin
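
To illustrate the kind of check being discussed, here is a minimal
sketch in Python of spotting a mostly-Cyrillic Subject header once it
has been decoded from UTF-8. It is not SpamAssassin code and does not
reflect how ok_locales or TextCat work internally; the function names
and the 0.5 threshold are assumptions made up for this example.

```python
from email.header import decode_header, make_header


def cyrillic_ratio(text):
    """Return the fraction of alphabetic characters in the Cyrillic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    # U+0400..U+04FF is the basic Unicode Cyrillic block.
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04ff")
    return cyrillic / len(letters)


def subject_is_mostly_cyrillic(raw_subject, threshold=0.5):
    """Decode an RFC 2047 Subject value and test it against the threshold.

    Both the function name and the default threshold are hypothetical.
    """
    decoded = str(make_header(decode_header(raw_subject)))
    return cyrillic_ratio(decoded) >= threshold


if __name__ == "__main__":
    # A UTF-8, base64-encoded Subject and a plain ASCII one.
    for subject in ("=?UTF-8?B?0J/RgNC40LLQtdGC?=", "Problems with Cyrillic spam"):
        print(subject, subject_is_mostly_cyrillic(subject))
```

The point of the sketch is that the header has to be decoded to
characters before any script test is applied; a test written against
the raw bytes of one particular code page will not match the same text
encoded in UTF-8.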