On Wed, 2011-12-14 at 19:38 -0500, firstname.lastname@example.org wrote:
> On 12/15, Martin Gregorie wrote:
> > I'm getting spam with the Subject, Sender personal name and body all
> > written in Cyrillic, but, despite having "ok_locales en fr de" defined
> > in local.cf, no rules are fired to mark the message as being in an
> > unwanted language.
> Probably related to this:
> There's also TextCat, which is also broken:
> Basically, spamassassin's detection of languages is broken.
I agree that it seems to be broken by UTF-8 in the way that bug 4078
describes for Windows codepages.
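
For illustration only, here is a rough sketch of why a UTF-8 message needs a different kind of check. A charset label such as koi8-r or windows-1251 already hints at the language, but "utf-8" covers every script, so a test has to look at the decoded characters themselves. This is not SpamAssassin code: the function name cyrillic_ratio is hypothetical, and the approach (counting letters whose Unicode name starts with "CYRILLIC") is an assumption about one possible way to do it.

```python
import unicodedata


def cyrillic_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters in text that are Cyrillic.

    Hypothetical helper for illustration; not part of SpamAssassin.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(
        1 for ch in letters
        # unicodedata.name() returns the default "" for unnamed code points
        if unicodedata.name(ch, "").startswith("CYRILLIC")
    )
    return cyrillic / len(letters)


# The raw bytes carry no language hint until they are decoded as UTF-8.
raw = "Привет, мир".encode("utf-8")
decoded = raw.decode("utf-8")
ratio = cyrillic_ratio(decoded)
```

A message could then be treated as being in an unwanted script when the ratio for its Subject or body exceeds some threshold; picking that threshold is left open here.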
Could somebody with access to the SA Bugzilla kindly add a comment to
bug 4078 saying that this is also an issue with Cyrillic encoded in
UTF-8? I'm asking because at present #4078 only mentions Windows code
pages and koi8. There is nothing to indicate that this is also a problem