spamassassin-users December 2011 archive

Re: Problems with Cyrillic spam

From: <darxus_at_nospam>
Date: Thu Dec 15 2011 - 18:14:56 GMT
To: users@spamassassin.apache.org

On 12/15, Martin Gregorie wrote:
> In that case I'm missing some information: how to write a rule that can
> interpret the value(s) returned by TextCat.

I think you're looking for:

ok_languages en fr de

- http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Plugin_TextCat.html

> Why wouldn't it be sensible to rewrite ok_locales to compare TextCat
> return value(s) against its list of OK codes?

Because that functionality already exists within TextCat?

> Then why has ok_locales not been fixed already? This is not a criticism,
> just a request for information. Is it something that's difficult to do
> efficiently? I'd imagine that language recognition by looking at codepoint
> values is possible but not necessarily fast nor unambiguous.

Because it's not actually broken. That bug should probably be closed.
Perhaps after noting the limited utility in the documentation.

ok_locales works by identifying character sets that can only be used
for a specific language. UTF-8, Windows-1255, and KOI8 are not such
character sets, because they can also be used to write in English.
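
To illustrate the point, here is a minimal Python sketch (not SpamAssassin
code). It assumes Python's codec names utf_8, cp1255, and koi8_r as
stand-ins for the three character sets above; the sample text and the
function name are made up for the example. Plain English text encodes to
exactly the same bytes in all three, so the declared charset alone says
nothing about the language of the message.

```python
# Hypothetical sample text; any plain ASCII English string behaves the same.
SAMPLE = "Buy cheap watches now"

# Python codec names assumed to correspond to UTF-8, Windows-1255 and KOI8-R.
CHARSETS = ["utf_8", "cp1255", "koi8_r"]


def encodes_identically(text, charsets):
    """Return True if text yields the same byte string in every charset."""
    encoded = {text.encode(cs) for cs in charsets}
    return len(encoded) == 1


# All three charsets are ASCII-compatible, so English text is
# byte-for-byte identical no matter which one the message declares.
print(encodes_identically(SAMPLE, CHARSETS))
```

That is why a charset-based check like ok_locales can only act on
character sets that are never used for English.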

And, most importantly, as Kevin says here, people *do* use those character
sets to write in English:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4078#c27

Well, it's obvious that people write English in UTF-8.

> I've no time ATM and in any case I'm a middling to poor Perl coder. Now,
> if SA was written in C or Java....

I bet you know that working on real code is the best way to get better at a language.

-- 
"If you are not paranoid... you may not be paying attention."
- jimh@creative-net.net, on an IDPA mailing list
http://www.ChaosReigns.com