Spam sorting (was Re: Install assistance needed in Berkeley area)
Nick Moffitt
nick@zork.net
Mon, 5 Jan 2004 18:00:43 -0800
begin Tony Godshall quotation:
> Unfortunately Bayesian filtering has become a bit less
> effective lately (more spam gets through) as spammers are
> using random misspellings and garbage words to evade it. But
> hopefully someone will come up with a fix for that (perhaps
> grouping rarely-seen words together rather than ranking them
> separately).
Nope. You just don't get enough spam. Bayesian filtering
eventually flags those nonsense words as spammy words, and you don't
need to worry. Remember, and repeat after me: The more spam you GET,
the less you have to READ!
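A toy sketch of why that works, in Python. It assumes a Graham-style
per-token estimate with add-one smoothing; the function name, the
garbage token "xqzvt", and the message counts are all made up for
illustration, and this is not SpamAssassin's actual scoring:

```python
from collections import Counter


def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | token) from per-message token counts.

    Add-one smoothing keeps an unseen token at a neutral 0.5
    when the two corpora are the same size.
    """
    p_token_given_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_token_given_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_token_given_spam / (p_token_given_spam + p_token_given_ham)


# Hypothetical corpus sizes: 1000 trained spam, 1000 trained ham.
spam_counts = Counter()
ham_counts = Counter()

# Before the filter has ever seen the garbage word, it is neutral.
before = token_spam_probability("xqzvt", spam_counts, ham_counts, 1000, 1000)

# After 40 trained spam messages contained it (and no ham did),
# the same word counts as strong evidence of spam.
spam_counts["xqzvt"] += 40
after = token_spam_probability("xqzvt", spam_counts, ham_counts, 1000, 1000)

print(before, after)
```

A garbage word only stays neutral while the filter has never been
trained on it. Once enough spam carrying it has been learned, it
pushes the score up instead of diluting it.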
> I only use manual training. Supposedly it's best to use
> approximately equal volumes of spam and non-spam in the
> training. SpamAssassin's FAQ (I think) recommends training
> with 1000+ messages each (spam and non). Also, it's easy to
> reverse a mistake, as using sa-learn --ham reverses the
> effect of sa-learn --spam and vice versa.
That's why your Bayesian filter isn't adapting. You're not
letting it learn.
If it makes a mistake, correct it. But don't keep it in the
dark all the time!
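Correcting a mistake can be sketched the same way. This toy store is
hypothetical: the class, its method, and the message ID are invented
here, and it is not how sa-learn is implemented. It only assumes what
Tony describes, that relearning a message under the other label undoes
the first training:

```python
from collections import Counter


class ToyBayesStore:
    """Hypothetical token store that remembers how each message was learned."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}
        self.seen = {}  # message id -> label it was last learned as

    def learn(self, msg_id, text, label):
        previous = self.seen.get(msg_id)
        if previous == label:
            return  # already learned this way; nothing to do
        tokens = set(text.lower().split())  # count each token once per message
        if previous is not None:
            # Undo the earlier, mistaken training before relearning.
            self.counts[previous].subtract(tokens)
            self.totals[previous] -= 1
        self.counts[label].update(tokens)
        self.totals[label] += 1
        self.seen[msg_id] = label


store = ToyBayesStore()

# The filter (or a hasty human) files a real message as spam...
store.learn("m1", "lunch on thursday", "spam")

# ...and the correction moves its tokens back to the ham side.
store.learn("m1", "lunch on thursday", "ham")

print(store.counts["spam"]["lunch"], store.counts["ham"]["lunch"])
```

The point of the sketch is that a correction costs one relearn, so
there is little risk in letting the filter train continuously.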
--
"Forget the damned motor car and build cities for lovers and friends."
-- Lewis Mumford
end