Spam sorting (was Re: Install assistance needed in Berkeley area)
Nick Moffitt
nick@zork.net
Thu, 8 Jan 2004 11:23:04 -0800
begin Tony Godshall quotation:
> sure, more data is always good. but what about the imbalance issue?
> are you saying it's not a problem? and why should I take your
> opinion on this seriously if you don't offer a rational argument
> against what the SA doc says?
SA is making the assumption that you can use an actual strict
Bayesian analysis instead of just writing it off as "naive Bayes". I
keep the balance issue mostly worked out by subscribing to a lot of
legitimate mailing lists as well as spam lists. So long as you're not
getting 3 of one kind of mail per day and 300 of the other, you're
probably fine.
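
To put rough numbers on the 3-versus-300 case, here is a minimal
sketch, assuming the class prior is simply taken from how much of each
kind of mail was trained on. The function name and the 0.6/0.4 token
likelihoods are made up for illustration, and this is not a claim
about how SA itself computes anything.

```python
def spam_posterior(p_tok_spam, p_tok_ham, n_spam, n_ham):
    """P(spam | token) by Bayes' rule, with priors from training counts.

    p_tok_spam, p_tok_ham: hypothetical likelihoods of seeing the token
    in spam and in legitimate mail.
    n_spam, n_ham: how many messages of each kind were trained on.
    """
    total = n_spam + n_ham
    prior_spam = n_spam / total
    prior_ham = n_ham / total
    spam_term = p_tok_spam * prior_spam
    ham_term = p_tok_ham * prior_ham
    return spam_term / (spam_term + ham_term)


# Same weakly spammy token, two different training mixes.
balanced = spam_posterior(0.6, 0.4, n_spam=300, n_ham=300)
lopsided = spam_posterior(0.6, 0.4, n_spam=300, n_ham=3)
print(round(balanced, 3), round(lopsided, 3))
```

With an even mix the token stays a weak hint (0.6); with 300 spam to 3
legitimate messages the prior swamps it and the same token scores
about 0.993. A filter that ignores the prior would not shift this way,
which is the gap between the strict analysis and the "naive" shortcut.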
And I'm speaking purely anecdotally, true. I do these things,
and I get next to no spam and no false positives.
--
"Forget the damned motor car and build cities for lovers and friends."
-- Lewis Mumford
end