Spam sorting (was Re: Install assistance needed in Berkeley area)

Nick Moffitt
Thu, 8 Jan 2004 11:23:04 -0800

begin  Tony Godshall  quotation:
> sure, more data is always good.  but what about the imbalance issue?
> are you saying it's not a problem?  and why should I take your
> opinion on this seriously if you don't offer a rational argument
> against what the SA doc says?

	SA is assuming that you can do an actual strict Bayesian
analysis, instead of just writing it off as "naive Bayes".  I keep the
balance issue mostly worked out by subscribing to a lot of legitimate
mailing lists as well as spam lists.  So long as you're not getting 3
of one kind of mail per day and 300 of the other, you're probably
fine.
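
	To put a rough number on the 3-versus-300 case, here is a
minimal sketch of a textbook naive Bayes score.  It is not what SA or
any other particular filter actually computes; the function names, the
add-one smoothing, and the counts are all assumptions made up for
illustration.  The point is only that the training counts feed a prior
term that shifts every message's score before any token is looked at.

```python
import math

def prior_log_odds(n_spam, n_ham):
    """Log-odds of spam implied by the training counts alone."""
    return math.log(n_spam / n_ham)

def spam_log_odds(tokens, spam_counts, ham_counts, n_spam, n_ham):
    """Naive Bayes log-odds that a message is spam (positive means spam).

    spam_counts and ham_counts map a token to the number of training
    messages of that class containing it.  Add-one smoothing keeps
    unseen tokens from producing log(0).
    """
    score = prior_log_odds(n_spam, n_ham)
    for tok in set(tokens):
        p_tok_spam = (spam_counts.get(tok, 0) + 1) / (n_spam + 2)
        p_tok_ham = (ham_counts.get(tok, 0) + 1) / (n_ham + 2)
        score += math.log(p_tok_spam / p_tok_ham)
    return score

# Balanced corpus: the prior contributes nothing.
balanced = prior_log_odds(300, 300)

# 300 spam to 3 ham: the prior alone is log(100) toward spam.
lopsided = prior_log_odds(300, 3)
```

	With equal counts the prior term is zero and the tokens decide
everything.  At 300 to 3 the prior is log(100), and the tiny class's
smoothed token estimates rest on almost no data, which is the sort of
lopsidedness worth avoiding.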

	And I'm speaking purely anecdotally, true.  I do these things,
and I get next to no spam and no false positives.

