Spam sorting (was Re: Install assistance needed in Berkeley area)

Tony Godshall togo@of.net
Sun, 11 Jan 2004 12:22:10 -0800


According to Nick Moffitt,
> begin  Tony Godshall  quotation:
> > sure, more data is always good.  but what about the imbalance issue?
> > are you saying it's not a problem?  and why should I take your
> > opinion on this seriously if you don't offer a rational argument
> > against what the SA doc says?
> 
> 	SA is making the assumption that you can use an actual strict
> bayesian analysis instead of just writing it off as "naive bayes".  I
> keep the balance issue worked out mostly by subscribing to a lot of
> legitimate mailing lists as well as spam lists.  So long as you're not
> getting 3 of one kind of mail per day and 300 of the other, you're
> probably fine.
> 
> 	And I'm speaking purely anecdotally, true.  I do these things,
> and I get next to no spam and no false positives.

Ah, thanks, Tim!  That makes a lot more sense.

Now I just have to google around for or work out the procmail recipe
(unless you want to paste me a sample of yours to let lazy me crib off you).

Say, any quick way to count the number of messages in an mbox file?
Is "grep -c '^From' file" sufficient?
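
For what it's worth, a minimal sketch of that count, assuming a standard
mbox where every message starts with a "From " separator line (note the
trailing space) and the delivery agent has escaped body lines beginning
with "From " as ">From ". The sample file and the count_mbox helper are
made up for illustration. Without the space, the pattern also matches
each "From:" header, so the count comes out roughly double.

```shell
# Build a tiny two-message mbox for illustration (contents are made up).
sample=$(mktemp)
cat > "$sample" <<'EOF'
From alice@example.com Sun Jan 11 12:00:00 2004
From: alice@example.com
Subject: first

hello

From bob@example.com Sun Jan 11 12:05:00 2004
From: bob@example.com
Subject: second

world
EOF

# Hypothetical helper: count only separator lines, which start with
# "From " followed by a space.
count_mbox() {
    grep -c '^From ' "$1"
}

naive=$(grep -c '^From' "$sample")   # also matches the "From:" headers
strict=$(count_mbox "$sample")       # matches separator lines only
echo "naive=$naive strict=$strict"
rm -f "$sample"
```

Running count_mbox against the ham and spam mboxes gives a rough check
on the balance between the two.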

Tony