Date: Wed, 10 Mar 2004 12:07:23 -0500
From: Louis LeBlanc <freebsd@keyslapper.org>
To: freebsd-questions@freebsd.org
Subject: Re: formail recipe
Message-ID: <20040310170723.GA90043@keyslapper.org>
In-Reply-To: <20040310162744.GA2081@asu.edu>
References: <20040310162744.GA2081@asu.edu>

I know what you mean. Mine's over 6700, and that's just since 1/1/04.
I have no doubt whatsoever there are a good number of people here that
have that beat several times over in the same period of time.

What I do to trim mine down is just take the oldest messages out.
Naturally, this can be tricky since the Date: header is often bogus,
but it's a place to start.

Come the end of the quarter, I'll be blocking off this archive folder
and starting a new one. At that time, I'll be rebuilding my SA bayes
db to make sure I have a 'correct' base. The next quarter's worth
(which I'd like to delude myself into believing will be smaller) will
be fed in on a regular basis to keep the bayes db on track.

The reason I suggest removing the oldest messages is that spammers
seem to evolve their methods, and the bayes db will be most accurate
with a more complete picture of CURRENT practices, without methods
that are no longer in use skewing it.

Over the last month, I've seen their evolving methods start sneaking
in under the SA radar, and have slowly but surely dropped my threshold
down to 1.0 rather than the default 5.0. So far, no FNs, and the FPs
have gone away (for now).

There will be lots of arguments to the contrary of at least some of
what I've said here, but the great thing about all this is you get to
decide what approach you have more confidence in. This is the
approach I have more confidence in - though I'm open to any way of
tweaking it.

Good luck.
Lou

On 03/10/04 09:27 AM, David Bear sat at the `puter and typed:
> Hope I'm not imposing too much on this group.. but since this group
> has a collection of the best, brightest, and generous..
>
> I wonder if someone might have a formail recipe that would randomly
> select N messages from a mailbox of M messages? I have a spam corpus
> thats well over 10000 and need to trim it down.
>
>
> --
> David Bear
> phone: 480-965-8257
> fax: 480-965-9189
> College of Public Programs/ASU
> Wilson Hall 232
> Tempe, AZ 85287-0803
> "Beware the IP portfolio, everyone will be suspect of trespassing"
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
>

--
Louis LeBlanc               leblanc@keyslapper.org
Fully Funded Hobbyist, KeySlapper Extrordinaire :)
http://www.keyslapper.org

An age is called Dark not because the light fails to shine, but because
people refuse to see it.
                -- James Michener, "Space"
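
For anyone who wants to script either approach from this thread (dropping
the oldest messages by Date: header, or David's random pick of N out of M),
here is a rough sketch in Python rather than formail. It is not a recipe
anyone in the thread posted. The function names keep_newest and keep_random
and the file paths are made up for illustration, and it assumes the corpus
is a single mbox file small enough to load into memory. Messages with a
missing or bogus Date: header are treated as oldest, which is one possible
policy, not the only one.

```python
import mailbox
import random
from email.utils import parsedate_to_datetime


def date_key(msg):
    """Return the Date: header as a Unix timestamp, or 0.0 if unusable."""
    raw = msg["Date"]
    if raw is None:
        return 0.0
    try:
        return parsedate_to_datetime(str(raw)).timestamp()
    except (TypeError, ValueError, OverflowError, OSError):
        # Bogus or unparseable dates sort as "oldest" and are trimmed first.
        return 0.0


def keep_newest(src_path, dst_path, n):
    """Copy the n newest messages (by Date: header) into dst_path.

    Messages are appended if dst_path already exists.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    try:
        newest = sorted(src, key=date_key, reverse=True)[:n]
        for msg in newest:
            dst.add(msg)
        dst.flush()
        return len(newest)
    finally:
        src.close()
        dst.close()


def keep_random(src_path, dst_path, n, seed=None):
    """Copy n randomly chosen messages (without repeats) into dst_path."""
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    try:
        keys = src.keys()
        chosen = random.Random(seed).sample(keys, min(n, len(keys)))
        for key in sorted(chosen):  # keep original mailbox order
            dst.add(src[key])
        dst.flush()
        return len(chosen)
    finally:
        src.close()
        dst.close()
```

Something like keep_newest("spam.mbox", "spam-trimmed.mbox", 5000) would
then leave the original file untouched and write the trimmed corpus to a
new file, which can be fed to the bayes db in place of the full archive.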