FreeBSD Mail Archives

Date:      Fri, 23 Dec 2005 16:58:56 -0500
From:      Louis LeBlanc <FreeBSD@keyslapper.net>
To:        FreeBSD Questions <freebsd-questions@freebsd.org>
Subject:   Re: SPAM Trap
Message-ID:  <20051223215856.GA61699@keyslapper.net>
In-Reply-To: <20051223120440.G5464@seibercom.net>
References:  <20051223120440.G5464@seibercom.net>


[-- Attachment #1 --]
On 12/23/05 12:12 PM, Gerard Seibert sat at the `puter and typed:
> I have been reading about SPAM Traps. Exactly what is a SPAM Trap? I 
> noticed that it seems to be used in conjunction with blacklisting 
> organizations.
> 
> How would one go about setting up one?

Ahh, spam.  A subject near and dear to my heart.  Well, ok, not really,
but certainly one I've spent a lot of time trying to minimize.

I use a honeypot setup to pipe obvious spam through the spamassassin
bayes learner.  Of course, I have broad access to aliases, and I can
have mail delivered to any folder I like using the
user+folder@domain.com extension.  So, I alias some bogus address,
like TrapAddr@mydomain.com to my folder (like user+trap@mydomain.com),
then have procmail intercept it and pipe it directly through
spamassassin to be learned and reported as spam.  Then procmail
ditches it to /dev/null.  I never see the trap, and only those
harvesting addresses on newsgroups are going to send to the address.
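
To make the alias-plus-folder routing concrete, here is a rough Python sketch of the lookup. It only illustrates the idea and is not how sendmail or procmail implement it. The `ALIASES` table and the function names are hypothetical, and the addresses are the example ones from this message.

```python
def split_plus_address(addr, sep="+"):
    """Split 'user+folder@domain' into (user, folder, domain).

    folder is None when the local part has no '+detail' part.
    """
    local, at, domain = addr.rpartition("@")
    if not at:
        raise ValueError(f"not an email address: {addr!r}")
    user, plus, folder = local.partition(sep)
    return user, (folder if plus else None), domain

# Hypothetical alias table: the bogus trap address is redirected to the
# user's "+trap" folder. Keys are lowercased because alias lookups are
# usually case-insensitive (an assumption; check your MTA).
ALIASES = {"trapaddr@mydomain.com": "user+trap@mydomain.com"}

def resolve_folder(rcpt):
    """Follow one level of aliasing, then return the target folder (or None)."""
    target = ALIASES.get(rcpt.lower(), rcpt)
    return split_plus_address(target)[1]

print(resolve_folder("TrapAddr@mydomain.com"))  # trap
print(resolve_folder("user@mydomain.com"))      # None
```

In this sketch, anything that resolves to the `trap` folder is what the procmail recipe further down would intercept.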

Mind you, I am very careful about posting with these.  I have a
specific sig file that describes in detail what will happen to any
message sent to the address.  Since I use mutt to do this, all return
paths lead to the honeypot.  These addresses are only seeded in
postings to newsgroups, but that is more than effective enough.  Pick
any newsgroup, but for best results, focus on those you would never
want your children to frequent.  Make sure your posting does NOT have
any real address at all.  Mutt is best, since you can use the
'set from="honeypot@domain.com"' config, which will ensure all return
path headers use it, and you can explicitly set the From and Reply-To
headers.  That way, the only address harvested is the one you want
harvested.
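
If you post with something other than mutt, you can sanity-check a draft before sending by scanning its address headers for anything except the honeypot. Below is a minimal Python sketch. The header list in `ADDRESS_HEADERS` is an assumption and is not exhaustive, and `honeypot@domain.com` / `lou@realhost.example` are placeholders.

```python
from email.message import EmailMessage
from email.utils import getaddresses

# Headers that could expose an address to a harvester.
# Assumed list; add anything else your client generates.
ADDRESS_HEADERS = ("From", "Reply-To", "Sender", "Return-Path",
                   "Mail-Followup-To", "Errors-To")

def leaked_addresses(msg, honeypot):
    """Return sorted addresses in msg's address headers that aren't the honeypot."""
    found = set()
    for name in ADDRESS_HEADERS:
        for value in msg.get_all(name, []):
            # Parse each header value separately so one odd header
            # can't spoil the parse of the others.
            for _, addr in getaddresses([str(value)]):
                if addr:
                    found.add(addr.lower())
    return sorted(found - {honeypot.lower()})

if __name__ == "__main__":
    draft = EmailMessage()
    draft["From"] = "Lou <honeypot@domain.com>"
    draft["Reply-To"] = "honeypot@domain.com"
    draft["Sender"] = "lou@realhost.example"   # the kind of leak to catch
    print(leaked_addresses(draft, "honeypot@domain.com"))
```

An empty list means that, as far as these headers go, the only address a harvester can pick up is the trap.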

In my procmail rc file, I catch anything going to the trap folder, log
a '.' to a ~/.honeypot_hits file so I can tell how many hits have
resulted (1163 in the last 3 months, with the last one coming at 8:10
this morning - might be worth reseeding soon).  It's also boosted my
SA bayes accuracy to near perfection - I don't get so much spam at my
"real" address, but what I do get has been sorted nearly perfectly over
the last 2 years, with not one single false positive or false negative
in at least 6 months.
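
Because procmail appends one '.' per trapped message, counting hits amounts to counting dots. The file's modification time also tells you when the last hit came in, which helps when deciding whether to reseed. Here is a rough Python sketch. It assumes nothing but those dots ever lands in ~/.honeypot_hits (a procmail error logged while LOGFILE points there would skew the count), and the function names are hypothetical.

```python
import time
from pathlib import Path

def _hits_path(path):
    # Default to ~/.honeypot_hits, matching the LOGFILE in the recipe.
    return Path(path) if path is not None else Path.home() / ".honeypot_hits"

def honeypot_hits(path=None):
    """Count trap hits: procmail's LOG="." appends one dot per message."""
    try:
        return _hits_path(path).read_text().count(".")
    except FileNotFoundError:
        return 0

def days_since_last_hit(path=None, now=None):
    """Days since the hits file was last written, i.e. since the last hit."""
    now = time.time() if now is None else now
    return (now - _hits_path(path).stat().st_mtime) / 86400
```

For example, `print(honeypot_hits(), round(days_since_last_hit(), 1))` would show the running total and how stale the seeding is.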

The only real magic, once you've got the aliases down, is the procmail
recipe:

##################################
:0
* FOLDER ?? ^^trap^^
{
  VERBOSE=off
  # let's count this message:
  LOGFILE=$HOME/.honeypot_hits
  LOG="."
  LOGFILE=$HOME/.procmail_log
  # uncomment the next line if you log verbose messages
  # VERBOSE=on

  # Report spam.
  # The lock prevents windfalls from knocking the system over
  :0c:honeypot.lock
  | nice -n 20 /usr/local/bin/spamassassin -r

  # Now, teach the bayes db what spam is
  :0:salearn.lock
  | nice -n 20 /usr/local/bin/sa-learn --spam --no-rebuild

  # Now, file it appropriately
  :0
  /dev/null
}
##################################

You may, of course, have to find another way to do this if you don't
have aliasing capabilities, or if you don't have the "plussed folder"
extension available.  In the latter case, you can scan the routing
headers to see what address the message is for - not quite as easy,
but it can be done.
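
For the header-scanning approach, something like the following could serve as a starting point. It is a Python sketch of the matching logic, not a procmail recipe. Which of the headers in `ROUTING_HEADERS` your MTA actually adds is an assumption you'll need to check against a few raw messages, and the sketch ignores the `for <addr>` clause in Received: lines.

```python
import email
from email.utils import getaddresses

# Headers that may record who the message was delivered to. Which of these
# your MTA actually adds is an assumption; check a few raw messages first.
ROUTING_HEADERS = ("Delivered-To", "X-Original-To", "Envelope-To", "To", "Cc")

def is_trap_message(raw, trap_addrs):
    """True if any routing header names one of the trap addresses."""
    traps = {a.lower() for a in trap_addrs}
    msg = email.message_from_string(raw)
    for name in ROUTING_HEADERS:
        for value in msg.get_all(name, []):
            # Parse header-by-header so a malformed value only affects itself.
            for _, addr in getaddresses([value]):
                if addr.lower() in traps:
                    return True
    return False

sample = (
    "Delivered-To: TrapAddr@mydomain.com\n"
    "To: undisclosed-recipients:;\n"
    "Subject: make money fast\n"
    "\n"
    "...\n"
)
print(is_trap_message(sample, ["trapaddr@mydomain.com"]))  # True
```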

I also have procmail separate spam based on whether it goes over the
autolearn threshold.  If it's autolearned, it goes into the
spam_autolearn folder, and I never bother to look at it.  It is
already automagically trained into the bayes db.  Anything tagged as
spam, but not over that threshold, is put into the spam folder, and
requires a verification.  I simply use mutt or squirrelmail to mark it
as read if it really is spam, or move it back to the right folder if
it isn't.
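
SpamAssassin reports the auto-learn result in the X-Spam-Status header (for example `autolearn=spam`), so the three-way split can key off that field. The author does this in procmail. The Python sketch below only mirrors the decision logic: the `INBOX` name and the `choose_folder` function are hypothetical, while `spam_autolearn` and `spam` are the folder names used above.

```python
import email
import re

# SpamAssassin puts the auto-learn result in X-Spam-Status, e.g.
#   X-Spam-Status: Yes, score=14.9 required=5.0 tests=... autolearn=spam
_AUTOLEARN = re.compile(r"\bautolearn=(\w+)")

def choose_folder(msg):
    """Pick a folder: spam_autolearn, spam, or (hypothetically) INBOX."""
    # Collapse folded continuation lines into one line before matching.
    status = " ".join(str(msg.get("X-Spam-Status", "")).split())
    if not status.lower().startswith("yes"):
        return "INBOX"            # not tagged as spam
    match = _AUTOLEARN.search(status)
    if match and match.group(1) == "spam":
        return "spam_autolearn"   # already trained into bayes; never reviewed
    return "spam"                 # tagged, but below the autolearn threshold
```

Only the `spam` folder then needs a human to confirm or rescue messages.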

Every night, mail in the spam folder that is marked as read is piped
through the learner to teach bayes to count it as spam, and then
backed up into a spam archive folder - named based on the month (like
spam-01-05) - and saved there for 12 months.  After 12 months, this
folder will be removed altogether.  I figure that's long enough to be
sure nobody has sent me anything important.  I do check these from
time to time though, when I'm bored.  So far, 100.00% perfection.
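
The nightly job is the author's own perl script, which isn't shown here. As a rough Python approximation of the pieces described, this sketch assumes Maildir folders, where a read message carries the `S` flag, and reads `spam-01-05` as month-year with 2-digit years meaning 20xx. The function names are hypothetical.

```python
import mailbox
from datetime import date

def archive_name(d):
    """Monthly archive folder name, assuming the MM-YY form of 'spam-01-05'."""
    return f"spam-{d:%m-%y}"

def is_expired(folder, today, keep_months=12):
    """True once the folder's month is keep_months or more before today's."""
    mm, yy = folder.removeprefix("spam-").split("-")
    age = (today.year - (2000 + int(yy))) * 12 + (today.month - int(mm))
    return age >= keep_months

def seen_messages(box):
    """Yield keys of Maildir messages carrying the 'S' (seen/read) flag."""
    for key, msg in box.items():
        if "S" in msg.get_flags():
            yield key

print(archive_name(date(2005, 1, 15)))              # spam-01-05
print(is_expired("spam-01-05", date(2006, 1, 1)))   # True
```

In this sketch, each key from `seen_messages` would be fed to `sa-learn --spam` and then moved into the `archive_name(today)` folder, and any archive folder for which `is_expired` returns True would be removed.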

I started using the honeypot way back when I was using Cyrus imapd (3
years ago?), and my false negatives went from about 30/day (out of
around 200 spams) to about 1 every week or so within a month.

About a year ago, I wrote the perl script that manages, archives, and
deletes old spam.  Since then, spam tends to take a *lot* less of my
own time.  So I count all the up front effort as time very well spent.
Currently, I'm only getting around 400 spams/month (not counting
honeypot hits).  That's back up from under 100 right after I turned off
one of my domains, which had been getting around 2000/month.  That
script has been untouched since February and working well.

I think I've posted the script to the list before, but if you're interested,
I'll send it to you offlist (unless I get enough people requesting it
to the list).

HTH

Lou
-- 
Louis LeBlanc                          FreeBSD-at-keyslapper-DOT-net
Fully Funded Hobbyist,                  KeySlapper Extraordinaire :)
Please send off-list email to:         leblanc at keyslapper d.t net
Key fingerprint = C5E7 4762 F071 CE3B ED51  4FB8 AF85 A2FE 80C8 D9A2

meteorologist, n.:
  One who doubts the established fact that it is
  bound to rain if you forget your umbrella.

[-- Attachment #2 --]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (FreeBSD)

iD8DBQFDrHMgr4Wi/oDI2aIRAh0yAJ9NBH/DdMjAGWACU8oOa/fzlYhulgCfRWn3
2P/U0I01ideIxYbINeLhBc0=
=smNX
-----END PGP SIGNATURE-----

