Date:      Fri, 11 May 2001 20:56:10 -0500
From:      Mike Meyer <mwm@mired.org>
To:        Nathan@Vidican.com
Cc:        questions@freebsd.org, "Ted Mittelstaedt" <tedm@toybox.placo.com>
Subject:   RE: email to SQL
Message-ID:  <15100.38970.996390.52851@guru.mired.org>
In-Reply-To: <68112128@toto.iv>

Ted Mittelstaedt <tedm@toybox.placo.com> types:
> Somewhere there are patches to qmail that make it use a SQL
> server.  You might look at that, maybe there is something you
> can use there.

There's one of those in the ports (qmail-mysql). It, like most such
things, just uses the SQL server for admin data (user info, aliases,
etc.). Delivering to a mailbox is a different critter entirely. There
may be hacks that actually deliver to an SQL database, but I
wouldn't bet on it - doing that right is a nasty problem.

> From: Nathan Vidican
> >Does anyone happen to know of (or have) some small utility which will
> >archive email into an SQL table? I'm looking for something that will
> >retrieve the messages either via direct access to the mail spool, or via
> >POP3. I know that I could probably just rip off a portion of some webmail
> >app to accomplish this, but to be optimistic I figured someone might have
> >already done so, and would be willing to share their code. I would prefer
> >to use C, but Perl will work too.

Well, if you were willing to use Python, there's a POP3 client class
and a mail parser class in the standard library. That's 90% of the
work; all you have to do is write the message object's attributes into
your database. Last time a client needed this kind of thing from me, I
wrote a sendmail delivery agent, and sendmail handed it all mail for the
domain of interest. Worked like a charm.
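For anyone taking the Python route, here is a minimal sketch using today's standard library: `poplib` for POP3 and the `email` package for parsing. The 2001-era modules differed. It assumes `sqlite3` as a stand-in for the SQL server. The table layout and the function names (`parse_message`, `store`, `archive_pop3`, `deliver_from_stdin`) are hypothetical. Only the plain-text body is kept.

```python
import email
import email.policy
import poplib
import sqlite3
import sys

# Hypothetical table layout; sqlite3 stands in for the real SQL server.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY,
    message_id TEXT,
    sender     TEXT,
    subject    TEXT,
    date       TEXT,
    body       TEXT
)
"""


def parse_message(raw: bytes) -> dict:
    """Parse one raw RFC 822 message into the fields we want to archive."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    # Keep only the text/plain body; attachments are skipped and
    # HTML-only mail is stored with an empty body.
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    return {
        "message_id": str(msg.get("Message-ID", "")),
        "sender": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
        "date": str(msg.get("Date", "")),
        "body": body,
    }


def store(conn: sqlite3.Connection, fields: dict) -> None:
    """Insert one parsed message as a row."""
    conn.execute(
        "INSERT INTO messages (message_id, sender, subject, date, body) "
        "VALUES (:message_id, :sender, :subject, :date, :body)",
        fields,
    )
    conn.commit()


def archive_pop3(conn: sqlite3.Connection, host: str,
                 user: str, password: str) -> int:
    """Copy every message in a POP3 mailbox into the table (no deletion)."""
    box = poplib.POP3(host)
    try:
        box.user(user)
        box.pass_(password)
        count, _size = box.stat()
        for i in range(1, count + 1):
            _resp, lines, _octets = box.retr(i)
            store(conn, parse_message(b"\r\n".join(lines)))
    finally:
        box.quit()
    return count


def deliver_from_stdin(db_path: str) -> None:
    """Delivery-agent mode: the MTA pipes one message to us on stdin."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    store(conn, parse_message(sys.stdin.buffer.read()))
    conn.close()
```

To run it as a delivery agent, one option is a wrapper script that calls `deliver_from_stdin()`. It could be invoked from a sendmail alias that pipes to a program, such as `archive: "|/usr/local/bin/mail2sql"`. That is an assumption, not the setup described above, and the alias and script path are hypothetical.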

> >    I will require the code so as to allow indexing of the emails from
> >within a website. I want the website to be able to search for messages
> >based on content and subject. I would prefer not to keep the emails in an
> >archive file similar to the mail spool format because of performance
> >reasons. I figure running an SQL query once the system has 10,000+ emails
> >in it will be much faster than trying to search a couple hundred thousand
> >lines of a text file.

I think you've misfigured. The time it takes to search text is
determined mostly by the search algorithm, not by whether the text is
stored in an SQL server or a flat file. In fact, assuming the same
search algorithm is used, the flat text file should be faster: mmap it
in and you've got it all to search. Since your text will be scattered
across multiple database rows, the SQL server has to do more work than
that just to load it before it can start searching.
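Here is a minimal Python sketch of the flat-file approach: map the whole archive into memory and scan it with a plain substring search. The function name `count_matches` and the byte-string needle are illustrative assumptions. A real search would also need case folding and message boundaries. This is still a linear scan, which is the point: the algorithm, not the storage, sets the cost.

```python
import mmap


def count_matches(path: str, needle: bytes) -> int:
    """Count non-overlapping occurrences of needle in a flat mail file."""
    if not needle:
        raise ValueError("needle must be non-empty")
    with open(path, "rb") as f:
        # An empty file cannot be mmap-ed, so handle it up front.
        if f.seek(0, 2) == 0:
            return 0
        # Length 0 maps the whole file; read-only is all a search needs.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count, pos = 0, mm.find(needle)
            while pos != -1:
                count += 1
                pos = mm.find(needle, pos + len(needle))
            return count
```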

The best text search algorithm is to prepare an index of the stuff
before you need to search it. It's possible to store index information
in a database and search it efficiently, but I'm not sure that's
the most efficient tack to take. Datablades - if mysql has those,
*please* let me know! - might be useful here, but I've not had a
chance to play with them. Someone who's more current on the issue may
suggest something else. Unless your requirements are strange, your
best bet is probably a text search tool of some kind, preferably
one that understands text structured like mail messages. The best success
I've had is with WAIS (there are two versions in the ports), and your
database seems to be small enough for it to handle.
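To show what "prepare an index before you need to search it" means, here is a toy inverted index in Python. It is a sketch, not how WAIS works internally. The class name, the naive tokenizer, and AND-only queries are assumptions for illustration. The same word-to-message-id postings could be kept as rows in an SQL table, which is the "index information in a database" option.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lower-cased runs of letters/digits; a deliberately naive tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class InvertedIndex:
    """Maps each word to the set of message ids that contain it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, msg_id: int, subject: str, body: str) -> None:
        # The work happens once, when a message arrives, not at query time.
        for word in tokenize(subject + " " + body):
            self.postings[word].add(msg_id)

    def search(self, query: str) -> set[int]:
        """Ids of messages containing every word in the query (AND)."""
        words = tokenize(query)
        if not words:
            return set()
        # Start from the rarest word so the running intersection stays small.
        ordered = sorted(words, key=lambda w: len(self.postings.get(w, ())))
        result = set(self.postings.get(ordered[0], ()))
        for w in ordered[1:]:
            result &= self.postings.get(w, set())
        return result
```

A query only touches the postings for its own words, so its cost depends on how many messages match rather than on the total size of the archive.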

Drop me a note off-list if you want to talk about it some more.

	<mike
--
Mike Meyer <mwm@mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
