From owner-freebsd-questions@FreeBSD.ORG  Tue Jun 22 02:56:59 2004
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 977C416A4CE
	for <questions@freebsd.org>; Tue, 22 Jun 2004 02:56:59 +0000 (GMT)
Received: from out006.verizon.net (out006pub.verizon.net [206.46.170.106])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4C31B43D31
	for <questions@freebsd.org>; Tue, 22 Jun 2004 02:56:59 +0000 (GMT)
	(envelope-from cswiger@mac.com)
Received: from [192.168.1.3] ([68.161.84.3]) by out006.verizon.net
          (InterMail vM.5.01.06.06 201-253-122-130-106-20030910) with ESMTP
          id <20040622025658.DNF3317.out006.verizon.net@[192.168.1.3]>;
          Mon, 21 Jun 2004 21:56:58 -0500
Message-ID: <40D79FF9.20308@mac.com>
Date: Mon, 21 Jun 2004 22:56:57 -0400
From: Chuck Swiger <cswiger@mac.com>
Organization: The Courts of Chaos
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
	rv:1.7) Gecko/20040608
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Bill Moran <wmoran@potentialtech.com>
References: <20040621132006.2b1a296f.wmoran@potentialtech.com>
	<a22ff294040621115173bad2e0@mail.gmail.com>
	<20040621172520.3544d6fe.wmoran@potentialtech.com>
	<20040621214348.GB63857@happy-idiot-talk.infracaninophile.co.uk>
	<20040621175626.3e762448.wmoran@potentialtech.com> <40D76DA3.9090809@mac.com>
	<20040621204111.6e684d45.wmoran@potentialtech.com>
In-Reply-To: <20040621204111.6e684d45.wmoran@potentialtech.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Authentication-Info: Submitted using SMTP AUTH at out006.verizon.net from
	[68.161.84.3] at Mon, 21 Jun 2004 21:56:58 -0500
cc: questions@freebsd.org
Subject: Re: [OT] Re: What's the best possible email failover solution
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>,
	<mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>,
	<mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 22 Jun 2004 02:56:59 -0000

Bill Moran wrote:
> Chuck Swiger <cswiger@mac.com> wrote:
>>[ I don't think that stuffing email into a database is a particularly good 
>>idea since that means keeping large blobs of non-relational data floating 
>>around, something that the filesystem can do a better job of handling... ]
[ ... ]
> During my research of the IMAP protocol, I determined that _the_best_
> way to store email for high-performance would be to put them in a
> database.  This is because IMAP doesn't see email as a big blob of
> text like POP does.  It sees the headers as one thing, and the
> different MIME parts of the email each as a separate thing that can
> be fetched independently of the other MIME parts.  This is a pretty
> good layout for a one -> many relationship in a database.  Fact is,
> every current IMAP server that I'm aware of has to break emails
> apart on the fly in order to serve IMAP.

There's nothing wrong with applying database concepts to email, and it sounds 
like you want things which take advantage of database replication and 
transaction management and so forth in order to gain reliability, so perhaps 
you will find a DB better suited for your requirements than my comments above 
suggest.

I don't mind being wrong when the result works better for someone.  However, 
please remember that I know you are an optimist if you think I am a pessimist.

:-)

> Now, I could be wrong on this count, as I never wrote the mailserver,
> so my theory could ultimately be proven wrong, but I guess I just
> don't agree with the statement that SQL is a bad way to store email
> until someone has actually proven it.

My concern has less to do with the suitability of using a database to store 
mail than with database transactions becoming a potential bottleneck on the 
system as a whole.

I've spent a great deal of time in my day job dealing with dynamic websites, 
which mostly means ones driven by content generated by a database.  In my 
experience, you want to provide static content as efficiently as possible, and 
reserve database transactions for persisting changes to state and answering 
relational queries.

The most relevant comparison is a site where people could search for images 
by keyword, and where the images themselves were also stored in the database.  
That works fine under light to moderate load, but it turns out to scale much 
better if you keep just the "relational" part of the image data (name, 
keywords, etc) plus a filesystem reference in the database, and generate a 
link from that path so Apache can serve the file directly.
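
A minimal sketch of that layout in C, under stated assumptions: the struct 
image_row and the function image_url() are hypothetical names made up for 
illustration, and the base URL in the usage note is a placeholder.  The row 
holds only what the database needs to answer keyword queries, and the page 
code turns the stored path into a link.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical row: only the relational fields live in the database. */
struct image_row {
    const char *name;
    const char *keywords;
    const char *rel_path;  /* file location relative to the document root */
};

/*
 * Build the link handed to the browser so the web server can serve the
 * file itself.  Returns 0 on success, -1 if the result did not fit in out.
 */
int image_url(const struct image_row *row, const char *base_url,
              char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "%s/%s", base_url, row->rel_path);

    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

Calling image_url() with a row whose rel_path is "images/2004/sunset.jpg" and 
a base of "http://www.example.com" fills the buffer with the full link.  The 
image bytes never pass through the database.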

	---

In the case of storing email in a DB, while you can break up a mail message 
into headers plus separate MIME components, are you really going to want to 
decompose each and every mail message in a 3GB mail volume like that?  
That said, if you throw enough RAM at a DB so that the entire thing fits into 
main memory, that can produce some spectacular results, and is almost doable 
for this specific case.

Anyway, consider that each time someone reads a message from the DB, you'd 
have to do two or three database transactions per message, maybe more, 
compared with read()ing or mmap()ing a single file in an IMAPD and doing 
strnstr()s for MIME boundary separators in C.  Remember that hitting the DB 
involves multiprocess IPC and adds a lot of latency compared to what a 
filesystem-based IMAP daemon does.
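
A rough sketch of the filesystem side, not taken from any real IMAP daemon.  
The function names count_boundary_lines() and count_boundaries_in_file() are 
made up for illustration, and the boundary string is assumed to have been 
parsed out of the Content-Type header already.  Since strnstr() is a BSD libc 
function, the sketch uses memchr() and memcmp() instead so that it also 
compiles elsewhere.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Count the lines in buf[0..len) that begin with "--" followed by the
 * boundary string.  This matches both the part separators and the closing
 * "--boundary--" delimiter.
 */
size_t count_boundary_lines(const char *buf, size_t len, const char *boundary)
{
    size_t blen = strlen(boundary);
    size_t count = 0;
    size_t i = 0;

    while (i < len) {
        /* i always sits at the start of a line here. */
        if (len - i >= blen + 2 && buf[i] == '-' && buf[i + 1] == '-' &&
            memcmp(buf + i + 2, boundary, blen) == 0)
            count++;

        /* Skip ahead to the start of the next line, if there is one. */
        const char *nl = memchr(buf + i, '\n', len - i);
        if (nl == NULL)
            break;
        i = (size_t)(nl - buf) + 1;
    }
    return count;
}

/* Map one message file read-only and scan it; returns -1 on error. */
long count_boundaries_in_file(const char *path, const char *boundary)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    char *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (map == MAP_FAILED)
        return -1;

    long n = (long)count_boundary_lines(map, (size_t)st.st_size, boundary);
    munmap(map, (size_t)st.st_size);
    return n;
}
```

The whole job is one open(), one mmap(), and a linear scan of memory in the 
same process, with no round trip to a separate database server.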

-- 
-Chuck