From owner-freebsd-questions@FreeBSD.ORG Mon Apr 11 00:06:47 2005
Return-Path:
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 6BD5816A4CE
    for ; Mon, 11 Apr 2005 00:06:47 +0000 (GMT)
Received: from mail.gmx.net (mail.gmx.de [213.165.64.20])
    by mx1.FreeBSD.org (Postfix) with SMTP id 865B243D1D
    for ; Mon, 11 Apr 2005 00:06:46 +0000 (GMT)
    (envelope-from m@MHoerich.de)
Received: (qmail invoked by alias); 11 Apr 2005 00:06:43 -0000
Received: from pD9514650.dip.t-dialin.net (EHLO localhost) [217.81.70.80]
    by mail.gmx.net (mp019) with SMTP; 11 Apr 2005 02:06:43 +0200
X-Authenticated: #5114400
Date: Mon, 11 Apr 2005 02:06:42 +0200
From: Mario Hoerich
To: Doug Lee, Chuck Swiger, freebsd-questions@freebsd.org
Message-ID: <20050411000641.GB620@Pandora.MHoerich.de>
References: <20050409203727.GI4670@kirk.dlee.org> <42584A22.9010209@mac.com>
    <20050409223001.GA58918@kirk.dlee.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050409223001.GA58918@kirk.dlee.org>
User-Agent: Mutt/1.4.2.1i
X-Y-GMX-Trusted: 0
Subject: Re: Anyone ever consider a filesystem served by MySQL for mail folders?
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: User questions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 11 Apr 2005 00:06:47 -0000

# Doug Lee: [ fixed quote-levels ]
> On Sat, Apr 09, 2005 at 05:33:22PM -0400, Chuck Swiger wrote:
[ mail storage backed by DB ]
> >
> > The advantage is that users gets fancy searching.
> >
> > The disadvantage is that you need to provide around 4 times as much disk
> > space for a DB-based mailstore as you would for a normal mbox/maildir style
> > representation, you need to provide a lot more server horsepower, you need
> > to continuously maintain and purge old mail from the database, and you end
> > up with your mail buried in database tables, so heaven help you if the
> > database becomes inconsistent and you need to recover.

Whereas you can repair mbox-files with your favorite editor and
employ pretty much the same level of fancy searching with a
couple of scripts.

> But as for increased storage requirements, I've always wondered how
> much could be saved by an intelligent method of behind-the-scenes
> handling of quoting among messages in a thread. Goodness knows half
> the mail on a lot of lists, and even in a lot of personal mail
> streams, is simply copies of some or all of other messages, perhaps
> shifted over by quote signs like `>' etc. Seems to me a system could
> be devised to store directions for rebuilding a message instead of the
> message itself with all quoting intact.

Basically, you could just kill any quotechar, trim headers and
store the threads as incremental diffs. You could squeeze
redundancy a bit more, but then you'll cry if some bug decides
to eat a byte or two. ;)

> but I wouldn't be surprised if it could reverse the
> increased storage requirements you mention.

Probably. What's the gain in all that, though? The mbox-format
is simple enough[1], you can just build something to suit your
needs in your favorite scripting language.

Personally, I'd just build three scripts for that:

- The first to interactively insert some headers from within my
  MUA (mutt, in this instance), i.e. 'X-Archive-Keywords: ' and
  'X-Archive-Location: '.
- The second to (as a cron-job)
    i)   extract mails from mbox files
    ii)  move them into some kind of archive directory tree
         (based on the above -location-header, i.e.
         $TREEBASE/$LOCATION) and
    iii) store interesting headers inside a DB.
- The third for searching and cat(1)ing results to stdout
  (which in turn is nothing but a new mbox-file).

The hard part about this is integrating it into $MUA, but there
might be some hook around for that.

Actually looks like a perfect mini-project to learn a new
language with. ;)

Cheers.
Mario

[1]: IIRC: the header of a mail starts with /^From / and terminates
     with /^$/ and the other way around for the body of a mail.
     Can't get more simple than that.
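
For illustration, here is a rough Python sketch of the core of the
second and third script, built on nothing but the rule from [1]. It is
a sketch under assumptions, not a finished tool: the function names are
made up, 'X-Archive-Keywords' is assumed to hold a comma-separated list,
an in-memory list stands in for the DB, and nothing is moved on disk.
It also assumes that body lines starting with "From " have already been
escaped as ">From " by whatever wrote the mbox file.

```python
import re

# Header names as proposed for the first script; the comma-separated
# keyword format is an assumption made for this sketch.
KEYWORDS_HEADER = "x-archive-keywords"
LOCATION_HEADER = "x-archive-location"


def split_mbox(text):
    """Split mbox text into raw messages, each starting at a /^From / line."""
    starts = [m.start() for m in re.finditer(r"^From ", text, flags=re.MULTILINE)]
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


def parse_headers(message):
    """Return the headers (everything before the first empty line) as a dict.

    Header names are lowercased; a repeated header keeps only its last value.
    """
    head = message.split("\n\n", 1)[0]
    headers = {}
    name = None
    for line in head.split("\n")[1:]:  # skip the "From " separator line
        if line[:1] in (" ", "\t") and name:
            headers[name] += " " + line.strip()  # folded continuation line
        elif ":" in line:
            name, value = line.split(":", 1)
            name = name.strip().lower()
            headers[name] = value.strip()
    return headers


def build_index(text):
    """Collect archive location, keyword set and raw text for each message."""
    index = []
    for message in split_mbox(text):
        headers = parse_headers(message)
        raw_keywords = headers.get(KEYWORDS_HEADER, "")
        keywords = {k.strip().lower() for k in raw_keywords.split(",") if k.strip()}
        index.append({
            "location": headers.get(LOCATION_HEADER, ""),
            "keywords": keywords,
            "raw": message,
        })
    return index


def search(index, keyword):
    """Concatenate all messages tagged with keyword; the result is mbox text."""
    wanted = keyword.lower()
    return "".join(entry["raw"] for entry in index if wanted in entry["keywords"])
```

Printing the result of search() to stdout gives the "new mbox-file"
mentioned for the third script. Python's standard library also ships a
mailbox module that handles the format's corner cases, which is probably
the better base for anything beyond a toy.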