From owner-freebsd-chat  Sun Dec 22 20:48:12 1996
Return-Path: <owner-chat>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.4/8.8.4) id UAA15682
          for chat-outgoing; Sun, 22 Dec 1996 20:48:12 -0800 (PST)
Received: from time.cdrom.com (root@time.cdrom.com [204.216.27.226])
          by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id UAA15676
          for <freebsd-chat@FreeBSD.org>; Sun, 22 Dec 1996 20:48:10 -0800 (PST)
Received: from time.cdrom.com (jkh@localhost [127.0.0.1]) by time.cdrom.com (8.8.4/8.6.9) with ESMTP id UAA04163; Sun, 22 Dec 1996 20:48:05 -0800 (PST)
To: Marc Slemko <marcs@znep.com>
cc: freebsd-chat@FreeBSD.org
Subject: Re: mailing list archives 
In-reply-to: Your message of "Sun, 22 Dec 1996 21:26:17 MST."
             <Pine.BSF.3.95.961222211544.13336B-100000@alive.ampr.ab.ca> 
Date: Sun, 22 Dec 1996 20:48:05 -0800
Message-ID: <4159.851316485@time.cdrom.com>
From: "Jordan K. Hubbard" <jkh@time.cdrom.com>
Sender: owner-chat@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

Erm.  I wasn't exactly kidding about the idea of putting things into a
simplistic database of some sort.  Since all *standard* storage
formats suck, and since we have, from the very beginning, also been
archiving this stuff without a whole heck of a lot of regard to how we
might actually *use* the information, doesn't this suggest a new
approach to the problem?

We archive all this mail just *in case* someone might use it, yet we
make almost no provisions for really making it all that easy to search
and view threads of discussion, nor do we provide a meaningful way of
aging and deleting (or archiving) older information.  Databases do all
those things, and they let you easily come up with new ways of viewing
the data as you collect user feedback on what's useful and what's not.
Databases also, in most cases, deal with *large* amounts of data
efficiently.  Seems to fit our needs to the tenth decimal place.  The
only really big question is: how could we implement something like
this?  There's gotta be at least one database weenie in the crowd
here! :-)

					Jordan

> Since all standard storage formats for mail archives have problems when
> you are dealing with this volume, how about just making a snapshot of
> the archives available somewhere for now, in whatever form they are
> currently stored in?  I don't care if I need to ftp a 500 meg file;
> that's well under an hour if it is coming from wcarchive.  <g>
> 
> Any format will be unmanageable for most people due to sheer volume.  If
> you know what you are looking for, less is a pretty good search utility.
> 
> On Sun, 22 Dec 1996, Jordan K. Hubbard wrote:
> 
> > >    This isn't any improvement, IMO. The files would still be way too large
> > > for people to deal with and it doesn't make it any easier to index the
> > > contents. One message per file is the only scheme that addresses these
> > > problems.
> > 
> > Except then we'll almost certainly run out of inodes in the target
> > directory sooner rather than later.  Just judging by the mailstats
> > output and calculating about 90 days ahead, the math does not look
> > promising. :-)
> > 
> > Sigh.  Face it, we need a database. :-)
> > 
> > 					Jordan
> > 