From: nik@iii.co.uk
Message-ID: <19980330164024.47510@iii.co.uk>
Date: Mon, 30 Mar 1998 16:40:24 +0100
To: John Fieber
Cc: shimon@simon-shapiro.org, Wolfram Schneider, freebsd-database@FreeBSD.ORG, andreas@klemm.gtn.com, scrappy@hub.org, Satoshi Asami, Amancio Hasty
Subject: Re: Mailing list search interface
References: <19980330110200.17368@iii.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.85e
In-Reply-To: ; from John Fieber on Mon, Mar 30, 1998 at 09:48:45AM -0500
Organization: interactive investor
Sender: owner-freebsd-database@FreeBSD.ORG
Precedence: bulk

On Mon, Mar 30, 1998 at 09:48:45AM -0500, John Fieber wrote:
> > At the end of the conversion (which consisted of running MHonArc 2.2.0
> > over the files, and then using Glimpse 4.1 to index them) I had a total
> > of 32,910K HTML and index files.
> >
> > The output of 'time -l' on the conversion process was:
> >
> >       626.11 real       438.83 user        93.13 sys
>
> On what sort of hardware?

200 MHz PPro w/64MB of RAM and 256MB of swap. At the time I was running
XFree86 3.3.2, Netscape, XEmacs and a dozen or so xterms (tcsh, mutt, slrn).
Load hovered around the .9-1.1 mark. Interactive response was fine.
My disk is a single 2GB Atlas II, with tagged queuing turned *off* (because
of buggy firmware which I haven't updated yet).

> By quick back-of-an-envelope calculations, this is slower than
> the current indexing scheme on hub by at least a factor of 10.

The time above was for creation of the HTML archives and for indexing, not
just indexing alone.

> Indexing anything large is typically an I/O bound operation and
> when you start indexing much more than can fit in RAM, your
> performance will degrade dramatically, so it is probably slower
> by much more than a factor of 10.

Don't know. I'll grab last year's archive of -hackers (or another one, if
there's another you think would be more representative) and try that. I can
bring back figures for the time to create the entire archive (and index),
the time just to index, and the time to add a new message and then reindex.

I'd try this with the whole of the archives, but I don't have the spare
disk space (yet).

> Three months of -hackers != 5 years of all the mailing lists.
> I am confident that you will find that this scheme becomes a big
> hairy hassle when you throw the whole thing at it.

True enough. As I say, I'll try it and see.

> The ranking algorithm that Glimpse uses (or used last I checked)
> is primitive. (In a survey of what people liked, hated and most
> wanted in the mailing list archives, people wanted thread
> searching and date sorting, but only second and third *after* the
> currently implemented ranking algorithm, which most people found
> to work very well most of the time.)

Are those survey results available online somewhere?

> It isn't that things like MHonArc are not valiant efforts, but
> they are merely refinements of what is fundamentally a
> quick-and-dirty, non-scalable solution. As I hinted in another
> message, a proper solution would be based on a hybrid full
> text/RDBMS.
> Whether a true hybrid system is built, or just the
> illusion is built using some crafty CGI scripts is a detail to be
> worked out.

A hybrid system is on my list of things to build here (but it'll be Oracle
based). I haven't investigated Postgres enough to know if it's up to the
task.

N
-- 
Work: nik@iii.co.uk                    | FreeBSD + Perl + Apache
Rest: nik@nothing-going-on.demon.co.uk | Remind me again why we need
Play: nik@freebsd.org                  | Microsoft?

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-database" in the body of the message
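
For comparing this run against later ones, the 'time -l' figures quoted in
the message can be reduced to a couple of ratios. The Python below is only a
rough sketch. It assumes the 32,910K figure is the total size of the output
(HTML plus index files), and that the three times are wall-clock, user and
system seconds as time(1) reports them. The function name summarize_run and
its field names are made up for illustration.

```python
def summarize_run(real_s, user_s, sys_s, output_kb):
    """Reduce one timed run to CPU share of wall time and output rate.

    real_s, user_s, sys_s: seconds as reported by time(1).
    output_kb: size of the generated files in K (assumed to be output size).
    """
    cpu_s = user_s + sys_s
    return {
        "cpu_seconds": cpu_s,
        # Fraction of elapsed time this process spent on the CPU.
        "cpu_fraction": cpu_s / real_s,
        # Elapsed time not spent on the CPU: disk waits, plus time lost
        # to whatever else the machine was running.
        "wait_seconds": real_s - cpu_s,
        "output_kb_per_s": output_kb / real_s,
    }


# Figures taken from the message: 626.11 real, 438.83 user, 93.13 sys.
run = summarize_run(real_s=626.11, user_s=438.83, sys_s=93.13, output_kb=32910)

print(f"CPU share of wall time: {run['cpu_fraction']:.1%}")
print(f"Time not on the CPU:    {run['wait_seconds']:.2f} s")
print(f"Output rate:            {run['output_kb_per_s']:.1f} K/s")
```

A CPU share this close to the wall-clock time hints that this particular run
was limited more by CPU than by disk. The machine was busy with other work at
the time, so that is a hint and not a measurement. If the share drops sharply
on a larger archive, that would fit the I/O-bound behaviour John describes.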