From owner-freebsd-bugs Thu May 7 08:34:56 1998
Return-Path:
Received: (from majordom@localhost)
        by hub.freebsd.org (8.8.8/8.8.8) id IAA02096
        for freebsd-bugs-outgoing; Thu, 7 May 1998 08:34:56 -0700 (PDT)
        (envelope-from owner-freebsd-bugs@FreeBSD.ORG)
Received: from gateman.zeus.leitch.com (gateman.zeus.leitch.com [204.187.61.193])
        by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id IAA02089
        for ; Thu, 7 May 1998 08:34:52 -0700 (PDT)
        (envelope-from woods@tap.zeus.leitch.com)
Received: from zeus.leitch.com (tap.zeus.leitch.com [204.187.61.10])
        by gateman.zeus.leitch.com (8.8.5/8.7.3/1.0) with ESMTP id LAA17457;
        Thu, 7 May 1998 11:34:28 -0400 (EDT)
Received: from brain.zeus.leitch.com (brain.zeus.leitch.com [204.187.61.32])
        by zeus.leitch.com (8.7.5/8.7.3/1.0) with ESMTP id LAA17870;
        Thu, 7 May 1998 11:34:36 -0400 (EDT)
Received: (from woods@localhost)
        by brain.zeus.leitch.com (8.8.8/8.8.8) id LAA14018;
        Thu, 7 May 1998 11:34:35 -0400 (EDT)
        (envelope-from woods@tap.zeus.leitch.com)
Date: Thu, 7 May 1998 11:34:35 -0400 (EDT)
Message-Id: <199805071534.LAA14018@brain.zeus.leitch.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
From: woods@zeus.leitch.com (Greg A. Woods)
To: Wolfram Schneider
Cc: "Jordan K. Hubbard" , Randall Hopper , Poul-Henning Kamp ,
        freebsd-bugs@FreeBSD.ORG
Subject: Re: bin/5296
In-Reply-To: Wolfram Schneider's message of ", May 5, 1998 23:07:04 +0200"
        regarding "Re: bin/5296" id
References: <199805041546.LAA16661@brain.zeus.leitch.com>
        <6821.894313171@time.cdrom.com>
        <199805042253.SAA21201@brain.zeus.leitch.com>
X-Mailer: VM 6.45 under Emacs 20.2.1
Reply-To: woods@zeus.leitch.com (Greg A. Woods)
Organization: Planix, Inc.; Toronto, Ontario; Canada
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

[ On , May 5, 1998 at 23:07:04 (+0200), Wolfram Schneider wrote: ]
> Subject: Re: bin/5296
>
> There are currently ~6000 PRs.
> A linear full text search require ~30
> seconds on disk (~35MB) and 7 seconds if cached in memory.
>
> Glimpse would not help. Glimpse put the 6000 filenames into 256
> blocks, thats 24 files per block. A search for a word which exists
> once require to open (in average) 24 files. A search for a word which
> exists in 10 PRs require to open ~200 files ;-(

Hmmm. I guess that means using a real full-text search engine, which
means writing a bit more interface glue code (to stuff new PRs into the
full-text engine, and to access PRs given the index search output) and
allocating disk space for whatever percentage more the full-text
database takes (usually at least 50%).

This is probably work that should be done directly in GNATS -- it's
certainly a feature GNATS could use in general (i.e. not specific to
just FreeBSD's needs).

Liam Quin's text retrieval package (lq-text) would be a good engine to
work with (and it's freely available).

BTW, thanks for doing the analysis of the viability of Glimpse...

-- 
Greg A. Woods

+1 416 443-1734 VE3TCP Planix, Inc. ; Secrets of the Weird

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message
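
[ Note on the block-index arithmetic quoted above. The sketch below is
  only a rough model, not Glimpse's actual algorithm. It assumes that
  files are spread uniformly at random over the blocks and that every
  file in a matching block must be opened. The function names are
  hypothetical, chosen for illustration. ]

```python
# Rough model of a block-based index: the index only says which block a
# word occurs in, so every file in each matching block has to be opened.
# Assumption (not from Glimpse itself): files land in blocks uniformly
# at random.

def files_per_block(total_files: int, blocks: int) -> float:
    """Average number of files sharing one index block."""
    return total_files / blocks


def expected_files_opened(total_files: int, blocks: int, matching_files: int) -> float:
    """Expected number of files opened for a word found in `matching_files` files."""
    # Probability that a given block holds at least one matching file.
    p_block_hit = 1.0 - (1.0 - 1.0 / blocks) ** matching_files
    expected_blocks = blocks * p_block_hit
    return expected_blocks * files_per_block(total_files, blocks)


if __name__ == "__main__":
    for k in (1, 10):
        opened = expected_files_opened(6000, 256, k)
        print(f"word in {k:2d} PR(s): about {opened:.0f} files opened")
```

[ With the quoted figures (6000 PRs, 256 blocks) this model gives a
  little under 24 files for a word that occurs once and a couple of
  hundred for a word that occurs in 10 PRs, the same order as the
  estimates in the quoted message. ]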