From owner-cvs-usrbin Sun Sep 1 09:14:43 1996
Return-Path: owner-cvs-usrbin
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id JAA02624 for cvs-usrbin-outgoing; Sun, 1 Sep 1996 09:14:43 -0700 (PDT)
Received: from mail.cs.tu-berlin.de (root@mail.cs.tu-berlin.de [130.149.17.13]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id JAA02611; Sun, 1 Sep 1996 09:14:32 -0700 (PDT)
Received: from campa.panke.de (anonymous233.ppp.cs.tu-berlin.de [130.149.17.233]) by mail.cs.tu-berlin.de (8.6.12/8.6.12) with ESMTP id RAA00507; Sun, 1 Sep 1996 17:57:52 +0200
Received: (from wosch@localhost) by campa.panke.de (8.6.12/8.6.12) id RAA01063; Sun, 1 Sep 1996 17:54:21 +0200
Date: Sun, 1 Sep 1996 17:54:21 +0200
From: Wolfram Schneider
Message-Id: <199609011554.RAA01063@campa.panke.de>
To: =?KOI8-R?Q?=E1=CE=C4=D2=C5=CA_=FE=C5=D2=CE=CF=D7?= (Andrey A. Chernov)
Cc: CVS-committers@freefall.freebsd.org, cvs-all@freefall.freebsd.org, cvs-usrbin@freefall.freebsd.org
Subject: Re: cvs commit: src/usr.bin/locate/locate fastfind.c util.c Makefile locate.1 locate.c
In-Reply-To: <199609011351.RAA03477@nagual.ru>
References: <199609011157.NAA00610@campa.panke.de> <199609011351.RAA03477@nagual.ru>
Reply-to: Wolfram Schneider
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Sender: owner-cvs-usrbin@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Andrey A. Chernov writes:
>Why? Historically?

Inherent. locate uses the 8th bit for bigram compression.

>I have a lot of national file/directory names f.e.
>It seems locate must be fixed to be 8bit clean...

Use GNU locate. GNU moved to a new database format without bigrams.
The database would be ~30% larger and the search speed would be
slower. For what? 99.99% of us don't use umlauts in file names.

You get tons of problems if you use characters less than 32 or
greater than 127 in file names.

Wolfram

locate.c

 * Locate scans a file list for the full pathname of a file given only part
 * of the name.
 * The list has been processed with "front-compression"
 * and bigram coding. Front compression reduces space by a factor of 4-5,
 * bigram coding by a further 20-25%.
 *
 * The codes are:
 *
 *      0-28     likeliest differential counts + offset to make nonnegative
 *      30       switch code for out-of-range count to follow in next word
 *      128-255  bigram codes (128 most common, as determined by 'updatedb')
 *      32-127   single character (printable) ascii residue (ie, literal)
 *
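The code table above can be read as a partition of the byte values found in the database. The following is only a rough sketch of that partition, not the code from locate.c: the names classify_code, decode_count, bigram_index and ASSUMED_OFFSET are made up for illustration, and the offset of 14 is an assumption (the comment only says that an offset makes the counts nonnegative). Values 29 and 31 are not listed in the table, so the sketch treats them as unassigned.

```c
/* Classes of byte values, following the table in the locate.c comment. */
enum code_class {
    CODE_COUNT,      /* 0-28: differential count plus offset */
    CODE_SWITCH,     /* 30: out-of-range count follows in next word */
    CODE_LITERAL,    /* 32-127: single literal ASCII character */
    CODE_BIGRAM,     /* 128-255: one of the 128 most common bigrams */
    CODE_UNASSIGNED  /* 29 and 31: not listed in the table */
};

/* ASSUMPTION: the excerpt does not state the offset; 14 would center
 * the 0-28 range on zero. */
#define ASSUMED_OFFSET 14

/* Map one database byte to its class according to the table. */
enum code_class classify_code(unsigned char c)
{
    if (c <= 28)
        return CODE_COUNT;
    if (c == 30)
        return CODE_SWITCH;
    if (c >= 32 && c <= 127)
        return CODE_LITERAL;
    if (c >= 128)
        return CODE_BIGRAM;
    return CODE_UNASSIGNED;
}

/* Signed differential count carried by a byte in 0-28. */
int decode_count(unsigned char c)
{
    return (int)c - ASSUMED_OFFSET;
}

/* Position (0-127) in the bigram table selected by a byte in 128-255. */
int bigram_index(unsigned char c)
{
    return c & 0x7f;
}
```

With this partition, a byte such as 0xE4 (the letter "ä" in ISO-8859-1) is classified as CODE_BIGRAM rather than as a literal, which is the reason a database in this format cannot hold 8-bit file names unchanged.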