From owner-cvs-usrbin Sun Sep 1 12:52:32 1996
Return-Path: owner-cvs-usrbin
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id MAA13154 for cvs-usrbin-outgoing; Sun, 1 Sep 1996 12:52:32 -0700 (PDT)
Received: from sovcom.kiae.su (sovcom.kiae.su [193.125.152.1]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id MAA13093; Sun, 1 Sep 1996 12:52:17 -0700 (PDT)
Received: by sovcom.kiae.su id AA03115 (5.65.kiae-1 ); Sun, 1 Sep 1996 22:47:55 +0300
Received: by sovcom.KIAE.su (UUMAIL/2.0); Sun, 1 Sep 96 22:47:55 +0300
Received: (from ache@localhost) by nagual.ru (8.7.5/8.7.3) id XAA00824; Sun, 1 Sep 1996 23:47:37 +0400 (MSD)
Message-Id: <199609011947.XAA00824@nagual.ru>
Subject: Re: cvs commit: src/usr.bin/locate/locate fastfind.c util.c Makefile locate.1 locate.c
In-Reply-To: <199609011554.RAA01063@campa.panke.de> from "Wolfram Schneider" at "Sep 1, 96 05:54:21 pm"
To: wosch@cs.tu-berlin.de
Date: Sun, 1 Sep 1996 23:47:36 +0400 (MSD)
Cc: CVS-committers@freefall.freebsd.org, cvs-all@freefall.freebsd.org, cvs-usrbin@freefall.freebsd.org
From: =?KOI8-R?Q?=E1=CE=C4=D2=C5=CA_=FE=C5=D2=CE=CF=D7?= (Andrey A. Chernov)
Organization: self
X-Class: Fast
X-Mailer: ELM [version 2.4ME+ PL25 (25)]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-cvs-usrbin@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

[Charset ISO-8859-1 unsupported, filtering to ASCII...]
> Use GNU locate. GNU moved to a new database format without bigrams.
> The database would be ~30% larger and the search speed would
> be slower. For what? 99.99% of us don't use umlauts in
> file names. You get tons of problems if you use characters less
> than 32 or greater than 127 in file names.

It is not just an umlaut problem: *all* Russian characters live above 128...
I don't have _any_ problems with my Russian file names (except the locate problem).

> *	0-28	likeliest differential counts + offset to make nonnegative
> *	30	switch code for out-of-range count to follow in next word
> *	128-255	bigram codes (128 most common, as determined by 'updatedb')
> *	32-127	single character (printable) ascii residue (ie, literal)

The first solution I can see is to use some char code to escape 8-bit chars
in file names, say code 29. I.e. char 255 would be stored as the sequence
29 255. This way a pure ASCII database gets no bigger and searches get no
slower. Can you live with that?

-- 
Andrey A. Chernov
http://www.nagual.ru/~ache/
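
The escape proposal above can be illustrated with a minimal sketch. It is not the locate code itself: the function names and the `ESCAPE` constant are hypothetical, and the sketch covers only the literal-character part of the format, leaving out bigram codes and differential counts. It assumes code 29 is otherwise unused, as the quoted table suggests.

```python
# Hypothetical sketch of the proposed escape scheme; not the actual
# locate/updatedb implementation. Only literal characters are handled,
# bigram codes (128-255) and differential counts (0-28, 30) are ignored.

ESCAPE = 29  # escape code suggested in the message (assumed unused)


def encode_residue(name: bytes) -> bytes:
    """Prefix every 8-bit byte (>= 128) with ESCAPE; copy ASCII as-is."""
    out = bytearray()
    for b in name:
        if b >= 128:
            out.append(ESCAPE)
        out.append(b)
    return bytes(out)


def decode_residue(data: bytes) -> bytes:
    """Undo encode_residue: a byte following ESCAPE is taken literally."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == ESCAPE:
            b = next(it, None)
            if b is None:
                raise ValueError("truncated escape sequence")
        out.append(b)
    return bytes(out)


if __name__ == "__main__":
    # A Russian file name in KOI8-R: every letter is an 8-bit byte.
    raw = "файл".encode("koi8_r")
    packed = encode_residue(raw)
    print(len(raw), len(packed))
    print(decode_residue(packed) == raw)
```

In this sketch a pure ASCII name is stored unchanged, which matches the claim that such databases would not grow, while each 8-bit character costs one extra byte.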