Date: Sun, 1 Sep 1996 17:54:21 +0200
From: Wolfram Schneider <wosch@cs.tu-berlin.de>
To: =?KOI8-R?Q?=E1=CE=C4=D2=C5=CA_=FE=C5=D2=CE=CF=D7?= (Andrey A. Chernov) <ache@nagual.ru>
Cc: CVS-committers@freefall.freebsd.org, cvs-all@freefall.freebsd.org, cvs-usrbin@freefall.freebsd.org
Subject: Re: cvs commit: src/usr.bin/locate/locate fastfind.c util.c Makefile locate.1 locate.c
Message-ID: <199609011554.RAA01063@campa.panke.de>
In-Reply-To: <199609011351.RAA03477@nagual.ru>
References: <199609011157.NAA00610@campa.panke.de> <199609011351.RAA03477@nagual.ru>
Andrey A. Chernov writes:
>Why? Historically?

Inherent. locate uses the 8th bit for bigram compression.

>I have a lot of national file/directory names f.e.
>It seems locate must be fixed to be 8bit clean...

Use GNU locate. GNU moved to a new database format without bigram
coding. The database would be ~30% larger and the search speed would
be slower. For what? 99.99% of us don't use umlauts in file names.
You get tons of problems if you use characters less than 32 or
greater than 127 in file names.

Wolfram

locate.c
 * Locate scans a file list for the full pathname of a file given only part
 * of the name. The list has been processed with "front-compression"
 * and bigram coding. Front compression reduces space by a factor of 4-5,
 * bigram coding by a further 20-25%.
 *
 * The codes are:
 *
 *      0-28    likeliest differential counts + offset to make nonnegative
 *      30      switch code for out-of-range count to follow in next word
 *      128-255 bigram codes (128 most common, as determined by 'updatedb')
 *      32-127  single character (printable) ascii residue (ie, literal)
 *
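To make the code table above concrete, here is a minimal sketch of how one entry could be decoded. It is not the code from locate.c. The function name decode_entry, the in-memory buffer layout, and the value OFFSET = 14 (chosen so that codes 0-28 map to deltas of -14..+14) are assumptions for illustration. The switch code (30) is recognized but not handled, because the count that follows it is a machine word whose size and byte order this sketch does not try to guess.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OFFSET 14    /* assumed: centers codes 0-28 on a delta of zero */
#define SWITCH 30    /* out-of-range count follows in the next word */
#define PARITY 0x80  /* 8th bit set: byte is a bigram code */

/*
 * Decode one entry starting at db[*pos] (hypothetical helper).
 * On entry, path holds the previous pathname (NUL-terminated, "" at start)
 * and *count the prefix length it shared with its predecessor.
 * bigrams is a 256-byte table: pair i is bigrams[2*i], bigrams[2*i + 1].
 * Returns 1 if an entry was decoded, 0 at end of data or on input this
 * sketch does not handle.
 */
static int
decode_entry(const unsigned char *db, size_t len, size_t *pos,
    const unsigned char *bigrams, char *path, size_t cap, int *count)
{
	size_t i;
	int c;

	assert(cap > 0);
	if (*pos >= len)
		return 0;

	c = db[(*pos)++];
	if (c == SWITCH || c > 2 * OFFSET)
		return 0;	/* switch code or not a count code: unsupported */

	/* Front compression: adjust the shared-prefix length by the delta. */
	*count += c - OFFSET;
	if (*count < 0 || (size_t)*count > strlen(path))
		return 0;

	/* Keep the first *count bytes of the previous path, append residue. */
	i = (size_t)*count;
	while (*pos < len && db[*pos] > SWITCH) {
		c = db[(*pos)++];
		if (c & PARITY) {
			/* Bigram code: expands to two characters. */
			if (i + 2 >= cap)
				return 0;
			path[i++] = (char)bigrams[2 * (c & 0x7f)];
			path[i++] = (char)bigrams[2 * (c & 0x7f) + 1];
		} else {
			/* Literal residue character. */
			if (i + 1 >= cap)
				return 0;
			path[i++] = (char)c;
		}
	}
	path[i] = '\0';
	return 1;
}
```

The `c & PARITY` test is where the 8-bit problem comes from: a file name byte of 128 or above stored as a literal would be indistinguishable from a bigram code when read back.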