Date:      Sun, 1 Sep 1996 17:54:21 +0200
From:      Wolfram Schneider <wosch@cs.tu-berlin.de>
To:        Андрей Чернов (Andrey A. Chernov) <ache@nagual.ru>
Cc:        CVS-committers@freefall.freebsd.org, cvs-all@freefall.freebsd.org, cvs-usrbin@freefall.freebsd.org
Subject:   Re: cvs commit:  src/usr.bin/locate/locate fastfind.c util.c Makefile locate.1 locate.c
Message-ID:  <199609011554.RAA01063@campa.panke.de>
In-Reply-To: <199609011351.RAA03477@nagual.ru>
References:  <199609011157.NAA00610@campa.panke.de> <199609011351.RAA03477@nagual.ru>

Andrey A. Chernov writes:
>Why? Historically? 

It is inherent: locate uses the 8th bit for bigram compression.


>I have a lot of national file/directory names, for example.
>It seems locate must be fixed to be 8bit clean...

Use GNU locate. GNU moved to a new database format without bigram
coding. Without bigrams the database would be ~30% larger and
searches would be slower. For what? 99.99% of us don't use umlauts
in file names. You get tons of problems if you use characters less
than 32 or greater than 127 in file names.

Wolfram

locate.c
 * Locate scans a file list for the full pathname of a file given only part
 * of the name.  The list has been processed with "front-compression"
 * and bigram coding.  Front compression reduces space by a factor of 4-5,
 * bigram coding by a further 20-25%.
 *
 * The codes are:
 *
 *      0-28    likeliest differential counts + offset to make nonnegative
 *      30      switch code for out-of-range count to follow in next word
 *      128-255 bigram codes (128 most common, as determined by 'updatedb')
 *      32-127  single character (printable) ascii residue (ie, literal)
 *
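
The code table above can be read as a byte classifier. What follows is
only a rough sketch of that reading, not the code from locate.c: the
names classify_code, count_delta, bigram_index and path_is_representable
are made up for illustration, the offset of 14 is an assumption (the
midpoint of the 0-28 range), and so is the idea that a bigram index is
the code with the 8th bit cleared. It shows why a file name byte above
127 cannot be stored as a literal: it falls into the bigram range.

```c
#include <assert.h>
#include <stddef.h>

/* Byte classes taken from the code table in the locate.c comment. */
enum locate_code {
    CODE_COUNT,     /* 0-28: differential count plus offset */
    CODE_SWITCH,    /* 30: out-of-range count follows in the next word */
    CODE_LITERAL,   /* 32-127: single ASCII character */
    CODE_BIGRAM,    /* 128-255: one of the 128 most common bigrams */
    CODE_UNLISTED   /* 29 and 31: not mentioned in the table */
};

#define SKETCH_OFFSET 14    /* assumption: midpoint of the 0-28 range */
#define SKETCH_SWITCH 30    /* switch code, as listed in the table */

/* Map one database byte to its class. */
enum locate_code classify_code(unsigned char c)
{
    if (c >= 128)
        return CODE_BIGRAM;
    if (c >= 32)
        return CODE_LITERAL;
    if (c == SKETCH_SWITCH)
        return CODE_SWITCH;
    if (c <= 28)
        return CODE_COUNT;
    return CODE_UNLISTED;
}

/* Signed differential count, undoing the assumed offset. */
int count_delta(unsigned char c)
{
    return (int)c - SKETCH_OFFSET;
}

/* Assumed bigram table index: the code with the 8th bit cleared. */
int bigram_index(unsigned char c)
{
    return c & 0x7f;
}

/* Return 1 if every byte of path falls in the literal range, else 0. */
int path_is_representable(const char *path)
{
    const unsigned char *p;

    for (p = (const unsigned char *)path; *p != '\0'; p++)
        if (classify_code(*p) != CODE_LITERAL)
            return 0;
    return 1;
}
```

With this reading, path_is_representable("/usr/bin/locate") returns 1,
while a name containing an ISO 8859-1 "ü" (byte 0xfc) returns 0, because
that byte would be taken for a bigram code.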


