Date:      Sun, 1 Sep 1996 23:47:36 +0400 (MSD)
From:      Андрей Чернов (Andrey A. Chernov) <ache@nagual.ru>
To:        wosch@cs.tu-berlin.de
Cc:        CVS-committers@freefall.freebsd.org, cvs-all@freefall.freebsd.org, cvs-usrbin@freefall.freebsd.org
Subject:   Re: cvs commit:  src/usr.bin/locate/locate fastfind.c util.c Makefile locate.1 locate.c
Message-ID:  <199609011947.XAA00824@nagual.ru>
In-Reply-To: <199609011554.RAA01063@campa.panke.de> from "Wolfram Schneider" at "Sep 1, 96 05:54:21 pm"

> Use GNU locate. GNU moved to a new database format without bigrams.
> The database would be ~30% larger and the search speed would
> be slower. For what? 99.99% of us don't use umlauts in
> file names. You get tons of problems if you use characters less
> than 32 or greater than 127 in file names.

It is not just an umlaut problem: *all* Russian characters live above 128.
I don't have _any_ problems with my Russian file names (except for the
locate problem).

>  *      0-28    likeliest differential counts + offset to make nonnegative
>  *      30      switch code for out-of-range count to follow in next word
>  *      128-255 bigram codes (128 most common, as determined by 'updatedb')
>  *      32-127  single character (printable) ascii residue (ie, literal)
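The quoted layout can be read as a classification of each database byte. The following is only a sketch of that reading in Python, not the code in fastfind.c; the function name `classify` and the returned labels are made up for illustration. It makes visible that codes 29 and 31 are not assigned by the table.

```python
def classify(code: int) -> str:
    """Name the role of one database byte under the quoted layout.

    Hypothetical helper: the labels are illustrative, not taken from locate.
    """
    if not 0 <= code <= 255:
        raise ValueError("not a byte value")
    if code <= 28:
        return "differential count"
    if code == 30:
        return "switch"
    if 32 <= code <= 127:
        return "literal ASCII"
    if code >= 128:
        return "bigram"
    return "unassigned"  # 29 and 31 do not appear in the quoted table
```

Under this reading a byte of 128 or more can only mean a bigram, so it cannot also stand for a literal character in a file name.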

The first solution I can see is to use some character code to escape 8-bit
characters in file names, say code 29.
That is, character 255 would be stored as the sequence 29 255. This neither
makes a pure ASCII database bigger nor slows down searches.
Can you live with that?
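A minimal sketch of the escape idea, in Python for brevity. It is an illustration, not a patch to locate: the names `ESCAPE`, `encode_residue` and `decode_residue` are hypothetical, and it covers only the literal (residue) characters, ignoring the differential counts, the switch code and the bigram substitution that the real database also contains.

```python
ESCAPE = 29  # the code suggested above; assumed to be otherwise unused

def encode_residue(name: bytes) -> bytes:
    """Prefix every 8-bit byte with ESCAPE; 7-bit bytes pass through."""
    out = bytearray()
    for b in name:
        if b >= 128:
            out.append(ESCAPE)
        out.append(b)
    return bytes(out)

def decode_residue(data: bytes) -> bytes:
    """Undo encode_residue: the byte after ESCAPE is taken literally."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == ESCAPE:
            b = next(it, None)
            if b is None:
                raise ValueError("dangling escape code")
        out.append(b)
    return bytes(out)
```

A pure ASCII name is stored unchanged, so such a database does not grow. Each 8-bit character costs one extra byte, so a name in KOI8-R such as `"файл".encode("koi8_r")` takes 8 bytes instead of 4.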

-- 
Andrey A. Chernov
<ache@nagual.ru>
http://www.nagual.ru/~ache/


