From owner-cvs-usrbin  Sun Sep  1 12:52:32 1996
Return-Path: owner-cvs-usrbin
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id MAA13154
          for cvs-usrbin-outgoing; Sun, 1 Sep 1996 12:52:32 -0700 (PDT)
Received: from sovcom.kiae.su (sovcom.kiae.su [193.125.152.1])
          by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id MAA13093;
          Sun, 1 Sep 1996 12:52:17 -0700 (PDT)
Received: by sovcom.kiae.su id AA03115
  (5.65.kiae-1 ); Sun, 1 Sep 1996 22:47:55 +0300
Received: by sovcom.KIAE.su (UUMAIL/2.0); Sun,  1 Sep 96 22:47:55 +0300
Received: (from ache@localhost) by nagual.ru (8.7.5/8.7.3) id XAA00824; Sun, 1 Sep 1996 23:47:37 +0400 (MSD)
Message-Id: <199609011947.XAA00824@nagual.ru>
Subject: Re: cvs commit:  src/usr.bin/locate/locate fastfind.c util.c Makefile locate.1 locate.c
In-Reply-To: <199609011554.RAA01063@campa.panke.de> from "Wolfram Schneider" at "Sep 1, 96 05:54:21 pm"
To: wosch@cs.tu-berlin.de
Date: Sun, 1 Sep 1996 23:47:36 +0400 (MSD)
Cc: CVS-committers@freefall.freebsd.org, cvs-all@freefall.freebsd.org,
        cvs-usrbin@freefall.freebsd.org
From: =?KOI8-R?Q?=E1=CE=C4=D2=C5=CA_=FE=C5=D2=CE=CF=D7?= (Andrey A. Chernov) <ache@nagual.ru>
Organization: self
X-Class: Fast
X-Mailer: ELM [version 2.4ME+ PL25 (25)]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-cvs-usrbin@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> Use GNU locate. GNU moved to a new database format without bigram.
> The database would be ~30% larger and the search speed would
> be slower. For what? 99,99% of us don't use Umlauts for
> file names. You get tons of problems if you use characters less
> than 32 or greater than 127 in file names.

It is not only an umlaut problem: *all* Russian characters live above 128...
I don't have _any_ problems with my Russian file names (except the locate
problem).

>  *      0-28    likeliest differential counts + offset to make nonnegative
>  *      30      switch code for out-of-range count to follow in next word
>  *      128-255 bigram codes (128 most common, as determined by 'updatedb')
>  *      32-127  single character (printable) ascii residue (ie, literal)

The first solution I can see is to use some char code to escape 8-bit
chars in filenames, say code 29.
I.e. char 255 will look like the sequence 29 255. This neither makes
a pure ASCII database bigger nor decreases search speed.
Can you live with that?
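
A minimal sketch of that idea in C, assuming code 29 is really free (it is
unused in the table quoted above). The helper names escape_8bit() and
unescape_8bit() are hypothetical; this is not the actual locate code, and
it leaves the bigram and differential-count handling out:

```c
#include <stddef.h>

/* Assumption: 29 is unused in the quoted code table (0-28, 30, 32-255). */
#define ESCAPE_8BIT 29

/*
 * Encode the literal part of a path.  A byte with the high bit set is
 * written as ESCAPE_8BIT followed by the raw byte; 7-bit bytes are
 * copied unchanged, so pure ASCII input does not grow.
 * 'out' must have room for 2 * len bytes.  Returns bytes written.
 */
size_t
escape_8bit(const unsigned char *in, size_t len, unsigned char *out)
{
	size_t i, n = 0;

	for (i = 0; i < len; i++) {
		if (in[i] >= 128)
			out[n++] = ESCAPE_8BIT;
		out[n++] = in[i];
	}
	return n;
}

/*
 * Reverse of escape_8bit(): the byte following ESCAPE_8BIT is taken as
 * a literal.  In the real database an unescaped byte >= 128 would be a
 * bigram code; this sketch simply copies it through.
 * 'out' must have room for len bytes.  Returns bytes written.
 */
size_t
unescape_8bit(const unsigned char *in, size_t len, unsigned char *out)
{
	size_t i, n = 0;

	for (i = 0; i < len; i++) {
		if (in[i] == ESCAPE_8BIT && i + 1 < len)
			i++;	/* drop the escape, keep the next byte */
		out[n++] = in[i];
	}
	return n;
}
```

Only names with 8-bit chars pay one extra byte per such char, and the
search loop only needs one more case for code 29.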

-- 
Andrey A. Chernov
<ache@nagual.ru>
http://www.nagual.ru/~ache/