FreeBSD Mail Archives

Date:      Sat, 05 Apr 2014 07:35:48 +0400
From:      Andrey Chernov <ache@freebsd.org>
To:        sbruno@freebsd.org
Cc:        Gleb Smirnoff <glebius@FreeBSD.org>, i18n@freebsd.org, "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
Subject:   Re: login.conf --> UTF-8
Message-ID:  <533F7A14.7060403@freebsd.org>
In-Reply-To: <1396665553.2415.0.camel@powernoodle.corp.yahoo.com>
References:  <1396457629.2280.2.camel@powernoodle.corp.yahoo.com>	 <20140402171546.GL44326@FreeBSD.org> <533C8269.7040305@freebsd.org>	 <20140404124634.GC44326@glebius.int.ru> <533F5DF5.9020803@freebsd.org> <1396665553.2415.0.camel@powernoodle.corp.yahoo.com>



[-- Attachment #1 --]
On 05.04.2014 6:39, Sean Bruno wrote:
> On Sat, 2014-04-05 at 05:35 +0400, Andrey Chernov wrote:
>> On 04.04.2014 16:46, Gleb Smirnoff wrote:
>>> On Thu, Apr 03, 2014 at 01:34:33AM +0400, Andrey Chernov wrote:
>>> A> On 02.04.2014 21:15, Gleb Smirnoff wrote:
>>> A> > S> +	:lang=en_US.UTF-8:\
>>> A> > S> +	:charset=UTF-8:
>>> A> > 
>>> A> > And I'd like to make the same change for the 'russian' login class
>>> A> > in /etc/login.conf.
>>> A> 
>>> A> Please, everybody, remember that we don't have UTF-8 collation
>>> A> implemented, only a fallback to bytecode comparison.
>>>
>>> Any objections to checking in the FreeBSD-compatible[1] UTF-8 collation
>>> implementation from Alex Tutubalin?
>>>
>>> http://blog.lexa.ru/2008/03/03/freebsd_utf8_russian_collate_vtoraja_popitka.html
>>>
>>
>> Even his "version 2" draws my objections. I already replied to Alex about
>> this in 2008. In short:
>> 1) There is an error there: almost all single chars above ASCII should be
>> "chains", i.e. two bytes minimum, since there are almost no intersections
>> with ISO8859-1 as a UTF-8 subset.
>> 2) The table itself is very incomplete, e.g. it covers neither the whole
>> KOI8-R, nor ISO8859-5, nor CP866. It is made from CP1251 with all its
>> restrictions. So switching from e.g. KOI8-R to UTF-8 would cause a sorting
>> regression. Russian UTF-8 collation should be able to sort all the major
>> Russian charsets mentioned, i.e. we need a combined table.
>> 3) The "charmap map.ISO8859-1" declaration is missing (needed mainly for
>> using the mnemonic names of pure ASCII chars).
>>
>> Even if the above-mentioned errors are removed and the code is
>> committed afterwards, we should understand that this approach (implementing
>> multibyte collation via a single-byte one), while possible, is a
>> big hack and slows sorting down by up to 10 times.
>>
>> A proper "Unicode Collation Algorithm" is already implemented by ICU and
>> other projects. See
>> http://unicode.org/reports/tr10/
>> It would be better if someone adopted it instead of hacks.
>>
> 
> 
> If you have a different patch, I'd appreciate seeing it.  

I don't have a different patch. If you have enough time to fix the
above-mentioned obstacles, I can review your patch (or somebody else's).
A "no code" situation doesn't mean wrong code can be committed. Do it
properly even when it is a hack.

-- 
http://ache.vniz.net/


[-- Attachment #2 --]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iEYEARECAAYFAlM/eiMACgkQVg5YK5ZEdN1tvwCcDf+on6g+N/KZ2c3qD7zxNCmN
YKsAoKt2mzGExaqJxIpkfHhVpzHv1VMp
=1bq4
-----END PGP SIGNATURE-----


URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?533F7A14.7060403>
