Date:      Fri, 04 Apr 2014 19:39:13 -0700
From:      Sean Bruno <sbruno@ignoranthack.me>
To:        Andrey Chernov <ache@freebsd.org>
Cc:        Gleb Smirnoff <glebius@FreeBSD.org>, i18n@freebsd.org, "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
Subject:   Re: login.conf --> UTF-8
Message-ID:  <1396665553.2415.0.camel@powernoodle.corp.yahoo.com>
In-Reply-To: <533F5DF5.9020803@freebsd.org>
References:  <1396457629.2280.2.camel@powernoodle.corp.yahoo.com> <20140402171546.GL44326@FreeBSD.org> <533C8269.7040305@freebsd.org> <20140404124634.GC44326@glebius.int.ru>  <533F5DF5.9020803@freebsd.org>

On Sat, 2014-04-05 at 05:35 +0400, Andrey Chernov wrote:
> On 04.04.2014 16:46, Gleb Smirnoff wrote:
> > On Thu, Apr 03, 2014 at 01:34:33AM +0400, Andrey Chernov wrote:
> > A> On 02.04.2014 21:15, Gleb Smirnoff wrote:
> > A> > S> +	:lang=en_US.UTF-8:\
> > A> > S> +	:charset=UTF-8:
> > A> >
> > A> > And I'd like to do same change for the 'russian' login class
> > A> > in /etc/login.conf.
> > A>
> > A> Please, everybody, remember that we don't have UTF-8 collation
> > A> implemented, just a fallback to byte-by-byte comparison.
> >
> > Any objections to checking in the FreeBSD-compatible[1] UTF-8
> > collation implementation from Alex Tutubalin?
> >
> > http://blog.lexa.ru/2008/03/03/freebsd_utf8_russian_collate_vtoraja_popitka.html
> >
>
> Even his "version 2" draws objections from me. I already replied to
> Alex about this in 2008. In short:
> 1) There is an error: almost all single chars above ASCII should be
> "chains", i.e. two bytes minimum, since there are almost no
> intersections with ISO8859-1 as a UTF-8 subset.
> 2) The table itself is very incomplete, e.g. it covers neither the
> whole of KOI8-R, nor ISO8859-5, nor CP866. It is made from CP1251 with
> all its restrictions. So switching from, e.g., KOI8-R to UTF-8 will
> cause a sorting regression. Russian UTF-8 collation should be able to
> sort all the major Russian charsets mentioned, i.e. we need a combined
> table.
> 3) The "charmap map.ISO8859-1" declaration is missing (needed mainly
> for using the mnemonic names of pure ASCII chars).
>
> Even if the above-mentioned errors are removed and the code is
> committed afterwards, we should understand that this approach
> (implementing multibyte collation via single-byte collation), while
> possible, is a big hack and slows sorting down by up to 10 times.
>
> A proper "Unicode collation algorithm" is already implemented by ICU
> and other projects. See
> http://unicode.org/reports/tr10/
> It would be better if someone adopted it instead of hacks.
>
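
For anyone following along, the gap between the byte-comparison
fallback and a real alphabetical order can be sketched in a few lines
of Python. Everything named below (RUSSIAN_ORDER, bytewise_key,
toy_collation_key, WORDS) is hypothetical and for illustration only;
it is not Alex's table, not the colldef format, and not the Unicode
collation algorithm, just a toy lookup table for lowercase Russian.

```python
# Russian alphabet in dictionary order; note that "ё" follows "е".
# Hypothetical stand-in for a real collation table.
RUSSIAN_ORDER = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
RANK = {ch: i for i, ch in enumerate(RUSSIAN_ORDER)}


def bytewise_key(word):
    """Sort key equal to the raw UTF-8 bytes (the fallback behaviour)."""
    return word.encode("utf-8")


def toy_collation_key(word):
    """Sort key based on alphabet position.

    Characters missing from the table sort after the alphabet,
    ordered by code point.
    """
    return [(RANK.get(ch, len(RANK)), ord(ch)) for ch in word.lower()]


WORDS = ["яма", "ёж", "ель"]

# "ё" is U+0451 and "я" is U+044F, so byte order puts "ё" after "я".
print(sorted(WORDS, key=bytewise_key))
# The table restores the dictionary order: е, ё, ..., я.
print(sorted(WORDS, key=toy_collation_key))
# Every Cyrillic letter here takes two bytes in UTF-8.
print(len("ё".encode("utf-8")))
```

With byte comparison "ёж" lands after "яма", while the table-based key
keeps it right after "ель". Each Cyrillic letter is also two bytes in
UTF-8, which is the "chains" point in item 1 above.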


If you have a different patch, I'd appreciate seeing it.

Sean
