Date: Wed, 12 Mar 2014 16:45:29 +0100
From: Rolf Nielsen <rmg70swe@yahoo.com>
To: stable@freebsd.org
Cc: Gerhard Schmidt <estartu@ze.tum.de>
Subject: Re: UTF-8 Sorting
Message-ID: <53208119.6060009@yahoo.com>
In-Reply-To: <53207613.2090801@ze.tum.de>
References: <5320297F.1080400@ze.tum.de> <53207451.3010305@yahoo.com> <53207613.2090801@ze.tum.de>

Gerhard Schmidt wrote on 2014-03-12 15:58:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 12.03.2014 15:50, Rolf Nielsen wrote:
>>
>>
>> Gerhard Schmidt wrote on 2014-03-12 10:31:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Hi,
>>>
>>> I have a problem with FreeBSD, UTF-8 and sorting.
>>>
>>> For example, there is a file with the following content:
>>>
>>> Meier
>>> Müller
>>> Öger
>>> Ofner
>>> Schmidt
>>>
>>> When I set my terminal to ISO-8859-1 encoding and call sort on
>>> this file, I get the following output:
>>>
>>> Meier
>>> Müller
>>> Ofner
>>> Öger
>>> Schmidt
>>>
>>> which is correctly sorted.
>>>
>>> When I change my terminal to UTF-8 encoding, convert the file
>>> to UTF-8 and call sort again, I get the following output:
>>>
>>> Meier
>>> Müller
>>> Ofner
>>> Schmidt
>>> Öger
>>>
>>> which is wrong.
>>>
>>> The problem seems to be that the LC_COLLATE file in the
>>> de_DE.UTF-8 locale is linked to ../la_LN.US-ASCII/LC_COLLATE (as
>>> are all LC_COLLATE files in any UTF-8 locale).
>>>
>>> After some research I found a mail from Kuba Lida in December
>>> 2008 (yes, that's 5 years ago) describing the same problem, which
>>> got no response.
>>>
>>> Why isn't there a UTF-8 LC_COLLATE file for any language? Kuba
>>> Lida believed there was a problem with multibyte collate files in
>>> FreeBSD. Is this true, and are there plans to fix this problem?
>>>
>>> The same test under Linux works without problems.
>>>
>>> Regards
>>> Estartu
>>
>> Hi,
>>
>> Hmm, to me the result that you claim is wrong looks perfectly
>> correct; however, it may of course differ between languages. In
>> Swedish, Ö is a separate letter, located last in the alphabet (from
>> A to Z we have the exact same alphabet as English, and then come Å,
>> Ä and Ö, in that order).
>
> Yes, Swedish sorts these characters after Z, but in German Ö equals
> Oe in names and O in all other cases. There have to be collation
> tables for different languages, just as there are different ones for
> different languages in the ISO encodings. I know that the difference
> between names and non-names will not be implementable, but the
> default would be a big improvement.
>
> The same difference exists between German German (de_DE) and
> Austrian German (de_AT).
>
> Regards
> Estartu

I see. Well, different countries, different customs. :)

(I should have included the list in my previous reply, but I hit the
wrong button. I apologise for that.)

Regards,
Rolf
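To make the three orderings in this thread concrete, here is a minimal Python sketch. It assumes simplified rules: DIN 5007-1 (dictionary order, an umlaut sorts like its base vowel), DIN 5007-2 (name lists, an umlaut sorts as vowel + "e"), and a Swedish alphabet of a to z followed by å, ä, ö. The names `german_key`, `swedish_key`, `DIN_1`, `DIN_2` and the Swedish sample words are hypothetical illustrations; none of this is the collation data shipped with FreeBSD, glibc or ICU.

```python
# Simplified, hypothetical sort keys illustrating the orderings discussed above.
# They approximate DIN 5007 and Swedish alphabetical order; they are NOT the
# LC_COLLATE tables of any real locale.

GERMAN = ["Meier", "Müller", "Öger", "Ofner", "Schmidt"]
SWEDISH = ["Åsa", "Ärla", "Örn", "Olle", "Zakrisson", "Anders"]

# DIN 5007-1 (dictionary order): an umlaut sorts like its base vowel.
DIN_1 = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
# DIN 5007-2 (name lists such as phone books): an umlaut sorts as vowel + "e".
DIN_2 = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def german_key(table):
    """Build a key that folds case and umlauts, then breaks ties on the original."""
    return lambda s: (s.lower().translate(table), s)


# Swedish: the Latin letters a-z, followed by å, ä, ö in that order.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def swedish_key(s):
    # Characters outside the alphabet sort after ö, ordered by code point.
    n = len(SWEDISH_ALPHABET)
    return [SWEDISH_ALPHABET.index(c) if c in SWEDISH_ALPHABET else n + ord(c)
            for c in s.lower()]


if __name__ == "__main__":
    # Plain code-point order; for UTF-8 text this equals byte order, so
    # Ö (U+00D6) lands after every ASCII letter.
    print(sorted(GERMAN))
    print(sorted(GERMAN, key=german_key(DIN_1)))
    print(sorted(GERMAN, key=german_key(DIN_2)))
    print(sorted(SWEDISH))
    print(sorted(SWEDISH, key=swedish_key))
```

Plain code-point order gives `['Meier', 'Müller', 'Ofner', 'Schmidt', 'Öger']`, the same order reported above for the UTF-8 locale. The DIN 5007-1 key gives `['Meier', 'Müller', 'Ofner', 'Öger', 'Schmidt']`, matching the ISO-8859-1 result, while the DIN 5007-2 key puts Öger before Ofner because "oeger" < "ofner". For the Swedish words, code-point order yields Ärla before Åsa (U+00C4 < U+00C5), which is why a language-specific table is needed even where "after Z" is the right answer.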