From: Andrey Chernov
To: Dag-Erling Smørgrav
Cc: Doug Barton, current@FreeBSD.org, Konrad Jankowski,
    Diomidis Spinellis, hackers@FreeBSD.org, Gabor Kovesdan, Max Khon,
    "Sean C. Farley", Kövesdán Gábor
Date: Wed, 18 Jun 2008 12:37:39 +0400
Subject: Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Message-ID: <20080618083739.GA87100@nagual.pp.ru>
In-Reply-To: <86zlpjduew.fsf@ds4.des.no>
List-Id: Technical Discussions relating to FreeBSD

On Wed, Jun 18, 2008 at 10:22:31AM +0200, Dag-Erling Smørgrav wrote:
> I think part of the problem is that there aren't enough people who truly
> understand localization.  I think I understand most of it, but I'm
> pretty sure I *don't* understand how collation works, or is supposed to
> work.  Amongst other things, I don't understand how (or whether) it
> handles cases like "aa" and "å", which are considered the same letter in
> Norwegian.

Single-byte locale collation works through strcoll() via chains, i.e. it
searches all chains starting with the given letter. Multibyte locale
collation is currently not implemented and can't be properly implemented
under the existing single-byte framework (it would consume resources
badly in that case).
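
For anyone who has not used the interface, here is a minimal sketch of
the caller's side of strcoll(): a program selects LC_COLLATE and lets
libc decide the order. It does not show the chain lookup inside libc.
The helper name sort_by_locale() is hypothetical, and whether a given
locale name is installed on the system is an assumption the caller has
to check.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: order two C strings by the current LC_COLLATE rules. */
static int
collate_cmp(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;

	return (strcoll(*sa, *sb));
}

/*
 * Sort an array of strings according to the named locale.
 * Returns 0 on success, -1 if the locale is not available.
 * sort_by_locale() is a hypothetical helper, not a libc function.
 */
static int
sort_by_locale(const char **items, size_t n, const char *locale)
{
	assert(items != NULL);
	if (setlocale(LC_COLLATE, locale) == NULL)
		return (-1);
	qsort(items, n, sizeof(items[0]), collate_cmp);
	return (0);
}
```

In the "C" locale strcoll() orders strings the same way strcmp() does;
any difference only shows up once a national locale is selected.
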
I know of semi-hacking attempts to implement multibyte collation via the
single-byte one, but all of them cover only a small ASCII + national
alphabet subset; the rest of Unicode is left unsorted.

> Perhaps you could create a Localization page on wiki.freebsd.org which
> addresses these issues, or at least points to relevant resources?

IMHO single-byte collation will be obsolete soon, once Unicode collation
is implemented as a SoC project. We need something like the ICU library,
which performs as described below, i.e. unified sorting for all possible
chars:

http://unicode.org/reports/tr10/

-- 
http://ache.pp.ru/