From: Dag-Erling Smørgrav
To: Konrad Jankowski
Cc: Doug Barton, current@FreeBSD.org, Andrey Chernov, Diomidis Spinellis,
    hackers@FreeBSD.org, Gabor Kovesdan, Max Khon, "Sean C. Farley",
    Kövesdán Gábor
Subject: Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Date: Wed, 18 Jun 2008 12:40:24 +0200
Message-ID: <86skvbc9gn.fsf@ds4.des.no>
In-Reply-To: <4858DBF6.5070001@bluemedia.pl>

Konrad Jankowski writes:
> Dag-Erling Smørgrav writes:
> > In any case, this is a libc issue, right?  As long as sort / grep
> > uses the API correctly, they will work fine once libc is fixed?
> Correct. Given sort uses strcoll()/wcscoll()/strxfrm()/wcsxfrm() and
> call setlocale(). I don't know about grep.

For grep, I believe it should simply be a matter of calling setlocale(),
using wide strings, and using a multibyte regex engine (for appropriate
values of "simply").

Another thing I'm unsure about is the matter of input and output.  Do
mbstowcs() / mbtowc() simply trust the input to conform to LC_CTYPE and
convert accordingly?  When reading UTF, do they recognize and handle
BOMs, or simply treat them as zero-width non-breaking space?  In the
absence of a BOM, do they assume that the input follows the system's
native byte order?

(IMHO, the API is broken, since there is no way for the same program to
simultaneously handle streams with different encodings, but I guess it's
too late to fix that)

DES
-- 
Dag-Erling Smørgrav - des@des.no
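
For reference, the "setlocale(), then wide strings" approach and the BOM
question above could be sketched roughly as follows.  This is only an
illustration, not code from bsdgrep or libc: line_to_wide() and
utf8_bom_len() are hypothetical helper names, and the sketch assumes the
caller prefers to drop a leading UTF-8 BOM itself rather than depend on
whatever the conversion functions do with it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: return the length of a leading UTF-8 BOM
 * (bytes EF BB BF) in buf, or 0 if there is none.
 */
size_t
utf8_bom_len(const char *buf, size_t len)
{
    static const unsigned char bom[] = { 0xEF, 0xBB, 0xBF };

    return ((len >= sizeof(bom) && memcmp(buf, bom, sizeof(bom)) == 0) ?
        sizeof(bom) : 0);
}

/*
 * Hypothetical helper: convert a NUL-terminated multibyte string to wide
 * characters using the encoding of the current LC_CTYPE locale.  Returns
 * the number of wide characters stored (not counting the terminator), or
 * (size_t)-1 if the input is not valid in that encoding or if the result,
 * including its terminator, does not fit in dst.
 */
size_t
line_to_wide(const char *src, wchar_t *dst, size_t dstlen)
{
    mbstate_t st;
    const char *p = src;
    size_t n;

    assert(src != NULL && dst != NULL);
    memset(&st, 0, sizeof(st));         /* start in the initial shift state */
    n = mbsrtowcs(dst, &p, dstlen, &st);
    if (n == (size_t)-1)
        return (n);                     /* invalid multibyte sequence */
    if (p != NULL)
        return ((size_t)-1);            /* stopped early: dst too small */
    return (n);
}
```

A caller would be expected to run setlocale(LC_CTYPE, "") once at
startup, skip utf8_bom_len() bytes at the start of the input if desired,
and then pass each line to line_to_wide() before handing it to a
wide-character regex engine.  Note that the conversion still uses a
single process-wide locale, so the limitation about mixing encodings
applies to this sketch as well.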