From owner-freebsd-hackers@FreeBSD.ORG Mon Sep 5 11:35:45 2005
X-Original-To: freebsd-hackers@freebsd.org
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 114A116A41F
	for ; Mon, 5 Sep 2005 11:35:45 +0000 (GMT)
	(envelope-from wigry@uninet.ee)
Received: from mail.neti.ee (mx1.elion.ee [194.126.101.123])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C4DC743D5C
	for ; Mon, 5 Sep 2005 11:35:40 +0000 (GMT)
	(envelope-from wigry@uninet.ee)
Message-ID: <431C2D8A.2090801@uninet.ee>
Date: Mon, 05 Sep 2005 14:35:38 +0300
From: Rein Kadastik
User-Agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: freebsd-hackers@freebsd.org
References: <43196C96.6040504@uninet.ee>
	<20050903101800.GA77285@cirb503493.alcatel.com.au>
	<43198251.6070606@uninet.ee> <43198354.3000402@uninet.ee>
	<4319864A.3040706@uninet.ee> <20050903154526.GA1247@gothmog.gr>
	<431A9606.6010401@uninet.ee> <431C1284.70207@freebsd.org>
In-Reply-To: <431C1284.70207@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: by amavisd-new-2.2.1 (20041222) (Debian) at neti.ee
Subject: Re: sed not working
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
X-List-Received-Date: Mon, 05 Sep 2005 11:35:45 -0000

Tim Robbins wrote:

> Rein Kadastik wrote:
>
>> Giorgos Keramidas wrote:
>>
>>> On 2005-09-03 14:17, Rein Kadastik wrote:
>>>
>>>> Rein Kadastik wrote:
>>>>
>>>>> Well, I have one guess here. In the Estonian alphabet, z comes
>>>>> immediately after s and before t. So when the regex evaluates
>>>>> the range [a-z], the characters t, u, v, w, x, y are left out.
>>>>>
>>>>> How do I tell sed to use the English alphabet?
>>>>>
>>>>
>>>> Well, my guess was right. I have the following line in
>>>> /etc/profile:
>>>>
>>>> export LANG=et_EE.ISO8859-15
>>>>
>>>> After I exported LANG=en_US.ISO8859-1, sed started to work.
>>>>
>>>> I did not think that the LANG parameter would also alter the
>>>> alphabet, so that the expression [a-z] no longer covers the full
>>>> alphabet.
>>>>
>>>
>>> By using a character class:
>>>
>>> [[:alpha:]]
>>>
>>> AFAIK, if you are using non-English locales, there's no guarantee that
>>> [a-z] will be the entire set of lowercase letters, or that it will only
>>> include lowercase letters, for that matter.
>>>
>>> _______________________________________________
>>> freebsd-hackers@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>>> To unsubscribe, send any mail to
>>> "freebsd-hackers-unsubscribe@freebsd.org"
>>>
>> Yep, I know, but it does not matter. The form [a-z] is used all over
>> the place in the FreeBSD source (1629 lines in 4.11-RELEASE-p11 and
>> almost 1600 in 5-STABLE). Totally hopeless. It seems that no developer
>> has ever heard of character classes, and it is VERY UNSAFE to try to
>> compile (and actually even run) FreeBSD with any locale other than
>> C/en_US.ISO8859-1.
>>
>> I actually searched for the existence of character classes in the
>> source code and found around 30 matches, mostly in manual pages. The
>> Perl configure script checks whether tr supports them, but it never
>> actually uses the feature (even if available).
>>
>> I am totally disappointed about this. I thought about reporting a
>> bug, but as it is everywhere, there is no point in doing so.
>
> I think you're blowing things out of proportion.
> Provided that you
> build world as root (which most people do), and that you don't change
> the LANG setting for root (think single-user mode), the following
> command will give you an approximate idea of which utilities are
> affected:
> $ find /usr/src -name \*.c | xargs grep -e '".*a-z' -e '".*A-Z'
> 25
>
> Of these 25 hits, about half are in comments or test code that is
> never built. The utilities that are genuinely affected are: kbdmap,
> scon, ppp (when using ATM), m4 (in GNU compatibility mode), fdisk,
> named, cvs, diff and vi.
>
> Tim
>
Well, not quite. For starters, the modules that fail in my buildworld
are ncurses, csh/tcsh and gdb (interesting that there are so few, as the
problem itself is far bigger). Second, there are not 25 results but
somewhat more (most of the regexes are not in .c files). Third, I have
already sent an email to Ruslan and am waiting for a response. I am
fully aware of the size of such a project and quite willing to try to
make things better.

And BTW, my system-wide LANG is set to et_EE.ISO8859-15, which I
personally like. As the system provides localization functionality, it
must handle it appropriately in every situation (which is not the case
right now).

Peace
--
Rein
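
For reference, here is a minimal sketch of the two workarounds discussed
in this thread, assuming a POSIX sh and sed. The sample string and the
variable names are made up for illustration and appear nowhere in the
FreeBSD tree. The thread suggests [[:alpha:]]; the sketch uses
[[:lower:]] instead because it is the closer match for what [a-z] is
usually meant to express.

```shell
#!/bin/sh
# Illustrative only: the sample string and variable names are hypothetical.
sample='stuvwxyz'

# Workaround 1: force the C locale for this one command, so the range
# [a-z] is evaluated in plain ASCII order whatever LANG is set to.
with_c_locale=$(printf '%s\n' "$sample" | LC_ALL=C sed 's/[a-z]/X/g')

# Workaround 2: use a POSIX character class, which asks for "lowercase
# letters" directly and does not depend on where z sorts in the locale.
with_class=$(printf '%s\n' "$sample" | sed 's/[[:lower:]]/X/g')

echo "LC_ALL=C [a-z]: $with_c_locale"
echo "[[:lower:]]:    $with_class"
```

The first form changes nothing about the system-wide LANG; it only
overrides the locale for the single sed invocation. The second form is
the one that would have to be applied to the scripts in the source tree
themselves.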