Date:      Mon, 05 Sep 2005 19:40:20 +1000
From:      Tim Robbins <tjr@freebsd.org>
To:        Rein Kadastik <wigry@uninet.ee>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: sed not working
Message-ID:  <431C1284.70207@freebsd.org>
In-Reply-To: <431A9606.6010401@uninet.ee>
References:  <43196C96.6040504@uninet.ee>	<20050903101800.GA77285@cirb503493.alcatel.com.au>	<43198251.6070606@uninet.ee>	<43198354.3000402@uninet.ee>	<4319864A.3040706@uninet.ee>	<20050903154526.GA1247@gothmog.gr> <431A9606.6010401@uninet.ee>

Rein Kadastik wrote:

> Giorgos Keramidas wrote:
>
>> On 2005-09-03 14:17, Rein Kadastik <wigry@uninet.ee> wrote:
>>  
>>
>>> Rein Kadastik wrote:
>>>   
>>>
>>>> Well, I have one guess here. In the Estonian alphabet, z comes
>>>> immediately after s and before t. So when the regex orders [a-z],
>>>> the characters t, u, v, w, x, y are left out.
>>>>
>>>> How do I tell sed to use the English alphabet?
>>>>     
>>>
>>> Well, my guess was right. I have the following line in /etc/profile:
>>>
>>> export LANG=et_EE.ISO8859-15
>>>
>>> After I exported LANG=en_US.ISO8859-1, sed started to work.
>>>
>>> I did not think that the LANG setting would also alter the alphabet,
>>> so that the expression [a-z] no longer covers the full alphabet.
>>>   
>>
>>
>> By using a character class:
>>
>>     [[:alpha:]]
>>
>> AFAIK, if you are using non-English locales, there's no guarantee that
>> [a-z] will be the entire set of lowercase letters, or that it will only
>> include lowercase letters, for that matter.
>>
> Yep, I know, but it does not matter. The form [a-z] is used all over 
> the place in the FreeBSD source (1629 lines in 4.11-RELEASE-p11 and 
> almost 1600 in 5-STABLE). Totally hopeless. It seems that no developer 
> has ever heard about character classes, and it is VERY UNSAFE to try 
> to compile (and actually even run) FreeBSD with a locale other than 
> C/en_US.ISO8859-1.
>
> I actually searched for the existence of character classes in the 
> source code. I found around 30 matches, mostly in manual pages. The 
> Perl configure script checks if tr supports them, but it never 
> actually uses the feature (even if available).
>
> I am totally disappointed about this. I thought about reporting a 
> bug, but as it is everywhere, there is no point in doing so.

I think you're blowing things out of proportion. Provided that you 
build world as root (which most people do), and that you don't change 
the LANG setting for root (think single-user mode), the following 
command will give you an approximate idea of which utilities are affected:
$ find /usr/src -name \*.c | xargs grep -e '".*a-z' -e '".*A-Z'
      25

Of these 25 hits, about half are in comments or test code that is never 
built. The utilities that are genuinely affected are: kbdmap, scon, ppp 
(when using ATM), m4 (in GNU compatibility mode), fdisk, named, cvs, 
diff and vi.
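
As a rough illustration of that kind of search, here is a minimal,
self-contained sketch. The miniature source tree and its file names are
hypothetical stand-ins for /usr/src, and the trailing `wc -l` is an
assumption added here to turn the matches into a count; it is not part of
the command as quoted above.

```shell
#!/bin/sh
# Hypothetical miniature source tree standing in for /usr/src.
src=$(mktemp -d)
printf '%s\n' 'const char *re = "[a-z]+";'       > "$src/uses_range.c"
printf '%s\n' 'const char *re = "[[:lower:]]+";' > "$src/uses_class.c"
printf '%s\n' '/* no pattern here */'            > "$src/plain.c"

# Same grep patterns as in the message: a double quote followed, later
# on the same line, by a literal a-z or A-Z range.
# wc -l (an assumption) counts the matching lines; tr strips the padding
# that some wc implementations add.
hits=$(find "$src" -name '*.c' | xargs grep -e '".*a-z' -e '".*A-Z' | wc -l | tr -d ' ')
echo "$hits"

rm -rf "$src"
```

Only the file with a quoted literal range is counted; the one that already
uses a character class is not. Like the original command, this is a
heuristic and will also hit comments and test code.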

Tim
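
For reference, a minimal sketch of the two workarounds that come up in
this thread: pinning the locale for a single command, and using a POSIX
character class instead of a literal range. The sample input string is
hypothetical and chosen to contain the letters reported as left out under
et_EE; `LC_ALL` is used here because it overrides `LANG` for that one
command without touching /etc/profile.

```shell
#!/bin/sh
# Hypothetical sample input containing the letters at issue.
input='stuvwxyz'

# Workaround 1: force the C locale for this one command, so that [a-z]
# is evaluated in the C locale regardless of the login LANG.
c_range=$(printf '%s\n' "$input" | LC_ALL=C sed 's/[a-z]/X/g')

# Workaround 2: use a character class, which asks the current locale
# for "lowercase letters" instead of relying on collation order.
class=$(printf '%s\n' "$input" | sed 's/[[:lower:]]/X/g')

printf '%s\n%s\n' "$c_range" "$class"
```

The first form is the least invasive fix for an existing script that is
full of [a-z]; the second is the portable choice when writing new
patterns.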


