Date:      Fri, 07 Apr 2017 03:06:33 -0300
From:      Nilton José Rizzo <rizzo@i805.com.br>
To:        Garrett Wollman <wollman@hergotha.csail.mit.edu>
Cc:        freebsd-current@freebsd.org
Subject:   Re: problem with ls, not show a correct list
Message-ID:  <b9c828ee4f6e8fa57ba1056947783edc@i805.com.br>
In-Reply-To: <201704070529.v375T5ux031766@hergotha.csail.mit.edu>
References:  <fe2da09242ff63acb0c62dd0519cfa1f@i805.com.br> <3a8b8ade882d1486aa41b448a9c83b6c@i805.com.br> <201704070529.v375T5ux031766@hergotha.csail.mit.edu>

On 2017-04-07 02:29, Garrett Wollman wrote:
> In article <3a8b8ade882d1486aa41b448a9c83b6c@i805.com.br> you write:
>> 
>> 
>>   This is terrible! Is it a locale bug? Look!
>> 
>> % locale
>> LANG=pt_BR.UTF-8
>> % touch E
>> % ls -l [a-z]*
>> -rw-r--r--  1 rizzo  wheel  0  7 abr 02:06 E
> 
> No, it's the specification of how character ranges in glob(3) and
> fnmatch(3) work.  In effect, character ranges like [a-z] must be
> treated as ranges of *collating elements*, not byte ranges, and in
> your locale, <a> and <A> are considered to be the same collating
> element, so [a-z] matches both upper- and lower-case Latin letters.
> This is documented, very obliquely, in sh(1), which also tells you the
> workaround:
> 
>      a character class.  A character class matches any of the characters
>      between the square brackets.  A locale-dependent range of characters may
>      be specified using a minus sign.  A named class of characters (see
>      wctype(3)) may be specified by surrounding the name with `[:' and `:]'.
>      For example, `[[:alpha:]]' is a shell pattern that matches a single
>      letter.
> 
> So, to match only lower-case letters regardless of your current locale
> setting, you must use the correct character class:
> 
> 	$ locale
> 	LANG=pt_BR.UTF-8
> 	LC_CTYPE="pt_BR.UTF-8"
> 	LC_COLLATE="pt_BR.UTF-8"
> 	LC_TIME="pt_BR.UTF-8"
> 	LC_NUMERIC="pt_BR.UTF-8"
> 	LC_MONETARY="pt_BR.UTF-8"
> 	LC_MESSAGES="pt_BR.UTF-8"
> 	LC_ALL=
> 	$ ls
> 	D       E       F       a       b       c
> 	$ ls [[:lower:]]*
> 	a       b       c
> 
> The same applies to character class ranges in regular expressions, not
> just glob(3) patterns.
> 
> -GAWollman


   That workaround only works in sh; it does not work in the C shell
   (csh or tcsh), and it does not help when I need to do something like this:

   I do not think this behavior is correct.

% setenv LANG C
% echo "Using C " && ls && echo "---" && ls [a-c,k-m]*
Using C
A       a       d       g       j       m       p       s       v       y
D       b       e       h       k       n       q       t       w       z
E       c       f       i       l       o       r       u       x
---
a       b       c       k       l       m
% setenv LANG pt_BR.UTF-8
% echo "Using pt_BR.UTF-8" && ls && echo "---" && ls [a-c,k-m]*
Using pt_BR.UTF-8
a       c       e       g       j       m       p       s       v       y
A       d       E       h       k       n       q       t       w       z
b       D       f       i       l       o       r       u       x
---
a       A       b       c       k       l       m
% sh
$ ls [a-c,k-l]*
a       A       b       c       k       l
$ ls [[:lower:]a-c,k-l]*
a       c       f       i       l       o       r       u       x
A       d       g       j       m       p       s       v       y
b       e       h       k       n       q       t       w       z
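
One possible way around this, sketched below, is to force the C locale before the pattern is expanded, so that [a-c] and [k-m] are compared as plain byte ranges again. This assumes a shell that picks up the new locale as soon as LC_ALL is assigned (I believe sh on FreeBSD and bash do); the scratch directory and the file names are made up for the demonstration.

```shell
#!/bin/sh
# Sketch only: the scratch directory and file names are hypothetical.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/globdemo.XXXXXX") || exit 1
cd "$workdir" || exit 1
touch A D E a b c d k l m z

# Force the C locale for everything that follows, so ranges are
# compared by byte value instead of by the locale's collation order.
LC_ALL=C
export LC_ALL

# Two ranges side by side; a comma inside the brackets would only be
# a literal comma, not a separator.
c_matched=$(echo [a-ck-m]*)
echo "$c_matched"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

Note that this changes the locale for the rest of the script, including message language and sort order, so it may not be acceptable everywhere.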


If I use the rm command with a pattern like this, I will erase files
that do not match my intended selection.

Imagine a script running in batch mode when this happens; it could be
very dangerous.
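
For a batch script, a more defensive approach might be to avoid ranges entirely and list every wanted first letter, since a bracket expression without a range should not depend on the collation order. The following is only a sketch under that assumption; the scratch directory, the file names, and the variables deleted and survived are invented for the example.

```shell
#!/bin/sh
# Sketch only: the scratch directory and file names are hypothetical.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/globdemo.XXXXXX") || exit 1
cd "$workdir" || exit 1
touch A D E a b c d k l m z

deleted=""
# Each wanted first letter is listed explicitly, so no range is involved.
for f in [abcklm]*; do
    # Skip the unexpanded pattern if nothing matched.
    [ -e "$f" ] || continue
    rm -- "$f"
    deleted="${deleted:+$deleted }$f"
done
echo "deleted: $deleted"

# The upper-case file must still be there.
survived=no
[ -e A ] && survived=yes
echo "A survived: $survived"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

Replacing rm with echo on a first run is a cheap way to see what a pattern really selects before anything is erased.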



-- 
********************************************************
* Nilton José Rizzo     Sistema de Informação    UFRRJ *
* http://cursos.ufrrj.br/grad/sistemas/                *
* lattes:http://lattes.cnpq.br/0079460703536198        *
********************************************************


