Date: Fri, 7 Apr 2017 01:29:05 -0400 (EDT)
From: Garrett Wollman <wollman@hergotha.csail.mit.edu>
To: rizzo@i805.com.br
Cc: freebsd-current@freebsd.org
Subject: Re: problem with ls, not show a correct list
Message-ID: <201704070529.v375T5ux031766@hergotha.csail.mit.edu>
References: <fe2da09242ff63acb0c62dd0519cfa1f@i805.com.br> <3a8b8ade882d1486aa41b448a9c83b6c@i805.com.br>

In article <3a8b8ade882d1486aa41b448a9c83b6c@i805.com.br> you write:
>
> It's a terrible!!!! Is it a locale bug? Look!
>
>% locale
>LANG=pt_BR.UTF-8
>% touch E
>% ls -l [a-z]*
>-rw-r--r-- 1 rizzo wheel 0 7 abr 02:06 E

No, it's the specification of how character ranges in glob(3) and
fnmatch(3) work.  In effect, character ranges like [a-z] must be
treated as ranges of *collating elements*, not byte ranges, and in
your locale, <a> and <A> are considered to be the same collating
element, so [a-z] matches both upper- and lower-case Latin letters.

This is documented, very obliquely, in sh(1), which also tells you the
workaround: a character class.

     A character class matches any of the characters between the square
     brackets.  A locale-dependent range of characters may be specified
     using a minus sign.  A named class of characters (see wctype(3)) may
     be specified by surrounding the name with `[:' and `:]'.  For
     example, `[[:alpha:]]' is a shell pattern that matches a single
     letter.

So, to match only lower-case letters regardless of your current locale
setting, you must use the correct character class:

$ locale
LANG=pt_BR.UTF-8
LC_CTYPE="pt_BR.UTF-8"
LC_COLLATE="pt_BR.UTF-8"
LC_TIME="pt_BR.UTF-8"
LC_NUMERIC="pt_BR.UTF-8"
LC_MONETARY="pt_BR.UTF-8"
LC_MESSAGES="pt_BR.UTF-8"
LC_ALL=
$ ls
D E F a b c
$ ls [[:lower:]]*
a b c

The same applies to character class ranges in regular expressions, not
just glob(3) patterns.

-GAWollman
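
The character-class workaround described in this message can be tried in
a scratch directory. What follows is only a minimal sketch: the
temporary directory and the variable names `dir`, `lower` and `c_range`
are illustrative, and the second pattern (forcing the C locale with
LC_ALL=C) is an assumption added here for comparison, not something the
message itself recommends. It relies on the C/POSIX locale collating
ASCII letters in codepoint order, so that [a-z] covers only a through z
there.

```sh
#!/bin/sh
# Sketch: compare a named character class with a C-locale range.
# All names below are illustrative, not taken from the original message.

dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch D E F a b c

# Named class: matches lower-case letters as defined by LC_CTYPE,
# whatever collation order the current locale uses.
lower=$(echo [[:lower:]]*)

# Range evaluated in the C locale (an added assumption, see above).
# A child shell is used so the locale is already in effect when the
# pattern is expanded.
c_range=$(LC_ALL=C sh -c 'echo [a-z]*')

echo "class:   $lower"
echo "C range: $c_range"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

Both lines should list only a, b and c. The named class is the more
portable of the two, since it does not require changing the locale for
the rest of the command.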