Date:      Mon, 17 Apr 2023 23:33:04 +0200
From:      Yuri <yuri@aetern.org>
To:        Xin LI <delphij@gmail.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>
Cc:        current@freebsd.org
Subject:   Re: find(1): I18N gone wild ?
Message-ID:  <6dd71202-4144-8587-b42c-8db44a4b737e@aetern.org>
In-Reply-To: <CAGMYy3tz6iCU_tiE6NHoVPdXOZGtP%2BfskWMrLXyev8SR=xRSqQ@mail.gmail.com>
References:  <202304172106.33HL6RUX051407@critter.freebsd.dk> <CAGMYy3tz6iCU_tiE6NHoVPdXOZGtP%2BfskWMrLXyev8SR=xRSqQ@mail.gmail.com>

Xin LI wrote:
> This is expected behavior (in en_US.UTF-8 the ordering is AaBb, not
> ABab).  You might want to set LC_COLLATE to C if C behavior is desirable.
> 
> On Mon, Apr 17, 2023 at 2:06 PM Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
> 
>     This surprised me:
> 
>             # mkdir /tmp/P
>             # cd /tmp/P
>             # touch FOO
>             # touch bar
>             # env LANG=C.UTF-8 find . -name '[A-Z]*' -print
>             ./FOO
>             # env LANG=en_US.UTF-8 find . -name '[A-Z]*' -print
>             ./FOO
>             ./bar
> 
>     Really ?!

A bit more detail:

find uses fnmatch(3) here, where the RE Bracket Expression rules apply
(except for ! instead of ^, but that's unrelated):

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

...which has the following note:

7. In the POSIX locale, a range expression represents the set of
collating elements that fall between two elements in the collation
sequence, inclusive. In other locales, a range expression has
unspecified behavior: strictly conforming applications shall not rely on
whether the range expression is valid, or on the set of collating
elements matched.

Indeed, it's unfortunate that collations in non-POSIX locales are not
that... linear, so range expressions like [A-Z] can match more than
expected, but I don't see an easy way of "fixing" this.

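For anyone who needs the ABab behaviour right away, here is a minimal
sketch of the workaround Xin LI suggested. It uses LC_ALL=C rather than
LC_COLLATE=C only because LC_ALL overrides every other locale variable.
The scratch directory from mktemp and the variable names are my own
assumptions, not part of the original report:

```shell
# Scratch directory with the same two files as in the report.
# (mktemp -d picks the path; it is illustrative only.)
dir=$(mktemp -d)
touch "$dir/FOO" "$dir/bar"

# LC_ALL overrides LANG and every LC_* variable, so the [A-Z] range is
# evaluated with POSIX-locale collation whatever the caller's environment.
matches=$(cd "$dir" && env LC_ALL=C find . -name '[A-Z]*' -print)
printf '%s\n' "$matches"    # prints ./FOO

rm -rf "$dir"
```

LC_COLLATE=C is the narrower setting and leaves LC_CTYPE alone, but it
has no effect when LC_ALL is already set in the environment.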

