Date: Mon, 17 Apr 2023 23:33:04 +0200
From: Yuri <yuri@aetern.org>
To: Xin LI <delphij@gmail.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>
Cc: current@freebsd.org
Subject: Re: find(1): I18N gone wild ?
Message-ID: <6dd71202-4144-8587-b42c-8db44a4b737e@aetern.org>
In-Reply-To: <CAGMYy3tz6iCU_tiE6NHoVPdXOZGtP%2BfskWMrLXyev8SR=xRSqQ@mail.gmail.com>
References: <202304172106.33HL6RUX051407@critter.freebsd.dk> <CAGMYy3tz6iCU_tiE6NHoVPdXOZGtP%2BfskWMrLXyev8SR=xRSqQ@mail.gmail.com>

Xin LI wrote:
> This is expected behavior (in en_US.UTF-8 the ordering is AaBb, not
> ABab).  You might want to set LC_COLLATE to C if C behavior is desirable.
>
> On Mon, Apr 17, 2023 at 2:06 PM Poul-Henning Kamp <phk@phk.freebsd.dk>
> wrote:
>
>     This surprised me:
>
>             # mkdir /tmp/P
>             # cd /tmp/P
>             # touch FOO
>             # touch bar
>             # env LANG=C.UTF-8 find . -name '[A-Z]*' -print
>             ./FOO
>             # env LANG=en_US.UTF-8 find . -name '[A-Z]*' -print
>             ./FOO
>             ./bar
>
>     Really ?!

A bit more detail: find uses fnmatch(3) here, where the RE Bracket
Expression rules apply (except for ! instead of ^, but that's unrelated):

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

...which has the following note:

7. In the POSIX locale, a range expression represents the set of
collating elements that fall between two elements in the collation
sequence, inclusive. In other locales, a range expression has
unspecified behavior: strictly conforming applications shall not rely on
whether the range expression is valid, or on the set of collating
elements matched.

Indeed, it's unfortunate that collations in non-POSIX are not that...
linear and range expressions can break, but I don't see an easy way of
"fixing" this.
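
For what it's worth, the two usual ways to sidestep the unspecified range behavior can be sketched as below. This is only an illustration, not a fix for find itself: the scratch directory and the variable names are made up for the example, and LC_ALL=C is used rather than LC_COLLATE=C on the assumption that LC_ALL might already be set in the environment and would otherwise win.

```shell
# Hypothetical scratch directory; recreates the FOO/bar case from the thread.
dir=$(mktemp -d)
touch "$dir/FOO" "$dir/bar"

# Option 1: force the POSIX (C) locale for this one command, so that
# [A-Z] is a well-defined range over the collation sequence.
c_range=$(cd "$dir" && LC_ALL=C find . -name '[A-Z]*' -print)

# Option 2: use a character class instead of a range. Classes are driven
# by LC_CTYPE rather than by collation order, so this runs in whatever
# locale is currently in effect.
upper_class=$(cd "$dir" && find . -name '[[:upper:]]*' -print)

echo "range in C locale: $c_range"
echo "character class:   $upper_class"

rm -rf "$dir"
```

Both commands should list only ./FOO. Note that the character class also matches non-ASCII uppercase letters in locales that define them, which may or may not be what the caller wants.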