Date: Fri, 21 Apr 2023 20:03:30 +0200
From: Yuri <yuri@aetern.org>
To: Current FreeBSD <freebsd-current@freebsd.org>
Subject: Re: find(1): I18N gone wild ?
Message-ID: <3e473603-f384-f176-e7cb-03409e16ec9c@aetern.org>
In-Reply-To: <E427B1B8-22E0-47C0-BF47-0C4F1D5F962F@yahoo.com>
References: <E427B1B8-22E0-47C0-BF47-0C4F1D5F962F.ref@yahoo.com> <E427B1B8-22E0-47C0-BF47-0C4F1D5F962F@yahoo.com>

Mark Millard wrote:
> Dimitry Andric <dim_at_FreeBSD.org> wrote on
> Date: Fri, 21 Apr 2023 10:38:05 UTC :
>
>> On 21 Apr 2023, at 12:01, Ronald Klop <ronald-lists@klop.ws> wrote:
>>> From: Poul-Henning Kamp <phk@phk.freebsd.dk>
>>> Date: Monday, 17 April 2023 23:06
>>> To: current@freebsd.org
>>> Subject: find(1): I18N gone wild ?
>>> This surprised me:
>>>
>>> # mkdir /tmp/P
>>> # cd /tmp/P
>>> # touch FOO
>>> # touch bar
>>> # env LANG=C.UTF-8 find . -name '[A-Z]*' -print
>>> ./FOO
>>> # env LANG=en_US.UTF-8 find . -name '[A-Z]*' -print
>>> ./FOO
>>> ./bar
>>>
>>> Really ?!
>> ...
>>> My Mac and a Linux server only give ./FOO in both cases. Just a 2 cents remark.
>>
>> Same here. However, I have read that with unicode, you should *never*
>> use [A-Z] or [0-9], but character classes instead. That seems to give
>> both files on macOS and Linux with [[:alpha:]]:
>>
>> $ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
>> ./BAR
>> ./foo
>>
>> and only the lowercase file with [[:lower:]]:
>>
>> $ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
>> ./foo
>>
>> But on FreeBSD, these don't work at all:
>>
>> $ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
>> <nothing>
>>
>> $ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
>> <nothing>
>>
>> This is an interesting rabbit hole... :)
>
> FreeBSD:
>
>      -name pattern
>              True if the last component of the pathname being examined matches
>              pattern.  Special shell pattern matching characters (“[”, “]”,
>              “*”, and “?”) may be used as part of pattern.  These characters
>              may be matched explicitly by escaping them with a backslash
>              (“\”).
>
> I conclude that [[:alpha:]] and [[:lower:]] were not
> considered "Special shell pattern"s. "man glob"
> indicates it is a shell specific builtin.
>
> macOS says similarly. Different shells, different
> pattern notations and capabilities? Well, "man bash"
> reports:

[snip]

> Seems like: pick your shell (as shown by echo $SHELL) and
> that picks the pattern match rules used. (May be controllable
> in the specific shell.)

No, the pattern is not passed to the shell, and which shell is used
should not matter (as long as the pattern is properly escaped so the
shell leaves it alone). The rules are here:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13

...which in turn refers to the following link for bracket expressions:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

Why we don't support all of that is a different story.
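
To make the point about escaping concrete, here is a minimal sketch, not
anything taken from the find(1) sources. The function name match_upper and
the scratch directory are made up for illustration. The single quotes stop
the shell from expanding the glob, so find(1) receives the literal string
[A-Z]* and does the matching itself. LC_ALL=C is an assumption added here to
take locale collation out of the picture; with it, the range should cover
only the ASCII uppercase letters, so only ./FOO is expected.

```shell
#!/bin/sh
# match_upper is a hypothetical helper name used only in this sketch.
# The function body runs in a subshell, so the cd does not leak out.
match_upper() (
    dir=$(mktemp -d) || exit 1      # scratch directory, removed below
    cd "$dir" || exit 1
    touch FOO bar

    # Quoted pattern: the shell passes [A-Z]* through untouched and
    # find(1) interprets the bracket expression on its own.
    # LC_ALL=C overrides LANG and LC_COLLATE for this one command.
    LC_ALL=C find . -name '[A-Z]*' -print

    cd / && rm -rf "$dir"
)

match_upper
```

Swapping LC_ALL=C for LANG=en_US.UTF-8 in the find line is the quickest way
to reproduce the difference reported at the top of the thread.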