Date:      Fri, 21 Apr 2023 20:03:30 +0200
From:      Yuri <yuri@aetern.org>
To:        Current FreeBSD <freebsd-current@freebsd.org>
Subject:   Re: find(1): I18N gone wild ?
Message-ID:  <3e473603-f384-f176-e7cb-03409e16ec9c@aetern.org>
In-Reply-To: <E427B1B8-22E0-47C0-BF47-0C4F1D5F962F@yahoo.com>
References:  <E427B1B8-22E0-47C0-BF47-0C4F1D5F962F.ref@yahoo.com> <E427B1B8-22E0-47C0-BF47-0C4F1D5F962F@yahoo.com>

Mark Millard wrote:
> Dimitry Andric <dim_at_FreeBSD.org> wrote on
> Date: Fri, 21 Apr 2023 10:38:05 UTC :
> 
>> On 21 Apr 2023, at 12:01, Ronald Klop <ronald-lists@klop.ws> wrote:
>>> From: Poul-Henning Kamp <phk@phk.freebsd.dk>
>>> Date: Monday, 17 April 2023 23:06
>>> To: current@freebsd.org
>>> Subject: find(1): I18N gone wild ?
>>> This surprised me:
>>>
>>> # mkdir /tmp/P
>>> # cd /tmp/P
>>> # touch FOO
>>> # touch bar
>>> # env LANG=C.UTF-8 find . -name '[A-Z]*' -print
>>> ./FOO
>>> # env LANG=en_US.UTF-8 find . -name '[A-Z]*' -print
>>> ./FOO
>>> ./bar
>>>
>>> Really ?!
>> ...
>>> My Mac and a Linux server only give ./FOO in both cases. Just a 2 cents remark.
>>
>> Same here. However, I have read that with unicode, you should *never*
>> use [A-Z] or [0-9], but character classes instead. That seems to give
>> both files on macOS and Linux with [[:alpha:]]:
>>
>> $ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
>> ./BAR
>> ./foo
>>
>> and only the lowercase file with [[:lower:]]:
>>
>> $ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
>> ./foo
>>
>> But on FreeBSD, these don't work at all:
>>
>> $ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
>> <nothing>
>>
>> $ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
>> <nothing>
>>
>> This is an interesting rabbit hole... :)
> 
> FreeBSD:
> 
>      -name pattern
>              True if the last component of the pathname being examined matches
>              pattern.  Special shell pattern matching characters (“[”, “]”,
>              “*”, and “?”) may be used as part of pattern.  These characters
>              may be matched explicitly by escaping them with a backslash
>              (“\”).
> 
> I conclude that [[:alpha:]] and [[:lower:]] were not
> considered "Special shell pattern"s. "man glob"
> indicates it is a shell specific builtin.
> 
> macOS says similarly. Different shells, different
> pattern notations and capabilities? Well, "man bash"
> reports:
[snip]
> Seems like: pick your shell (as shown by echo $SHELL) and
> that picks the pattern match rules used. (May be controllable
> in the specific shell.)

No, the pattern is not passed to the shell, so the shell in use should
not matter (provided the pattern is properly quoted or escaped).  The
rules are here:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13

...which in turn refers to the following link for bracket expressions:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

Why we don't support all of that is a different story.
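
To illustrate the quoting point, here is a minimal sketch that should run
under any POSIX sh. The scratch directory and the variable names (dir,
quoted, matches) are made up for the example; the file names are the ones
from the original report. It only shows that a quoted pattern reaches
find(1) verbatim and that find does the matching itself. It does not try
to explain the en_US.UTF-8 behaviour.

```shell
# Hypothetical scratch directory holding the two files from the report.
dir=$(mktemp -d)
touch "$dir/FOO" "$dir/bar"

# Single quotes stop the shell from expanding the pattern, so the
# argument handed to the command is the literal string [A-Z]*,
# whichever shell is in use.
quoted=$(cd "$dir" && printf '%s\n' '[A-Z]*')

# find receives the same literal string and applies the pattern itself.
# LC_ALL=C pins the locale so the range [A-Z] follows plain ASCII order.
matches=$(cd "$dir" && LC_ALL=C find . -name '[A-Z]*' -print)

printf 'argument as passed: %s\n' "$quoted"
printf 'matched by find:    %s\n' "$matches"

rm -rf "$dir"
```

With the locale pinned to C, I would expect only ./FOO to be listed, as in
the first command of the original report. Dropping the quotes would let
the shell expand the pattern against the current directory before find
ever sees it, which is the only way the shell gets involved.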


