Date:      Fri, 21 Apr 2023 21:36:05 +0200
From:      Yuri <yuri@aetern.org>
To:        freebsd-current@freebsd.org
Subject:   Re: find(1): I18N gone wild? [[:alpha:]] not a substitute to refer 26 English letters A-Z
Message-ID:  <86efedcf-e3ed-be0c-79ab-03f0d4a743af@aetern.org>
In-Reply-To: <CABObuOpJg%2BC-fSjfUXB9bH2JO0py6OqP2WzqLDN3NRERqYSJGQ@mail.gmail.com>
References:  <CABObuOpJg%2BC-fSjfUXB9bH2JO0py6OqP2WzqLDN3NRERqYSJGQ@mail.gmail.com>

parv/FreeBSD wrote:
> Wrote Dimitry Andric on Fri, 21 Apr 2023 10:38:05 UTC
> (via
> https://lists.freebsd.org/archives/freebsd-current/2023-April/003556.html)
>>
>> ... However, I have read that with unicode, you should *never*
>> use [A-Z] or [0-9], but character classes instead. That seems to give
>> both files on macOS and Linux with [[:alpha:]]:
> ...
> 
> Depending on the locale, the problem with that is that "[[:alpha:]]" will
> match more than the 26 English letters "A" through "Z" (besides also
> matching lower case "a" through "z"), even in a string where none of the
> 26 * 2 English letters appear.
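
For reference, here is a minimal C sketch of the two usual ways to mean exactly the 26 * 2 English letters. The first spells out the ASCII test by hand. The second forces the C (POSIX) locale before calling fnmatch(3). In that locale, [A-Z] is a plain code point range and [[:alpha:]] covers only the 52 ASCII letters. The helper names `is_english_letter()` and `match_in_c_locale()` are hypothetical. Only setlocale(3) and fnmatch(3) are standard.

```c
#include <assert.h>
#include <fnmatch.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true only for the 26 * 2 ASCII letters A-Z and
 * a-z, independent of whatever locale is currently set. */
static bool is_english_letter(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Hypothetical helper: switch to the C locale, then match NAME against
 * the shell-style PATTERN with fnmatch(3).  In the C locale [A-Z] is a
 * code point range and [[:alpha:]] is limited to ASCII letters.
 * Side effect: this changes the process-wide locale. */
static bool match_in_c_locale(const char *pattern, const char *name)
{
    assert(pattern != NULL && name != NULL);
    setlocale(LC_ALL, "C");
    return fnmatch(pattern, name, 0) == 0;
}
```

At the shell level, the same idea is usually written by forcing the locale for a single command, e.g. `LC_ALL=C find . -name '[A-Z]*'`.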

(replying to a random recent message)

And there is a bit of fairly recent history for fnmatch() related to
[a-z]; the same was done for regex, with the same outcome: the attempt to
make [a-z] (and presumably [A-Z] as well) a non-collating range failed.
I am not aware of the failures that were encountered; hopefully someone
remembers:

--------
commit 5a5807dd4ca34467ac5fb458bc19f12bf62075a5
Author: Andrey A. Chernov <ache@FreeBSD.org>
Date:   Sun Jul 10 03:49:38 2016 +0000

Remove broken support for collation in [a-z] type ranges.
Only the first 256 wide chars are currently considered; all others are
just dropped from the range. A proper implementation requires a reverse
table database lookup, since the objects are really big (max UTF-8 is
1114112 code points), so doing the same scan as was done for 256 chars
would slow things down.

POSIX does not require collation for [a-z] type ranges and does not
prohibit it for non-POSIX locales. POSIX requires collation for ranges
only for the POSIX (or C) locale, which is equal to ASCII and binary for
other chars, so we already have it.

No other *BSD implements collation for [a-z] type ranges.

Restore ABI compatibility with the now-unused __collate_range_cmp(), which
is visible from outside (it will be removed later).
--------
commit 1daad8f5ad767dfe7896b8d1959a329785c9a76b
Author: Andrey A. Chernov <ache@FreeBSD.org>
Date:   Thu Jul 14 08:18:12 2016 +0000

Back out non-collating [a-z] ranges.
Instead of changing the whole course to another POSIX-permitted way
for consistency and uniformity, I decided to completely ignore the missing
regex functionality and concentrate on fixing bugs in what we have now;
there are too many small obstacles otherwise, counting ports.
--------
commit 12eae8c8f346cb459a388259ca98faebdac47038
Author: Andrey A. Chernov <ache@FreeBSD.org>
Date:   Thu Jul 14 09:07:25 2016 +0000

1) Eliminate the possibility of calling __*collate_range_cmp() with an
incomplete locale (which causes a core dump) by removing the whole 'table'
argument through which it was passed.

2) Restore __collate_range_cmp() in __sccl().

3) Collate [a-z] ranges in regcomp() only for single-byte locales
(we can't do it now for other ones). In the previous state only the first
256 wchars were considered and all others were just silently dropped from
the range.
--------
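
To make the three commits a bit more concrete, here is a simplified C model. It is not the libc regcomp()/fnmatch() code. It shows three cases side by side:

- a non-collating range, which is a plain code point comparison;
- the "broken" collating range from the first commit, where only the first 256 wide chars are considered;
- the single-byte-only collation policy from item 3) of the last commit.

All function names are hypothetical. wcscoll(3) is the standard collation comparison. In the C locale, which is the default at program start, wcscoll(3) reduces to code point order, so the functions agree on ASCII. They only diverge in locales with a real collation table, which this sketch does not exercise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <wchar.h>

/* Collating comparison of two single wide characters via wcscoll(3). */
static int coll_cmp(wchar_t a, wchar_t b)
{
    wchar_t sa[2] = { a, L'\0' };
    wchar_t sb[2] = { b, L'\0' };
    return wcscoll(sa, sb);
}

/* Non-collating range: plain code point comparison, which works for all
 * 1114112 (0x110000) Unicode code points without any table scan. */
static bool in_range_codepoint(wchar_t lo, wchar_t hi, wchar_t wc)
{
    return lo <= wc && wc <= hi;
}

/* Simplified model of the limitation described in the first commit:
 * only the first 256 wide chars are considered, so anything above
 * U+00FF silently drops out of the range even if it collates inside. */
static bool in_range_scan256(wchar_t lo, wchar_t hi, wchar_t wc)
{
    if ((unsigned long)wc > 0xFFUL)
        return false;                 /* dropped from the range */
    return coll_cmp(lo, wc) <= 0 && coll_cmp(wc, hi) <= 0;
}

/* Policy modelled on item 3) of the last commit: collate [lo-hi] only in
 * single-byte locales (MB_CUR_MAX == 1), otherwise use code point order. */
static bool range_contains(wchar_t lo, wchar_t hi, wchar_t wc)
{
    if (MB_CUR_MAX == 1)
        return coll_cmp(lo, wc) <= 0 && coll_cmp(wc, hi) <= 0;
    return in_range_codepoint(lo, hi, wc);
}
```

For example, with a range of U+0100 through U+4FFF, `in_range_codepoint()` accepts U+4E00. `in_range_scan256()` rejects it, which is the "silently dropped" behaviour the commit messages complain about.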


