Date:      Fri, 20 Aug 2021 16:17:55 +0200 (CEST)
From:      freebsd@oldach.net (Helge Oldach)
To:        se@freebsd.org (Stefan Eßer)
Cc:        stable@freebsd.org
Subject:   Re: Confusion with grep & locale?
Message-ID:  <202108201417.17KEHt0w022450@nuc.oldach.net>
In-Reply-To: <fbb028fa-19f4-60b2-24e9-549961c3f92f@freebsd.org> from Stefan Esser at "20 Aug 2021 14:47:11"

Stefan Esser wrote on Fri, 20 Aug 2021 14:47:11 +0200 (CEST):
> Am 20.08.21 um 11:03 schrieb Helge Oldach:
> But POSIX makes no guarantees for locales other than POSIX or C.

OK, thanks for the explanation. That clarifies a lot for me. Although
it's not really POLA. :-)

Thanks a lot also to Stefan Ehmann for the pointer to gawk oddities.

> > # export LANG=en_US.ISO8859-1
> > # (echo bla; echo Bla) | grep '[A-Z]'
> > bla
> > Bla
> 
> This one is unexpected, the upper case should be a range of its own
> and should not include any lower case letters.
> 
> > # export LANG=en_US.UTF-8
> > # (echo bla; echo Bla) | grep '[A-Z]'
> > Bla
> 
> Here I had expected the result you got with en_US.ISO8859-1 ...

> Definitely a bug in the definition of the collating sequences.
> 
> And I have just verified that de_DE.ISO8859-1 wrongly considers "ö"
> to be within [a-z], while de_DE.UTF-8 does not (but should).
> 
> Seems that the correct collating sequences for ISO8859-1 and UTF-8 are
> each assigned to the other one.

PR 257972 raised.

> > There is nothing special in the environment, specifically no LC_xxx nor
> > MM_CHARSET in either case.
> 
> LANG defines LC_COLLATE, unless overridden.

Indeed. I just explicitly mentioned *no* LC_xxx to clarify that it's not
overridden. :-)
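
For reference, a minimal sketch of such an override, assuming a POSIX
shell and grep. LC_ALL takes precedence over both LC_COLLATE and LANG,
so setting it for a single command forces the C collating sequence,
where [A-Z] covers exactly the ASCII upper-case letters. The helper
name c_range_lines is hypothetical and only used for illustration.

```shell
# Hypothetical helper: run the range match with the C locale forced
# for this one grep invocation, regardless of what LANG is set to.
c_range_lines() {
    LC_ALL=C grep '[A-Z]'
}

# Same input as in the examples quoted above.
(echo bla; echo Bla) | c_range_lines
# prints: Bla
```

The variable is set only for that grep process, so the rest of the
session keeps whatever LANG selects.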

> BTW, character classes work for your examples and more:

Certainly they do. But they are harder to type... :-)
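
A minimal sketch of the character-class form, assuming a POSIX grep.
[[:upper:]] is resolved through LC_CTYPE rather than through the
collating sequence, so it should not depend on how a range such as
[A-Z] sorts in the active locale. The helper name upper_lines is
hypothetical.

```shell
# Hypothetical helper: select lines that contain at least one
# upper-case letter, using a POSIX character class instead of a range.
upper_lines() {
    grep '[[:upper:]]'
}

# Same input as in the examples quoted above.
(echo bla; echo Bla) | upper_lines
# prints: Bla
```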

Kind regards
Helge


