Date: Fri, 20 Aug 2021 16:17:55 +0200 (CEST)
From: freebsd@oldach.net (Helge Oldach)
To: se@freebsd.org (Stefan Eßer)
Cc: stable@freebsd.org
Subject: Re: Confusion with grep & locale?
Message-ID: <202108201417.17KEHt0w022450@nuc.oldach.net>
In-Reply-To: <fbb028fa-19f4-60b2-24e9-549961c3f92f@freebsd.org> from Stefan Esser at "20 Aug 2021 14:47:11"
Stefan Esser wrote on Fri, 20 Aug 2021 14:47:11 +0200 (CEST):
> On 20.08.21 at 11:03, Helge Oldach wrote:
> But POSIX makes no guarantees for locales other than POSIX or C.

OK, thanks for the explanation. That clarifies a lot for me. Although
it's not really POLA. :-)

Thanks a lot also to Stefan Ehmann for the pointer to gawk oddities.

> > # export LANG=en_US.ISO8859-1
> > # (echo bla; echo Bla) | grep '[A-Z]'
> > bla
> > Bla
>
> This one is unexpected, the upper case should be a range of its own
> and should not include any lower case letters.
>
> > # export LANG=en_US.UTF-8
> > # (echo bla; echo Bla) | grep '[A-Z]'
> > Bla
>
> Here I had expected the result you got with en_US.ISO8859-1 ...
> Definitely a bug in the definition of the collating sequences.
>
> And I have just verified that de_DE.ISO8859-1 wrongly considers "ö"
> to be within [a-z], while de_DE.UTF-8 does not (but should).
>
> Seems that the correct collating sequences for ISO8859-1 and UTF-8 are
> each assigned to the other one.

PR 257972 raised.

> > There is nothing special in the environment, specifically no LC_xxx nor
> > MM_CHARSET in either case.
>
> LANG defines LC_COLLATE, unless overridden.

Indeed. I just explicitly mentioned *no* LC_xxx to clarify that it's
not overridden. :-)

> BTW, character classes work for your examples and more:

Certainly they do. But they are harder to type... :-)

Kind regards
Helge
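
For reference, a minimal sketch of the two workarounds touched on in this thread, assuming a POSIX sh and grep. Forcing the C locale for a single command makes the range [A-Z] cover only the ASCII upper-case letters, and the character class [[:upper:]] does not depend on collation order at all. The variable names range_c and class_c are illustrative only and do not come from the thread.

```shell
#!/bin/sh
# Assumption: POSIX sh and grep. LC_ALL overrides LANG and every LC_*
# variable, but only for the command it is prefixed to.

# Range expression evaluated in the C locale: only ASCII A-Z match.
range_c=$(printf 'bla\nBla\n' | LC_ALL=C grep '[A-Z]')

# Character class: matches upper-case letters by classification,
# not by position in the collating sequence.
class_c=$(printf 'bla\nBla\n' | LC_ALL=C grep '[[:upper:]]')

echo "range, C locale: $range_c"   # prints: range, C locale: Bla
echo "class, C locale: $class_c"   # prints: class, C locale: Bla
```

The same [[:upper:]] pattern can also be run under the en_US or de_DE locales from the examples above. What a range such as [A-Z] matches there depends on the installed collation definitions, which is the subject of the PR.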