Date: Fri, 20 Aug 2021 11:03:26 +0200 (CEST)
From: freebsd@oldach.net (Helge Oldach)
To: stable@freebsd.org
Subject: Confusion with grep & locale?
Message-ID: <202108200903.17K93QN3091126@nuc.oldach.net>

Hi all,

I'm confused about FreeBSD's behaviour with respect to locales and grep.
Specifically, case sensitivity does not seem to be handled consistently
when grepping for character ranges. It looks to me as if 11 and 13 do not
behave the same way, but I'm unclear why.

# uname -a
FreeBSD 11STABLE 11.4-STABLE FreeBSD 11.4-STABLE #1059 r368289M: Thu Dec 3 01:48:30 UTC 2020 root@XXX amd64
# export LANG=en_US.ISO8859-1
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=C
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=en_US.UTF-8
# (echo bla; echo Bla) | grep '[A-Z]'
bla
Bla
#

# uname -a
FreeBSD 13STABLE 13.0-STABLE FreeBSD 13.0-STABLE #49 stable/13-n246779-64085efb677-dirty: Mon Aug 16 08:42:53 CEST 2021 root@XXX amd64
# export LANG=en_US.ISO8859-1
# (echo bla; echo Bla) | grep '[A-Z]'
bla
Bla
# export LANG=C
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=en_US.UTF-8
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
#

For comparison, a Linux RHEL box delivers the expected results:

# uname -a
Linux rhel.local 3.10.0-1062.9.1.el7.x86_64 #1 SMP Mon Dec 2 08:31:54 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
# export LANG=en_US.ISO8859-1
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=C
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=en_US.UTF-8
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
#

There is nothing special in the environment; specifically, neither LC_xxx
nor MM_CHARSET is set in either case.

Any guidance is appreciated... Thanks!

Kind regards
Helge
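
The three-locale comparison above can be condensed into a small loop, which may make it easier to run the same check on several hosts. This is only a sketch: `check_range` is a hypothetical helper name, not something from the original message, and it sets LC_ALL rather than LANG, an assumption made so that any LC_* variable already present in the environment cannot mask the setting. A locale name that is not installed on the host will typically fall back to C behaviour.

```shell
#!/bin/sh
# Hypothetical helper: run the range test from the message under one locale.
# Assumption: LC_ALL is used instead of LANG so that other LC_* variables
# in the environment cannot override the locale under test.
check_range() {
    (echo bla; echo Bla) | env LC_ALL="$1" grep '[A-Z]'
}

# Same three locales as in the transcripts above.
for loc in en_US.ISO8859-1 C en_US.UTF-8; do
    printf '%s: ' "$loc"
    # Join the matching lines onto one line for side-by-side comparison.
    check_range "$loc" | tr '\n' ' '
    echo
done
```

Each output line names the locale followed by whatever `grep '[A-Z]'` matched, so a locale in which the lowercase `bla` slips through the range stands out immediately.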