Date:      Fri, 20 Aug 2021 11:03:26 +0200 (CEST)
From:      freebsd@oldach.net (Helge Oldach)
To:        stable@freebsd.org
Subject:   Confusion with grep & locale?
Message-ID:  <202108200903.17K93QN3091126@nuc.oldach.net>

Hi all,

I'm confused about the FreeBSD behaviour with respect to locales
and grep. Specifically, case sensitivity does not seem to be handled
consistently when grepping character ranges: in some locales '[A-Z]'
also matches lowercase letters. It looks to me as if 11 and 13 behave
differently from each other, but I'm unclear why.

# uname -a
FreeBSD 11STABLE 11.4-STABLE FreeBSD 11.4-STABLE #1059 r368289M: Thu Dec  3 01:48:30 UTC 2020     root@XXX  amd64
# export LANG=en_US.ISO8859-1
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=C
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=en_US.UTF-8
# (echo bla; echo Bla) | grep '[A-Z]'
bla
Bla
#

# uname -a
FreeBSD 13STABLE 13.0-STABLE FreeBSD 13.0-STABLE #49 stable/13-n246779-64085efb677-dirty: Mon Aug 16 08:42:53 CEST 2021     root@XXX  amd64
# export LANG=en_US.ISO8859-1
# (echo bla; echo Bla) | grep '[A-Z]'
bla
Bla
# export LANG=C
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=en_US.UTF-8
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
#

For comparison, a Linux RHEL box delivers the expected results:

# uname -a
Linux rhel.local 3.10.0-1062.9.1.el7.x86_64 #1 SMP Mon Dec 2 08:31:54 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
# export LANG=en_US.ISO8859-1
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=C
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=en_US.UTF-8
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
#

There is nothing special in the environment; specifically, no LC_xxx
or MM_CHARSET variables are set on any of the three systems.
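The comparison above can be repeated in one go with a small /bin/sh loop. This is only a sketch: it assumes the three locale names exist on the machine (check with `locale -a`), and it sets LC_ALL rather than LANG so that no stray LC_* setting can interfere. It also runs '[[:upper:]]' next to '[A-Z]': POSIX leaves range expressions unspecified outside the POSIX (C) locale, whereas '[[:upper:]]' is defined by the locale's character classification, so the class form should be a useful reference point.

```shell
#!/bin/sh
# Run the same two-line input through grep under one locale, once with
# the range expression [A-Z] and once with the class [[:upper:]].
# Assumption: the locale names below exist on this machine; a missing
# locale may silently fall back to the C locale.
check_locale() {
    loc=$1
    for re in '[A-Z]' '[[:upper:]]'; do
        # Label the line, then print the matching input lines space-separated.
        printf '%s %s: ' "$loc" "$re"
        printf 'bla\nBla\n' | LC_ALL=$loc grep -- "$re" | tr '\n' ' '
        echo
    done
}

for loc in C en_US.ISO8859-1 en_US.UTF-8; do
    check_locale "$loc"
done
```

Run on each box, this prints one line per locale and expression, listing whichever of "bla" and "Bla" matched.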

Any guidance is appreciated... Thanks!

Kind regards
Helge


