Date:      Sat, 5 Nov 2016 20:23:25 -0500 (CDT)
From:      Greg Rivers <gcr+freebsd-stable@tharned.org>
To:        freebsd-stable@freebsd.org
Subject:   Uppercase RE matching problems in FreeBSD 11
Message-ID:  <alpine.BSF.2.20.1611051912260.2462@flake.tharned.org>

I happened to run an old script today that uses sed(1) to extract the 
system boot time from the kern.boottime sysctl MIB. On 11.0 this no longer 
works as expected:

$ sysctl kern.boottime
kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov  5 16:18:34 2016
$ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
v  5 16:18:34 2016

Because the leading .* is greedy, [A-Z] matches the last character on the
line that sed considers uppercase, and that is apparently 'v' rather than
'N'. This is with LANG=en_US.UTF-8. If I set LANG=C, it works as expected:

$ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
Nov  5 16:18:34 2016
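
For scripts that have to keep working in the meantime, here is a minimal
sketch of two ways to avoid depending on how the locale orders 'A' through
'Z'. It assumes the script only cares about ASCII uppercase letters; the
variable names are made up for illustration, and the sample line is copied
from the sysctl output above rather than read from the live system.

```shell
# Sample line copied from the sysctl output above.
line='kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov  5 16:18:34 2016'

# [[:upper:]] is a POSIX character class, so it asks "is this an uppercase
# letter?" instead of "does this sort between 'A' and 'Z'?".
by_class=$(printf '%s\n' "$line" | sed -e 's/.*\([[:upper:]].*\)$/\1/')

# Alternatively, force the C locale for this one command only.
by_locale=$(printf '%s\n' "$line" | LC_ALL=C sed -e 's/.*\([A-Z].*\)$/\1/')

printf '%s\n' "$by_class" "$by_locale"
```

Both variables should end up holding the text from "Nov" onward. The
character class is probably the safer choice of the two, since it does not
change how sed treats the rest of the input.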

Testing each lowercase letter on its own line shows how widespread the
problem is:

$ cat <<! | LANG=en_US.UTF-8 sed -n -e '/^[A-Z]$/'p
> a
> b
> c
> d
> e
> f
> g
> h
> i
> j
> k
> l
> m
> n
> o
> p
> q
> r
> s
> t
> u
> v
> w
> x
> y
> z
> !
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z

Here sed thinks every lowercase character except for 'a' is uppercase!
This is consistent with the first test: 'v' is the last letter on that
line, and it is one of the letters matched here.
Again, the above behaves as expected with LANG=C.
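
The same check can be written more compactly, which makes it easier to
repeat under different locales. This is only a sketch: lower_matching_AZ is
a hypothetical helper name, and it sets LC_ALL rather than LANG so that the
chosen locale wins even if other LC_* variables are set.

```shell
# Hypothetical helper: print the lowercase ASCII letters that sed matches
# with /^[A-Z]$/ when run under the locale named in $1.
lower_matching_AZ() {
    for c in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        printf '%s\n' "$c"
    done | LC_ALL="$1" sed -n -e '/^[A-Z]$/p'
}

# Under the C locale no lowercase letter should be printed.
lower_matching_AZ C
```

Calling it with en_US.UTF-8 instead of C should reproduce the list above on
an affected system, provided that locale is installed.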

Does anyone have any insight into this? This is likely to break a lot of 
existing code.

-- 
Greg


