From owner-freebsd-stable@freebsd.org  Sun Nov  6 21:14:59 2016
Return-Path: <owner-freebsd-stable@freebsd.org>
Delivered-To: freebsd-stable@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id E3A64C34CEE
 for <freebsd-stable@mailman.ysv.freebsd.org>;
 Sun,  6 Nov 2016 21:14:59 +0000 (UTC)
 (envelope-from shoesoft@gmx.net)
Received: from mout.gmx.net (mout.gmx.net [212.227.17.21])
 (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "mout.gmx.net", Issuer "TeleSec ServerPass DE-2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 41F407F3;
 Sun,  6 Nov 2016 21:14:58 +0000 (UTC)
 (envelope-from shoesoft@gmx.net)
Received: from walrus.pepperland ([81.217.70.96]) by mail.gmx.com (mrgmx102)
 with ESMTPSA (Nemesis) id 0LjLwB-1cYQ4n0NoN-00dWFv; Sun, 06 Nov 2016 22:14:54
 +0100
Subject: Re: Uppercase RE matching problems in FreeBSD 11
To: Stefan Bethke <stb@lassitu.de>, Baptiste Daroussin <bapt@FreeBSD.org>
References: <alpine.BSF.2.20.1611051912260.2462@flake.tharned.org>
 <20161106110729.z2px7mzlhcwxvrvu@ivaldir.etoilebsd.net>
 <29451103-E8DB-4656-A5BB-AEB924A728D6@lassitu.de>
Cc: Greg Rivers <gcr+freebsd-stable@tharned.org>, freebsd-stable@freebsd.org
From: Stefan Ehmann <shoesoft@gmx.net>
Message-ID: <a3f401a7-9dc9-d567-bf21-139364702599@gmx.net>
Date: Sun, 6 Nov 2016 22:14:50 +0100
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101
 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <29451103-E8DB-4656-A5BB-AEB924A728D6@lassitu.de>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-stable>, 
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable/>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 06 Nov 2016 21:15:00 -0000

On 06.11.2016 21:57, Stefan Bethke wrote:
> 
>> On 06.11.2016 at 12:07, Baptiste Daroussin
>> <bapt@FreeBSD.org> wrote:
>> 
>> On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
>>> I happened to run an old script today that uses sed(1) to extract
>>> the system boot time from the kern.boottime sysctl MIB. On 11.0
>>> this no longer works as expected:
..
>>> Here sed thinks every lowercase character except for 'a' is
>>> uppercase! This differs from the first test where sed did not
>>> think 'o' is uppercase. Again, the above behaves as expected with
>>> LANG=C.
>>> 
>>> Does anyone have any insight into this? This is likely to break a
>>> lot of existing code.
>>> 
>> 
>> Yes, A-Z only means uppercase in an ASCII-only world. In a Unicode
>> world it means AaBb...Z, because there are far more characters than
>> simply A-Z. FreeBSD 11 has Unicode collation instead of falling
>> back on LC_COLLATE=C, which means ASCII only.
>> 
>> In regular expressions, for example, one should use the character
>> classes [:upper:] or [:lower:].
> 
> That is rather surprising.  Is there a normative reference for the
> treatment of bracket expressions and character classes when using
> locales other than C and/or encodings like UTF-8?

I found an interesting article about this issue in gawk:
https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html

Apparently the meaning of ranges is unspecified outside the "C" locale.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05
says:

"In the POSIX locale, a range expression represents the set of collating
elements that fall between two elements in the collation sequence,
inclusive. In other locales, a range expression has unspecified
behavior: strictly conforming applications shall not rely on whether the
range expression is valid, or on the set of collating elements matched."
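
In practice that leaves two ways to write a script that behaves the same
in every locale. The following is only a rough sketch: the input string
and the variable names are made up for illustration and are not taken
from the original report. It shows the character class suggested above,
and the alternative of pinning the locale to C for the one command that
uses a range.

```shell
#!/bin/sh
# Hypothetical input, chosen only for illustration.
input='Boot Time: Sun Nov  6'

# Locale-dependent form (left commented out): outside the C/POSIX locale
# the set matched by A-Z is unspecified and may include lowercase letters.
# printf '%s\n' "$input" | sed 's/[A-Z]//g'

# Option 1: use the character class, which names the uppercase letters
# of whatever locale is in effect.
upper_class=$(printf '%s\n' "$input" | sed 's/[[:upper:]]//g')

# Option 2: force the C locale for this command only, so that A-Z is
# the plain ASCII range again.
upper_range_c=$(printf '%s\n' "$input" | LC_ALL=C sed 's/[A-Z]//g')

printf '%s\n' "$upper_class"
printf '%s\n' "$upper_range_c"
```

For this ASCII-only input both variants strip the same letters. Option 1
is probably the better default, since it keeps working on text that
contains non-ASCII uppercase letters, while option 2 is the smaller
change for an old script that only ever sees ASCII.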