From owner-freebsd-stable@freebsd.org Sun Nov 6 21:06:32 2016
Date: Sun, 6 Nov 2016 22:06:28 +0100
From: Baptiste Daroussin
To: Stefan Bethke
Cc: Greg Rivers, freebsd-stable@freebsd.org
Subject: Re: Uppercase RE matching problems in FreeBSD 11
Message-ID: <20161106210628.hg3dcpozfjtuo3nt@ivaldir.etoilebsd.net>
References: <20161106110729.z2px7mzlhcwxvrvu@ivaldir.etoilebsd.net> <29451103-E8DB-4656-A5BB-AEB924A728D6@lassitu.de>
In-Reply-To: <29451103-E8DB-4656-A5BB-AEB924A728D6@lassitu.de>
List-Id: Production branch of FreeBSD source code

On Sun, Nov 06, 2016 at 09:57:00PM +0100, Stefan Bethke wrote:
>
> > On 06.11.2016 at 12:07, Baptiste Daroussin wrote:
> >
> > On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
> >> I happened to run an old script today that uses sed(1) to extract the system
> >> boot time from the kern.boottime sysctl MIB. On 11.0 this no longer works as
> >> expected:
> >>
> >>   $ sysctl kern.boottime
> >>   kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016
> >>   $ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
> >>   v 5 16:18:34 2016
> >>
> >> sed passes over 'S' and 'N' until it hits 'v', which it considers uppercase
> >> apparently. This is with LANG=en_US.UTF-8. If I set LANG=C, it works as
> >> expected:
> >>
> >>   $ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
> >>   Nov 5 16:18:34 2016
> >>
> >> Testing every lowercase character separately gives even more inconsistent
> >> results:
> >>
> >>   $ cat <
> >>
> >> Here sed thinks every lowercase character except for 'a' is uppercase! This
> >> differs from the first test where sed did not think 'o' is uppercase. Again,
> >> the above behaves as expected with LANG=C.
> >>
> >> Does anyone have any insight into this? This is likely to break a lot of
> >> existing code.
> >>
> >
> > Yes, A-Z only means uppercase in an ASCII-only world. In a Unicode world it
> > means AaBb...Z, because there are way more characters than simply A-Z. In
> > FreeBSD 11 we have a Unicode collation instead of falling back on
> > LC_COLLATE=C, which means ASCII only.
> >
> > For regexps, for example, one should use the character classes [:upper:] or
> > [:lower:].
>
> That is rather surprising. Is there a normative reference for the treatment
> of bracket expressions and character classes when using locales other than C
> and/or encodings like UTF-8?

http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html

For example:
"Regular expressions are a context-independent syntax that can represent a wide
variety of character sets and character set orderings, where these character
sets are interpreted according to the current locale. While many regular
expressions can be interpreted differently depending on the current locale, many
features, such as character class expressions, provide for contextual invariance
across locales."

Best regards,
Bapt
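
As a minimal sketch of the advice in this thread, the snippet below re-runs the
extraction in two ways: with a POSIX character class in place of the A-Z range,
and with the locale forced to C for the sed call only. The sample line is copied
from the thread rather than read from sysctl, and the variable names are purely
illustrative. LC_ALL=C is used instead of the LANG=C shown in the thread because
LC_ALL also overrides any separately set LC_COLLATE; that choice is an
assumption of this example, not something stated in the thread.

```sh
#!/bin/sh
# Sample line copied from the thread; on a live system it would come from
# `sysctl kern.boottime`.
line='kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016'

# Character class: [[:upper:]] matches uppercase letters as defined by the
# current locale's LC_CTYPE, so it does not depend on collation order.
by_class=$(printf '%s\n' "$line" | sed -e 's/.*\([[:upper:]].*\)$/\1/')

# Alternative: keep the A-Z range but force the C locale for this one
# command, where the range covers only the ASCII letters A through Z.
by_c_locale=$(printf '%s\n' "$line" | LC_ALL=C sed -e 's/.*\([A-Z].*\)$/\1/')

printf '%s\n' "$by_class" "$by_c_locale"
```

Because the leading `.*` is greedy, the capture starts at the last uppercase
letter on the line, here the 'N' of "Nov", so both commands should print
"Nov 5 16:18:34 2016".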