From: Baptiste Daroussin
Date: Sun, 6 Nov 2016 12:07:29 +0100
To: Greg Rivers
Cc: freebsd-stable@freebsd.org
Subject: Re: Uppercase RE matching problems in FreeBSD 11
Message-ID: <20161106110729.z2px7mzlhcwxvrvu@ivaldir.etoilebsd.net>

On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
> I happened to run an old script today that uses sed(1) to extract the system
> boot time from the kern.boottime sysctl MIB. On 11.0 this no longer works as
> expected:
>
> $ sysctl kern.boottime
> kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016
> $ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
> v 5 16:18:34 2016
>
> sed passes over 'S' and 'N' until it hits 'v', which it considers uppercase
> apparently.
> This is with LANG=en_US.UTF-8. If I set LANG=C, it works as
> expected:
>
> $ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
> Nov 5 16:18:34 2016
>
> Testing every lowercase character separately gives even more inconsistent
> results:
>
> $ cat <
> > a
> > b
> > c
> > d
> > e
> > f
> > g
> > h
> > i
> > j
> > k
> > l
> > m
> > n
> > o
> > p
> > q
> > r
> > s
> > t
> > u
> > v
> > w
> > x
> > y
> > z
> > !
> b
> c
> d
> e
> f
> g
> h
> i
> j
> k
> l
> m
> n
> o
> p
> q
> r
> s
> t
> u
> v
> w
> x
> y
> z
>
> Here sed thinks every lowercase character except for 'a' is uppercase! This
> differs from the first test where sed did not think 'o' is uppercase. Again,
> the above behaves as expected with LANG=C.
>
> Does anyone have any insight into this? This is likely to break a lot of
> existing code.
>

Yes. A-Z only means uppercase in an ASCII-only world. In a Unicode world it
means AaBb...Z, because there are far more characters than the simple A-Z.
FreeBSD 11 has a Unicode collation instead of falling back on LC_COLLATE=C,
which means ASCII only.

In a regexp, for example, one should use the character classes [:upper:] or
[:lower:] (written as [[:upper:]] and [[:lower:]] inside a bracket expression).
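As a minimal sketch of that advice, assuming the same kern.boottime line as in
the report (the variable names below are only illustrative, and the sample line
is hard-coded so the snippet does not depend on sysctl), the expression can
either use a character class or keep the old range while forcing the C locale
for that one command:

```shell
#!/bin/sh
# Sample input copied from the report; on a live system this would come from
# `sysctl kern.boottime` instead.
line='kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016'

# Option 1: use the POSIX character class, which does not depend on collation.
boot=$(printf '%s\n' "$line" | sed -e 's/.*\([[:upper:]].*\)$/\1/')

# Option 2: keep the A-Z range but run sed in the C locale, where the range
# covers only the ASCII uppercase letters. LC_ALL overrides LANG and LC_COLLATE.
boot_c=$(printf '%s\n' "$line" | LC_ALL=C sed -e 's/.*\([A-Z].*\)$/\1/')

printf '%s\n' "$boot"    # prints: Nov 5 16:18:34 2016
printf '%s\n' "$boot_c"  # prints: Nov 5 16:18:34 2016
```

The first form is probably the better choice for scripts that should behave the
same under any locale; the second is a smaller change for existing code that
already relies on ASCII ranges.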
Best regards,
Bapt