From: Chuck Swiger
To: Stefan Ehmann
Cc: Stefan Bethke, freebsd-stable
Date: Tue, 8 Nov 2016 12:07:03 -0800
Subject: Re: Uppercase RE matching problems in FreeBSD 11
Message-Id: <81CABF69-8B12-40D8-9E65-CCF5D183441F@mac.com>
On Nov 8, 2016, at 11:54 AM, Stefan Ehmann wrote:

> On 07.11.2016 22:13, Charles Swiger wrote:
>> On Nov 6, 2016, at 1:49 PM, Stefan Bethke wrote:
>>> On 06.11.2016 at 22:27, Baptiste Daroussin wrote:
>>>> That works for POSIX locale aka C aka ASCII only world
>>>
>>> So what do I set my LANG and LC variables to? I do want UTF-8, but
>>> I do also want my scripts to continue to work. Clearly,
>>> en_US.UTF-8 is not what I want. Is it C.UTF-8? Or do I set
>>> LANG=en_US.UTF-8 and LC_COLLATE=C?
>>
>> If you want to use a UTF-8 locale, then you must start using character
>> classes like '[:upper:]' and '[:lower:]', because those will (or at
>> least "should", modulo bugs) properly handle the collation issues,
>> including for languages which do not have a one-to-one mapping between
>> upper- and lowercase letters.
>>
>> Someone with a German email address is presumably familiar with ß /
>> Eszett...? :-)
>
> Character classes work fine for [a-z], but I don't know of a simple way
> to match a range like [a-k].

True. If you need smaller ranges, I don't see a portable way of doing so
in a locale other than POSIX / "C" beyond listing the characters out,
e.g. [abcdefghijk]. Or:

> Personally, I prefer the "Rational Range Interpretation" because it
> doesn't break backward compatibility and is still standards-compliant.

...yes, +1. Many GNU tools, such as grep and gawk, have adopted this, but
they do so by replacing the system regex routines with their own code.
However, you can't rely on RRI without first checking whether there is a
gawk in your $PATH, or whether /usr/bin/awk (or whichever awk you get) is
really GNU awk.

Regards,
-- 
-Chuck
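Stefan Bethke's last option, LANG=en_US.UTF-8 with LC_COLLATE=C, relies on the POSIX lookup order for locale variables: a non-empty LC_ALL wins, then the category's own LC_* variable, then LANG. The sketch below models only that lookup from environment variables; it does not call setlocale(3) or check that a locale is installed, the function name is hypothetical, and the final fallback to "C" is an assumption (POSIX leaves the default implementation-defined).

```python
import os
from typing import Mapping, Optional


def effective_locale(category: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Locale name a POSIX program would use for `category` (e.g. "LC_COLLATE"),
    judged from environment variables alone.

    Lookup order: LC_ALL, then the category's own variable, then LANG.
    Empty values count as unset. The final "C" fallback is an assumption:
    POSIX leaves the default implementation-defined.
    """
    if env is None:
        env = os.environ
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"


if __name__ == "__main__":
    settings = {"LANG": "en_US.UTF-8", "LC_COLLATE": "C"}
    for cat in ("LC_CTYPE", "LC_COLLATE", "LC_MESSAGES"):
        print(cat, effective_locale(cat, settings))
    # A non-empty LC_ALL overrides the per-category setting.
    print(effective_locale("LC_COLLATE", {**settings, "LC_ALL": "en_US.UTF-8"}))
```

With that combination, character classification (LC_CTYPE) and messages stay UTF-8 while collation comes from "C". Since POSIX ties bracket ranges to the collation sequence, LC_COLLATE is the setting that matters for [A-Z]; whether it fully restores the old behaviour on a given FreeBSD release is worth testing rather than assuming. Note that a stray LC_ALL in the environment silently undoes it.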
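The point about letters without a one-to-one case mapping, and the Eszett aside, can be illustrated with Python's Unicode-aware string methods. These are only a stand-in for what a correct [:upper:] / [:lower:] implementation has to cope with; they are not FreeBSD's libc or regex code, and the helper names are hypothetical. An ASCII test in the style of [A-Z] misses uppercase letters outside ASCII, and U+00DF (ß) uppercases to two letters, so lowercase to uppercase to lowercase does not round-trip.

```python
import unicodedata


def ascii_upper(c: str) -> bool:
    """What [A-Z] means in the "C" locale: an ASCII code-point test."""
    return "A" <= c <= "Z"


def class_upper(c: str) -> bool:
    """Unicode-aware test, used here only as a stand-in for [:upper:]."""
    return c.isupper()


def case_round_trips(c: str) -> bool:
    """True if lowercasing the uppercase form gives back the original."""
    return c.upper().lower() == c


if __name__ == "__main__":
    for c in "A\u00c4\u00c9Z":  # A, A with diaeresis, E with acute, Z
        print(unicodedata.name(c), ascii_upper(c), class_upper(c))
    # U+00DF (sharp s) uppercases to "SS", so the mapping is not 1-1.
    print(len("\u00df".upper()), case_round_trips("\u00df"))  # 2 False
    print(case_round_trips("a"))  # True
```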
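To see why [A-Z] can stop meaning "uppercase ASCII", compare two readings of a bracket range. Read by collation order, [A-Z] is every character that sorts between A and Z; under Rational Range Interpretation it is every character whose code point lies between those of A and Z. The collation sequence below (a A b B ... z Z, lowercase first) is a simplified, hypothetical model of a dictionary-style locale, not FreeBSD 11's actual en_US.UTF-8 table, and all function names are illustrative. The last helper is the "list them out" workaround: it expands a range into an explicit bracket expression so no range, and hence no collation, is involved.

```python
import string

# Hypothetical, simplified dictionary-style collation sequence:
# a < A < b < B < ... < z < Z. Not a real locale table.
COLLATION = "".join(lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase))


def collation_range(lo: str, hi: str) -> set:
    """[lo-hi] read by collation order: everything that sorts between lo and hi."""
    start, end = COLLATION.index(lo), COLLATION.index(hi)
    return set(COLLATION[start:end + 1])


def rri_range(lo: str, hi: str) -> set:
    """[lo-hi] read under RRI: compare code points. Limited to the letters
    in COLLATION so the two readings can be compared side by side."""
    return {c for c in COLLATION if ord(lo) <= ord(c) <= ord(hi)}


def explicit_bracket(lo: str, hi: str) -> str:
    """Spell a code-point range out as a bracket expression, e.g. [abc]."""
    if ord(lo) > ord(hi):
        raise ValueError("empty range")
    chars = "".join(chr(cp) for cp in range(ord(lo), ord(hi) + 1))
    if any(meta in chars for meta in "[]^-"):
        raise ValueError("range includes bracket-expression metacharacters")
    return "[" + chars + "]"


if __name__ == "__main__":
    print("".join(sorted(collation_range("A", "Z"))))  # A-Z plus lowercase b-z
    print("".join(sorted(rri_range("A", "Z"))))        # A-Z only
    print(explicit_bracket("a", "k"))                  # [abcdefghijk]
```

Under this model [A-Z] picks up lowercase b through z (but not a), and [a-z] picks up uppercase A through Y (but not Z), which is the kind of surprise the thread is about. The explicit form [abcdefghijk] should mean the same thing in any locale, at the cost of being tedious to write.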
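The closing advice, checking whether the awk you will actually run is GNU awk before relying on RRI, could be sketched as below. It assumes GNU awk prints a version banner beginning "GNU Awk" when run with --version; the helper names are hypothetical. Passing the check says nothing about which gawk version is installed or whether that version implements RRI. Standard input is closed and a timeout is set in case some awk implementation does not recognize --version and waits for input.

```python
import shutil
import subprocess
from typing import Iterable, Optional


def looks_like_gnu_awk(version_output: str) -> bool:
    """True if a --version banner looks like GNU awk's ("GNU Awk ...")."""
    return version_output.lstrip().startswith("GNU Awk")


def find_gnu_awk(candidates: Iterable[str] = ("gawk", "awk")) -> Optional[str]:
    """Path of the first candidate on $PATH that reports itself as GNU awk,
    or None. Does not check the gawk version."""
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue
        try:
            result = subprocess.run(
                [path, "--version"],
                stdin=subprocess.DEVNULL,  # never block waiting for input
                capture_output=True,
                text=True,
                timeout=5,
            )
        except (OSError, subprocess.TimeoutExpired):
            continue
        if looks_like_gnu_awk(result.stdout):
            return path
    return None


if __name__ == "__main__":
    print(find_gnu_awk() or "no GNU awk found; don't assume RRI")
```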