From: Stefan Ehmann
Date: Tue, 8 Nov 2016 20:54:33 +0100
Subject: Re: Uppercase RE matching problems in FreeBSD 11
To: Charles Swiger, Stefan Bethke
Cc: freebsd-stable

On 07.11.2016 22:13, Charles Swiger wrote:
> On Nov 6, 2016, at 1:49 PM, Stefan Bethke wrote:
>> On 06.11.2016 at 22:27, Baptiste Daroussin wrote:
>>> That works for POSIX locale aka C aka ASCII only world
>>
>> So what do I set my LANG and LC variables to? I do want UTF-8, but
>> I do also want my scripts to continue to work. Clearly,
>> en_US.UTF-8 is not what I want. Is it C.UTF-8? Or do I set
>> LANG=en_US.UTF-8 and LC_COLLATE=C?
>
> If you want to use a UTF8 locale, then you must start using character
> classes like '[:upper:]' and '[:lower:]' because those will-- or at
> least "should", modulo bugs-- properly handle the collation issues
> including for languages which do not possess a 1-1 mapping between
> upper and lower case letters.
>
> Someone with a German email address is presumably familiar with ß /
> Eszett...? :-)

Character classes work fine for [a-z], but I don't know of a simple way to match a range like [a-k].
Personally, I prefer the "Rational Range Interpretation" because it doesn't break backward compatibility and is still standards-compliant.
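
For illustration only, here is a minimal sketch of two workarounds for a partial range like [a-k] that do not rely on how the active locale collates letters. The sample words and variable names are made up for this example and are not from the thread. The first variant forces the C locale for a single command, where [a-k] is the plain ASCII range; the second lists the letters explicitly, which should behave the same in any locale.

```shell
#!/bin/sh
# Hypothetical sample input, one word per line (not taken from the thread).
input='apple
Banana
kiwi
Mango
zebra'

# Variant 1: force the C locale for this one command only.
# In the C locale, [a-k] is the ASCII range a..k, so uppercase letters are excluded.
by_c_locale=$(printf '%s\n' "$input" | LC_ALL=C grep '^[a-k]')

# Variant 2: spell the range out. An explicit list of characters
# does not depend on the collation order of the current locale.
by_explicit_list=$(printf '%s\n' "$input" | grep '^[abcdefghijk]')

printf '%s\n' "$by_c_locale"
printf '%s\n' "$by_explicit_list"
```

Note that LC_ALL=C also switches character handling to single bytes for that command, so the first variant is best kept to ASCII-only input; the explicit list avoids that trade-off at the cost of being more verbose.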