From owner-freebsd-hackers@FreeBSD.ORG Tue Jun 7 21:17:14 2011
Delivered-To: freebsd-hackers@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 11CB31065674;
    Tue, 7 Jun 2011 21:17:14 +0000 (UTC)
    (envelope-from jilles@stack.nl)
Received: from mx1.stack.nl (relay04.stack.nl [IPv6:2001:610:1108:5010::107])
    by mx1.freebsd.org (Postfix) with ESMTP id A7F5E8FC21;
    Tue, 7 Jun 2011 21:17:13 +0000 (UTC)
Received: from turtle.stack.nl (turtle.stack.nl [IPv6:2001:610:1108:5010::132])
    by mx1.stack.nl (Postfix) with ESMTP id 10FD41DD415;
    Tue, 7 Jun 2011 23:17:13 +0200 (CEST)
Received: by turtle.stack.nl (Postfix, from userid 1677)
    id 071BF173FD; Tue, 7 Jun 2011 23:17:13 +0200 (CEST)
Date: Tue, 7 Jun 2011 23:17:12 +0200
From: Jilles Tjoelker
To: Andrey Chernov, freebsd-hackers@FreeBSD.ORG, freebsd-i18n@FreeBSD.ORG
Message-ID: <20110607211712.GA16994@stack.nl>
References: <20110606224105.GA92410@stack.nl> <20110607002442.GA89483@vniz.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110607002442.GA89483@vniz.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Subject: Re: tr A-Z a-z in locales other than C
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
X-List-Received-Date: Tue, 07 Jun 2011 21:17:14 -0000

On Tue, Jun 07, 2011 at 04:24:43AM +0400, Andrey Chernov wrote:
> On Tue, Jun 07, 2011 at 12:41:05AM +0200, Jilles Tjoelker wrote:
> > There is a related issue with ranges in regular expressions, glob and
> > fnmatch (likewise unspecified by POSIX outside the POSIX locale), but
> > this is less likely to cause problems.
> You care about ports, but suggested change is americano-centrism which
> kills tr usage for national language documents due to impossibility to
> specify whole national alphabet easily, just by two letters.

Hmm, so that's with translation to a constant, or with the -d and/or -s
options. In such cases, there may be a range for all letters with
collation order, but not with codeset order (mainly if "all letters"
includes letters with diacritical marks).

In FreeBSD, upper case sorts before lower case, so cases can be
distinguished this way but all letters may require two ranges. In most
other operating systems the cases go together so a single range is
sufficient, but cases cannot be distinguished. Making such things work
on multiple operating systems requires careful testing.

> Moreover, having differently treated regex ranges in tr vs other places
> you mention will produce additional chaos.

I think this is already inconsistent because some programs do not enable
locale or use different locale code. With UTF-8 or other multibyte
character sets, this is even more so because functions like isalpha work
very poorly by definition and there is no collation support for such
character sets in FreeBSD.

> Back to the ports: it is not hard to run _any_ port's make or configure
> with LANG=C directly by the ports Mk system to eliminate that problem.

True, but some ports install scripts with problematic tr calls.

-- 
Jilles Tjoelker
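
To illustrate the kind of tr call at issue in installed scripts, here is
a minimal sketch of two ways a script could avoid depending on the
caller's locale, assuming a POSIX sh and tr. The function names are
hypothetical and do not come from any port or from this thread.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Lowercase the 26 ASCII letters regardless of the caller's locale.
# LC_ALL=C applies to this one tr invocation, so the ranges A-Z and a-z
# are interpreted in the POSIX locale, where their meaning is specified.
to_lower_ascii() {
    printf '%s\n' "$1" | LC_ALL=C tr 'A-Z' 'a-z'
}

# Lowercase using character classes instead of ranges. This follows the
# current locale's case mapping rather than any range ordering, so the
# result may differ from to_lower_ascii outside the C locale.
to_lower_class() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}
```

For example, `to_lower_ascii 'Hello World'` prints `hello world` whatever
LANG is set to. Which variant fits depends on whether the script wants
ASCII-only behaviour or the locale's notion of upper and lower case.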