Date: Wed, 8 Jun 2011 07:25:06 +0400
From: Andrey Chernov
To: Jilles Tjoelker
Cc: freebsd-hackers@FreeBSD.ORG, freebsd-i18n@FreeBSD.ORG
Subject: Re: tr A-Z a-z in locales other than C

On Tue, Jun 07, 2011 at 11:17:12PM +0200, Jilles Tjoelker wrote:
> In FreeBSD, upper case sorts before lower case, so cases can be
> distinguished this way but all letters may require two ranges. In most
> other operating systems the cases go together so a single range is
> sufficient, but cases cannot be distinguished.
> Making such things work
> on multiple operating systems requires careful testing.

Such things can't work consistently across operating systems by
definition, because POSIX says "undefined" here. So the best we can do
is concentrate on our own system. No program should rely on this
behavior until POSIX defines it somehow.

> > Moreover, having differently treated regex ranges in tr vs other places
> > you mention will produce additional chaos.
>
> I think this is already inconsistent because some programs do not enable
> locale or use different locale code.

That is exactly why I said "additional": the chaos is already there,
and this change would add to it rather than create it from nothing.
AFAIK nobody uses different locale code, but programs often use
different regex implementations.

> > Back to the ports: it is not hard to run _any_ port's make or configure
> > with LANG=C directly by the ports Mk system to eliminate that problem.
>
> True, but some ports install scripts with problematic tr calls.

What do the numbers say: how many ports actually do that?

To summarize, I suggest considering two models:

1) Developers/programmers etc.: tr with code-point ranges works well
   for them.
2) End users working with national-language documents: tr with
   code-point ranges works badly for them.

If the number of such ports is low, sacrificing model 2) for model 1)
is not what we need. If the number is significant, we can consider
additional options, such as automatically searching for and replacing
such tr calls via pkg-plist (similar to the scanning we already do for
security reasons).

-- 
http://ache.vniz.net/