From owner-freebsd-i18n@FreeBSD.ORG Mon Jun 6 22:41:06 2011
Date: Tue, 7 Jun 2011 00:41:05 +0200
From: Jilles Tjoelker
To: freebsd-hackers@freebsd.org, freebsd-i18n@freebsd.org
Subject: tr A-Z a-z in locales other than C
Message-ID: <20110606224105.GA92410@stack.nl>

A few years ago, when locale support was added to the tr utility,
character ranges (except ones containing one or two octal escapes) were
changed to use the collation order instead of the character code order.
At the time, this matched other implementations of tr and was apparently
somewhat generally accepted.

However, this behaviour is not intuitive, not portable (it depends
deeply on the collation order), and it is very hard to find a practical
use for it. Perhaps there is a use case in EBCDIC locales that only
contain the 2*26 basic Latin letters, but that is rather exotic.
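The following is a minimal C sketch of a collation-ordered range for single-byte characters. It is modelled loosely on the loop that the str.c hunk of this patch removes. Every byte that collates between the two end points is collected, and the result is sorted with strcoll().

The names `collate_chars` and `collation_range` are hypothetical; this is not the tr source. Because the sketch never calls setlocale(), it runs in the C locale. There strcoll() behaves like strcmp(), so the result coincides with code order. The two orders only diverge under a locale whose collation differs from the codeset order.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical comparator: order two single-byte characters (stored as
 * ints) by the LC_COLLATE rules of the current locale, by wrapping each
 * one in a one-character string and calling strcoll().
 */
static int
collate_chars(const void *a, const void *b)
{
	char sa[2] = { (char)*(const int *)a, '\0' };
	char sb[2] = { (char)*(const int *)b, '\0' };

	return (strcoll(sa, sb));
}

/*
 * Collation-ordered range in the style of the pre-patch tr: collect each
 * byte value c (1..UCHAR_MAX) that collates at or after start and at or
 * before stop, then sort the result by collation order.  Returns the
 * number of values stored in out, or -1 if stop collates before start.
 */
static int
collation_range(int start, int stop, int out[UCHAR_MAX])
{
	int c, n = 0;

	assert(out != NULL);
	if (collate_chars(&stop, &start) < 0)
		return (-1);
	for (c = 1; c <= UCHAR_MAX; c++)
		if (collate_chars(&c, &start) >= 0 &&
		    collate_chars(&c, &stop) <= 0)
			out[n++] = c;
	qsort(out, (size_t)n, sizeof(out[0]), collate_chars);
	return (n);
}
```

In the C locale, collation_range('A', 'Z', buf) yields the 26 capital letters. Under a locale with a different collation table, the same call may pick up accented or lower-case letters as well.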
The command tr A-Z a-z may do something unexpected even if there is a
1:1 mapping between upper and lower case, since it also assumes that 'z'
is the last letter.

This is not a POSIX issue as POSIX leaves character ranges in tr
unspecified for locales other than the POSIX locale (except for ranges
containing octal escapes).

If there is no reason to keep using the collation order, I would like to
change tr's character ranges back to character codes. GNU tr does this
and many ports wrongly take advantage of it, so following it will reduce
the need to patch ports.

The patch below demonstrates the new behaviour. The code could be
simplified further, as the flags for octal escapes are no longer needed.
The man page may need some additional changes as well. In particular,
the command tr "[:upper:]" "[:lower:]" in a user's locale is a good
choice for text specified by the user, but a poor choice for
case-insensitive comparisons of constant strings, because in Turkish
locales the upper case version of 'i' is a capital I with dot and the
lower case version of 'I' is a lower case i without dot. In such cases,
LC_ALL=C tr "[:upper:]" "[:lower:]" may be a better option (A-Z a-z
could be used at the cost of breaking EBCDIC support).

There is a related issue with ranges in regular expressions, glob and
fnmatch (likewise unspecified by POSIX outside the POSIX locale), but
this is less likely to cause problems.

Index: usr.bin/tr/tr.1
===================================================================
--- usr.bin/tr/tr.1 (revision 222648)
+++ usr.bin/tr/tr.1 (working copy)
@@ -31,7 +31,7 @@
 .\" @(#)tr.1 8.1 (Berkeley) 6/6/93
 .\" $FreeBSD$
 .\"
-.Dd October 13, 2006
+.Dd June 6, 2011
 .Dt TR 1
 .Os
 .Sh NAME
@@ -158,12 +158,7 @@
 .Pp
 A backslash followed by any other character maps to that character.
 .It c-c
-For non-octal range endpoints
-represents the range of characters between the range endpoints, inclusive,
-in ascending order,
-as defined by the collation sequence.
-If either or both of the range endpoints are octal sequences, it
-represents the range of specific coded values between the
+A range represents the range of specific coded values between the
 range endpoints, inclusive.
 .Pp
 .Bf Em
@@ -309,20 +304,18 @@
 .Pp
 .Dl "tr \*q[=e=]\*q \*qe\*q"
 .Sh COMPATIBILITY
-Previous
-.Fx
-implementations of
-.Nm
-did not order characters in range expressions according to the current
-locale's collation order, making it possible to convert unaccented Latin
+Some implementations of
+.Nm ,
+including the ones in previous versions of
+.Fx ,
+order characters in range expressions according to the current
+locale's collation order, making it impossible to convert unaccented Latin
 characters (esp.\& as found in English text) from upper to lower case using
 the traditional
 .Ux
 idiom of
 .Dq Li "tr A-Z a-z" .
-Since
-.Nm
-now obeys the locale's collation order, this idiom may not produce
+In such implementations, this idiom may not produce
 correct results when there is not a 1:1 mapping between lower and upper
 case, or when the order of characters within the two cases differs.
 As noted in the
Index: usr.bin/tr/str.c
===================================================================
--- usr.bin/tr/str.c (revision 222648)
+++ usr.bin/tr/str.c (working copy)
@@ -260,37 +260,13 @@
 		stopval = wc;
 		s->str += clen;
 	}
-	/*
-	 * XXX Characters are not ordered according to collating sequence in
-	 * multibyte locales.
-	 */
-	if (octal || was_octal || MB_CUR_MAX > 1) {
-		if (stopval < s->lastch) {
-			s->str = savestart;
-			return (0);
-		}
-		s->cnt = stopval - s->lastch + 1;
-		s->state = RANGE;
-		--s->lastch;
-		return (1);
-	}
-	if (charcoll((const void *)&stopval, (const void *)&(s->lastch)) < 0) {
+	if (stopval < s->lastch) {
 		s->str = savestart;
 		return (0);
 	}
-	if ((s->set = p = malloc((NCHARS_SB + 1) * sizeof(int))) == NULL)
-		err(1, "genrange() malloc");
-	for (cnt = 0; cnt < NCHARS_SB; cnt++)
-		if (charcoll((const void *)&cnt, (const void *)&(s->lastch)) >= 0 &&
-		    charcoll((const void *)&cnt, (const void *)&stopval) <= 0)
-			*p++ = cnt;
-	*p = OOBCH;
-	n = p - s->set;
-
-	s->cnt = 0;
-	s->state = SET;
-	if (n > 1)
-		mergesort(s->set, n, sizeof(*(s->set)), charcoll);
+	s->cnt = stopval - s->lastch + 1;
+	s->state = RANGE;
+	--s->lastch;
 	return (1);
 }
 
--
Jilles Tjoelker
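With the patch applied, a range is simply the run of consecutive code values between its end points. The C sketch below restates that logic as a stand-alone function. `code_range` is a hypothetical name. The real genrange() instead keeps only a count in `s->cnt` and steps `s->lastch` as characters are consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Code-ordered range in the style of the patched genrange(): the range
 * start-stop covers the stop - start + 1 consecutive code values.
 * Returns the number of values written to out, or -1 if the range is
 * reversed or does not fit in outlen entries.
 */
static long
code_range(wchar_t start, wchar_t stop, wchar_t *out, size_t outlen)
{
	size_t i, n;

	assert(out != NULL);
	if (stop < start)
		return (-1);	/* reversed: the patched genrange() returns 0 here */
	n = (size_t)(stop - start) + 1;
	if (n > outlen)
		return (-1);	/* caller's buffer is too small */
	for (i = 0; i < n; i++)
		out[i] = (wchar_t)(start + (wchar_t)i);
	return ((long)n);
}
```

One consequence of pure code order: code_range(L'A', L'z', ...) yields 58 values. These are the 52 letters plus the six ASCII punctuation characters `[ \ ] ^ _` and the backquote, which lie between `Z` and `a`.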
From owner-freebsd-i18n@FreeBSD.ORG Tue Jun 7 00:37:31 2011
Date: Tue, 7 Jun 2011 04:24:43 +0400
From: Andrey Chernov
To: Jilles Tjoelker
Cc: freebsd-hackers@FreeBSD.ORG, freebsd-i18n@FreeBSD.ORG
Subject: Re: tr A-Z a-z in locales other than C
Message-ID: <20110607002442.GA89483@vniz.net>
In-Reply-To: <20110606224105.GA92410@stack.nl>

On Tue, Jun 07, 2011 at 12:41:05AM +0200, Jilles Tjoelker wrote:
>
> There is a related issue with ranges in regular expressions, glob and
> fnmatch (likewise unspecified by POSIX outside the POSIX locale), but
> this is less likely to cause problems.
>

You care about ports, but suggested change is americano-centrism which
kills tr usage for national language documents due to impossibility to
specify whole national alphabet easily, just by two letters.

Moreover, having differently treated regex ranges in tr vs other places
you mention will produce additional chaos.

Back to the ports: it is not hard to run _any_ port's make or configure
with LANG=C directly by the ports Mk system to eliminate that problem.
--
http://ache.vniz.net/

From owner-freebsd-i18n@FreeBSD.ORG Tue Jun 7 21:17:14 2011
Date: Tue, 7 Jun 2011 23:17:12 +0200
From: Jilles Tjoelker
To: Andrey Chernov, freebsd-hackers@FreeBSD.ORG, freebsd-i18n@FreeBSD.ORG
Subject: Re: tr A-Z a-z in locales other than C
Message-ID: <20110607211712.GA16994@stack.nl>
In-Reply-To: <20110607002442.GA89483@vniz.net>

On Tue, Jun 07, 2011 at 04:24:43AM +0400, Andrey Chernov wrote:
> On Tue, Jun 07, 2011 at 12:41:05AM +0200, Jilles Tjoelker wrote:
> > There is a related issue with ranges in regular expressions, glob and
> > fnmatch (likewise unspecified by POSIX outside the POSIX locale), but
> > this is less likely to cause problems.
> You care about ports, but suggested change is americano-centrism which
> kills tr usage for national language documents due to impossibility to
> specify whole national alphabet easily, just by two letters.

Hmm, so that's with translation to a constant, or with the -d and/or -s
options. In such cases, there may be a range for all letters with
collation order, but not with codeset order (mainly if "all letters"
includes letters with diacritical marks).

In FreeBSD, upper case sorts before lower case, so cases can be
distinguished this way but all letters may require two ranges. In most
other operating systems the cases go together so a single range is
sufficient, but cases cannot be distinguished. Making such things work
on multiple operating systems requires careful testing.

> Moreover, having differently treated regex ranges in tr vs other places
> you mention will produce additional chaos.

I think this is already inconsistent because some programs do not enable
locale or use different locale code. With UTF-8 or other multibyte
character sets, this is even more so because functions like isalpha work
very poorly by definition and there is no collation support for such
character sets in FreeBSD.

> Back to the ports: it is not hard to run _any_ port's make or configure
> with LANG=C directly by the ports Mk system to eliminate that problem.

True, but some ports install scripts with problematic tr calls.
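The trade-off between the two styles of collation order can be shown with a toy model. The two orderings in this C sketch cover only the 52 basic Latin letters. They are hypothetical simplifications, not real FreeBSD or glibc collation data, and `order_range` is a hypothetical helper.

With upper case sorted first, `a-z` selects exactly the lower-case letters. With the cases interleaved, the same range selects 51 letters of both cases (every letter except `Z`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical collation orders for the 52 basic Latin letters. */
static const char upper_first[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
static const char interleaved[] =
    "aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ";

/*
 * Copy the letters from start to stop (inclusive), in the given order,
 * into out and NUL-terminate it.  out needs room for 53 bytes.
 * Returns the number of letters copied, or -1 if an end point is not in
 * the order or the range is reversed.
 */
static int
order_range(const char *order, char start, char stop, char *out)
{
	const char *s, *e;
	int n;

	assert(order != NULL && out != NULL);
	if (start == '\0' || stop == '\0')
		return (-1);
	s = strchr(order, start);
	e = strchr(order, stop);
	if (s == NULL || e == NULL || e < s)
		return (-1);
	n = (int)(e - s) + 1;
	memcpy(out, s, (size_t)n);
	out[n] = '\0';
	return (n);
}
```

Under the interleaved order, `A-Z` likewise yields 51 letters, this time omitting `a`. That is the kind of surprise the thread attributes to tr A-Z a-z outside the C locale.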
--
Jilles Tjoelker

From owner-freebsd-i18n@FreeBSD.ORG Tue Jun 7 22:23:24 2011
Date: Wed, 8 Jun 2011 09:56:39 +1200 (NZST)
From: Atom Smasher
To: freebsd-hackers@freebsd.org
Cc: freebsd-i18n@freebsd.org
Subject: Re: tr A-Z a-z in locales other than C
Message-ID: <1106080945020.2239@smasher>
In-Reply-To: <20110606224105.GA92410@stack.nl>

the man page makes it clear...

    Translate the contents of file1 to upper-case.

        tr "[:lower:]" "[:upper:]" < file1

    (This should be preferred over the traditional UNIX idiom of ``tr a-z
    A-Z'', since it works correctly in all locales.)
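The class-based idiom quoted from the man page involves no range and no collation. It relies only on the locale's character classes and case mapping. A rough C analogue using the standard <wctype.h> functions is sketched below. `upcase_by_class` is a hypothetical name, and the sketch models only the lower-to-upper case, not tr's general string operands.

The case mapping is still locale-dependent. This is why the Turkish dotted and dotless i, mentioned at the start of the thread, matter when comparing constant strings.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Upper-case a wide string in place by character class, in the spirit of
 * tr "[:lower:]" "[:upper:]": every character for which iswlower() is
 * true is replaced by towupper() of it.  Both functions follow the
 * LC_CTYPE category of the current locale.  Returns the number of
 * characters changed.
 */
static size_t
upcase_by_class(wchar_t *s)
{
	size_t changed = 0;

	assert(s != NULL);
	for (; *s != L'\0'; s++) {
		if (iswlower((wint_t)*s)) {
			*s = (wchar_t)towupper((wint_t)*s);
			changed++;
		}
	}
	return (changed);
}
```

In the C locale (no setlocale() call), L"tr a-z, 42!" becomes L"TR A-Z, 42!", with four characters changed.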
for any other uses, either build the port with locale specified as "C" as
mentioned, or patch the port so:
    tr '[a-z]' '[A-Z]'
becomes:
    env LC_ALL=C tr '[a-z]' '[A-Z]'

the only change that would be appropriate to the tr utility would be a
command-line option to select a locale... something like:
    tr -l C '[a-z]' '[A-Z]'

i don't think anyone would object to that, but it would still require
patching some ports under some locales...

maybe another option would be modifying tr to recognize other [new]
environment variables... TR_LANG, TR_LC_ALL, TR_LC_CTYPE and
TR_LC_COLLATE. done that way, things could be set in /etc/make.conf (or
sys.mk), not need any patching, and not interfere with other uses of
locale.

--
        ...atom

 ________________________
 http://atom.smasher.org/
 762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
 -------------------------------------------------

 "We in the West must bear in mind that the poor countries are poor
  primarily because we have exploited them through political or
  economic colonialism."
        -- Martin Luther King, Jr

From owner-freebsd-i18n@FreeBSD.ORG Tue Jun 7 23:00:44 2011
Date: Wed, 8 Jun 2011 01:00:43 +0200
From: Jilles Tjoelker
To: Atom Smasher
Cc: freebsd-hackers@freebsd.org, freebsd-i18n@freebsd.org
Subject: Re: tr A-Z a-z in locales other than C
Message-ID: <20110607230042.GB16994@stack.nl>
In-Reply-To: <1106080945020.2239@smasher>

On Wed, Jun 08, 2011 at 09:56:39AM +1200, Atom Smasher wrote:
> the man page makes it clear...

>     Translate the contents of file1 to upper-case.

>         tr "[:lower:]" "[:upper:]" < file1

>     (This should be preferred over the traditional UNIX idiom of ``tr a-z
>     A-Z'', since it works correctly in all locales.)
> for any other uses, either build the port with locale specified as "C" as
> mentioned, or patch the port so:
>     tr '[a-z]' '[A-Z]'
> becomes:
>     env LC_ALL=C tr '[a-z]' '[A-Z]'

> the only change that would be appropriate to the tr utility would be a
> command-line option to select a locale... something like:
>     tr -l C '[a-z]' '[A-Z]'

> i don't think anyone would object to that, but it would still require
> patching some ports under some locales...

That new option would provide zero benefit. If things are going to be
patched anyway then patch them to be standards compliant.

> maybe another option would be modifying tr to recognize other [new]
> environment variables... TR_LANG, TR_LC_ALL, TR_LC_CTYPE and
> TR_LC_COLLATE. done that way, things could be set in /etc/make.conf (or
> sys.mk), not need any patching, and not interfere with other uses of
> locale.

That would be rather ugly.

If tr a-z A-Z is supposed to be deceiving in some locales, then let it
remain so unconditionally.

--
Jilles Tjoelker

From owner-freebsd-i18n@FreeBSD.ORG Wed Jun 8 03:06:50 2011
Date: Wed, 8 Jun 2011 15:06:46 +1200 (NZST)
From: Atom Smasher
To: freebsd-hackers@freebsd.org
Cc: freebsd-i18n@freebsd.org
Subject: Re: tr A-Z a-z in locales other than C
Message-ID: <1106081505190.2239@smasher>
In-Reply-To: <20110607230042.GB16994@stack.nl>

On Wed, 8 Jun 2011, Jilles Tjoelker wrote:

>> maybe another option would be modifying tr to recognize other [new]
>> environment variables... TR_LANG, TR_LC_ALL, TR_LC_CTYPE and
>> TR_LC_COLLATE. done that way, things could be set in /etc/make.conf (or
>> sys.mk), not need any patching, and not interfere with other uses of
>> locale.
>
> That would be rather ugly.
>
> If tr a-z A-Z is supposed to be deceiving in some locales, then let it
> remain so unconditionally.
=================

it can still be as ugly as one wants it to be, and in some ports that
might be fine. but this option would provide a very simple way to rein
in how ugly it is.

--
        ...atom

 ________________________
 http://atom.smasher.org/
 762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
 -------------------------------------------------

 "The livestock sector is a major player [in climate change],
  responsible for 18% of greenhouse gas emissions measured in CO2
  equivalent. This is a higher share than transport."
        -- Livestock's long shadow, 2006 UN report
           sponsored by WTO, EU, AS-AID, FAO, et al

From owner-freebsd-i18n@FreeBSD.ORG Wed Jun 8 03:25:09 2011
Date: Wed, 8 Jun 2011 07:25:06 +0400
From: Andrey Chernov
To: Jilles Tjoelker
Cc: freebsd-hackers@FreeBSD.ORG, freebsd-i18n@FreeBSD.ORG
Subject: Re: tr A-Z a-z in locales other than C
Message-ID: <20110608032506.GA11098@vniz.net>
In-Reply-To: <20110607211712.GA16994@stack.nl>

On Tue, Jun 07, 2011 at 11:17:12PM +0200, Jilles Tjoelker wrote:
> In FreeBSD, upper case sorts before lower case, so cases can be
> distinguished this way but all letters may require two ranges. In most
In most > other operating systems the cases go together so a single range is > sufficient, but cases cannot be distinguished. Making such things work > on multiple operating systems requires careful testing. Such thing can't work consistenly on multiple operating systems by definition, because POSIX states "undefined" here. So the best we can is to concentrace on our system. No program should relay on that until POSIX define that somehow. > > Moreover, having differently treated regex ranges in tr vs other places > > you mention will produce additional chaos. > > I think this is already inconsistent because some programs do not enable > locale or use different locale code. I say the same, producing additional chaos is not bringing chaos from nowhere. AFAIK nobody use different locale code but often different regex implemetation. > > Back to the ports: it is not hard to run _any_ port's make or configure > > with LANG=C directly by the ports Mk system to eliminate that problem. > > True, but some ports install scripts with problematic tr calls. What count says, how many ports do that? Summarizing I suggest to consider two models: 1) Developer/programer etc. tr coderange does good for it. 2) Working with national language docs/end user/ tr coderange does bad for it. Sacrificing model 2) for 1) is not the thing we need, if such ports number is low. If such ports number is significant, we can consider additional options like automatically search and replace such tr's through pkg-plist (similar scanning we already do for security reasons). 
--
http://ache.vniz.net/

From owner-freebsd-i18n@FreeBSD.ORG Wed Jun 8 04:24:56 2011
Date: Tue, 07 Jun 2011 21:12:37 -0700
From: perryh@pluto.rain.com
To: jilles@stack.nl
Cc: freebsd-hackers@freebsd.org, freebsd-i18n@freebsd.org
Subject: Re: tr A-Z a-z in locales other than C
Message-Id: <4deef6b5.i/YJyk4lV85AhCTM%perryh@pluto.rain.com>
In-Reply-To: <20110607211712.GA16994@stack.nl>

Jilles Tjoelker wrote:
> On Tue, Jun 07, 2011 at 04:24:43AM +0400, Andrey Chernov wrote:
...
> > Back to the ports: it is not hard to run _any_ port's make
> > or configure with LANG=C directly by the ports Mk system to
> > eliminate that problem.
>
> True, but some ports install scripts with problematic tr calls.

So part of the porting effort may be to provide a patch that
prepends something along the lines of "env LANG=C" to tr calls
in those scripts. It would surely not be the only kind of
situation in which a port needed to patch the ported code to
get it to run correctly on FreeBSD :)
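One caveat on the `env LANG=C` prefix: POSIX gives LC_ALL precedence over the individual LC_* variables, which in turn take precedence over LANG. So `LANG=C` has no effect on collation if the user already exports LC_ALL or LC_COLLATE. The `env LC_ALL=C` form suggested earlier in the thread avoids that problem.

The C sketch below shows the precedence through setlocale(). `collate_from_env` is a hypothetical helper.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Initialise the locale from the environment, as tr and most other
 * utilities do, and report which locale was selected for LC_COLLATE.
 * If LC_ALL is set and non-empty, it wins for every category; otherwise
 * LC_COLLATE wins over LANG.  Returns NULL if the environment names a
 * locale that is not available.
 */
static const char *
collate_from_env(void)
{
	if (setlocale(LC_ALL, "") == NULL)
		return (NULL);
	return (setlocale(LC_COLLATE, NULL));
}
```

For example, with LANG and LC_COLLATE set to arbitrary values and LC_ALL=C in the environment, collate_from_env() returns "C".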