Date: Thu, 22 Sep 2005 09:10:07 GMT
From: Oliver Fromme <olli@lurza.secnetix.de>
To: freebsd-bugs@FreeBSD.org
Subject: Re: bin/86450: tr translates wrong in german environment
Message-ID: <200509220910.j8M9A7cc071387@freefall.freebsd.org>

The following reply was made to PR bin/86450; it has been noted by GNATS.

From: Oliver Fromme <olli@lurza.secnetix.de>
To: bug-followup@FreeBSD.org, andy.321@web.de
Cc:
Subject: Re: bin/86450: tr translates wrong in german environment
Date: Thu, 22 Sep 2005 11:09:04 +0200 (CEST)

Andreas <andy.321@web.de> wrote:
 > > Synopsis: tr translates wrong in german environment

It doesn't. As far as I can tell, the PR can be closed,
because it's a feature, not a bug. :-)

 > While playing with tr I found this:
 >
 > > setenv LANG de_DE.ISO8859-15
 > > echo "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | tr A-Z a-z
 > > abcdefghijklmnopqrsßtüvwx˙
 >
 > same in LANG=de_AT.ISO8859-15 or de_CH.ISO8859-15 (all de_*), maybe in other
 > languages, but not in LANG=C or da_DK.ISO8859-15 (I did not try other languages)

That's correct and expected behaviour (POSIX / SUS).

The reason is that expressions like "a-z" depend on the
locale, particularly LC_COLLATE, which controls alphabetic
ordering. In the German-language locales, the collation
order places the German symbol "ß" right after "s", but
there is no such symbol in the uppercase equivalent, so
the two collation sequences have different lengths. That's
why you get garbage after that point.

In general it is a bad idea to use expressions like "A-Z"
or "a-z" with tr. You might get correct results in one
locale, but garbage in others.

The following will work and produce the expected result:

   $ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | tr "[:upper:]" "[:lower:]"
   abcdefghijklmnopqrstuvwxyz

Another way to perform lowercase conversion is to use awk:

   $ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | awk '{print tolower($0)}'
   abcdefghijklmnopqrstuvwxyz

Unfortunately there are (third-party) scripts which use tr
in the wrong way. Therefore my recommendation is to not
set LANG or LC_ALL, but instead only set LC_CTYPE (for
ISO8859 character support), and maybe LC_MESSAGES,
LC_NUMERIC and LC_TIME if desired (although these might
have bad side effects, too).
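
For scripts that cannot easily be changed to use character classes, a
locale can also be overridden for a single command instead of the whole
environment. The following is only a sketch of that idea, not something
taken from the PR: the function name to_lower_ascii is made up for
illustration, and it assumes that converting only the 26 ASCII letters
is acceptable (in the C locale, non-ASCII letters such as "Ä" are left
untouched).

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only).
# The LC_ALL=C prefix applies to this one tr invocation, not to the
# calling shell, so the range A-Z covers exactly the ASCII letters
# regardless of LANG or LC_COLLATE in the surrounding environment.
to_lower_ascii() {
    LC_ALL=C tr 'A-Z' 'a-z'
}

echo "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | to_lower_ascii
```

Because the assignment is written as a prefix to the command, the
caller's locale settings are unchanged after the function returns.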

If LC_COLLATE is required for certain applications, then
set it only for those applications, but not in the global
environment. YMMV, of course.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Services with a focus on FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"[...] one observation we can make here is that Python makes
an excellent pseudocoding language, with the wonderful attribute
that it can actually be executed." -- Bruce Eckel