From: Martin Krzysiak
Date: Tue, 07 Feb 2006 02:53:46 +0100
To: freebsd-stable@FreeBSD.ORG
Subject: Re: tr(1) buggy with de_DE.ISO8859-1(5) locale?
Message-ID: <43E7FDAA.3010409@gmx.de>
In-Reply-To: <200602061658.k16GwqLr068150@lurza.secnetix.de>
References: <200602061658.k16GwqLr068150@lurza.secnetix.de>

Oliver Fromme wrote:

> It's not a bug. It's perfectly POSIX-compatible.

I think this behavior is "undefined" in POSIX, according to some
documents I found. That is not the same thing as "compatible".

> To convert lower case to upper case, use the command
> "tr '[:lower:]' '[:upper:]'" (or enumerate all letters
> explicitly, like "tr abcdef ABCDEF"). Scripts that
> use things like "tr a-z A-Z" are broken and need to be
> fixed.

It's not only upper/lower-case conversion that is weird. Try
"echo wxyz | tr w-z a-d". In my opinion, ranges are broken in
ISO locales generally.

> By the way: Do not set LANG or LC_ALL, especially for
> the root user, and especially when compiling things.

One thing I like about FreeBSD is that I have my German environment.
But you are right. The only locale that is expected to work correctly
is "C".

> Not only will tr behave in unexpected ways when used
> like above, but also other things might break. For
> example, German month names appear in "ls -l", which
> will break scripts that try to parse them.

Don't tell me about localization problems. I've seen lots of stupid
things. The latest one was a localized "Date:" header produced by a
commercial application.

> Some tools
> use decimal commas instead of decimal points, which
> can lead to further confusion, etc. Yes, scripts
> which try to do that are broken, but they do exist.

Yes, you are right. How many times have you used tr(1) to convert your
texts to upper/lower case? Do you expect it to work correctly? I would
prefer to use it like "tr a-zäöü A-ZÄÖÜ", _if_ I ever needed to do it.

> If you only need support for German umlauts, then only
> set LC_CTYPE. That shouldn't break anything.

I really, really appreciate that FreeBSD supports German locales.
Let's stop arguing. I just wanted to ask about the behavior. Now I know
that something might be fishy with tr(1), and I understand how to avoid
the problem. That's all I need to know.

For people who are interested in a simple workaround: don't use
de_DE.ISO8859-1(5). Use de_DE.UTF-8 instead. tr(1)'s ranges work as
expected there.

Martin
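
For anyone who wants the two forms discussed in this thread in script
form, here is a minimal sketch, assuming a POSIX sh and tr(1). The
function names to_upper_c and shift_range_c are made up for
illustration and do not come from the thread. Forcing LC_ALL=C for the
single tr invocation is an assumption chosen to make the result
reproducible; it does not change the environment of the rest of the
script, and it only handles ASCII letters (no umlauts).

```shell
#!/bin/sh
# Illustrative helpers only; the names are hypothetical.

# Upper-case ASCII letters with character classes, the form suggested
# in the quoted text. LC_ALL=C is set for this one tr call only
# (an assumption made here so the output does not depend on the
# caller's locale).
to_upper_c() {
    printf '%s\n' "$1" | LC_ALL=C tr '[:lower:]' '[:upper:]'
}

# Map the range w-z onto a-d, the test case from the message.
# In the C locale, ranges follow byte order, so the mapping is
# w->a, x->b, y->c, z->d.
shift_range_c() {
    printf '%s\n' "$1" | LC_ALL=C tr 'w-z' 'a-d'
}

to_upper_c 'hello'      # prints HELLO
shift_range_c 'wxyz'    # prints abcd
```

Running the same tr commands without the LC_ALL=C prefix uses whatever
locale the caller has set, which is the situation the thread is about.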