From: Jilles Tjoelker
Date: Sun, 7 Apr 2013 22:31:51 +0200
To: Cedric Blancher
Cc: freebsd-hackers@freebsd.org, freebsd-standards@freebsd.org
Subject: Re: Fwd: Where does FreeBSD tr -C differ from tr -c?
Message-ID: <20130407203151.GA47134@stack.nl>

On Sun, Apr 07, 2013 at 09:12:57PM +0200, Cedric Blancher wrote:
> Forwarding to freebsd-hackers@/freebsd-standards@freebsd.org
> The question remain open and I need help. tr -C is implemented by
> FreeBSD tr -C but I can't find examples (or a testcase) where tr -c
> and tr -C differ.

Going by the POSIX rationale, here is one example of a difference:

    % printf 'a\200'|LC_ALL=en_US.US-ASCII tr -cd '\000-\177'|hd
    00000000  61                                                |a|
    00000001
    % printf 'a\200'|LC_ALL=en_US.US-ASCII tr -Cd '\000-\177'|hd
    00000000  61 80                                             |a.|
    00000002

Because the bytes 128..255 are not characters in US-ASCII, they cannot
be removed with -Cd, only with -cd.

Here is another difference (with LC_CTYPE=en_US.UTF-8 and every other
locale category set to C):

    % echo $'\U0001a000'|tr -cd '\U0001a000'|hd
    % echo $'\U0001a000'|tr -Cd '\U0001a000'|hd
    00000000  f0 9a 80 80                                       |....|
    00000004

The cause is that iswrune(3) returns false for the unassigned code
point U+0001A000. That classification may well be wrong for some code
points, because Unicode adds new characters from time to time and our
tables seem to be updated very rarely.

POSIX also makes the -C complement depend on collation order. You may
not have noticed any difference there because FreeBSD does not
implement LC_COLLATE for multibyte locales yet.

> PS: Who wrote tr -C and how can I contact the author?

You can read the Subversion logs, but the people involved may no longer
be around.

-- 
Jilles Tjoelker
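
To make the US-ASCII example above concrete, here is a small Python model of the two complement rules. It is only a sketch of the semantics as described in this message, not the FreeBSD tr source: the function names and the is_ascii_char predicate are made up for illustration, and the predicate stands in for whatever the locale actually considers a character.

```python
from typing import Callable, Iterable, List, Set


def delete_complement_values(units: Iterable[int], keep: Set[int]) -> List[int]:
    """Model of `tr -cd`: the complement is taken over all values,
    so every unit that is not in `keep` is deleted."""
    return [u for u in units if u in keep]


def delete_complement_chars(
    units: Iterable[int], keep: Set[int], is_char: Callable[[int], bool]
) -> List[int]:
    """Model of `tr -Cd`: the complement is taken over characters only,
    so a unit is deleted only if it is a character and not in `keep`."""
    return [u for u in units if u in keep or not is_char(u)]


def is_ascii_char(u: int) -> bool:
    """Hypothetical stand-in for the locale's notion of a character
    in a US-ASCII locale: only 0..127 count."""
    return 0 <= u <= 0x7F


if __name__ == "__main__":
    data = [0x61, 0x80]            # the bytes produced by printf 'a\200'
    keep = set(range(0x00, 0x80))  # the set '\000-\177'

    print(bytes(delete_complement_values(data, keep)))
    print(bytes(delete_complement_chars(data, keep, is_ascii_char)))
```

In this model the byte 0x80 survives the -C variant only because the predicate rejects it, which is the same reason given above for the real output.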