From owner-cvs-all@FreeBSD.ORG Fri Jul 9 02:08:07 2004
Delivered-To: cvs-all@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id C334516A4CE; Fri, 9 Jul 2004 02:08:07 +0000 (GMT)
Received: from repoman.freebsd.org (repoman.freebsd.org [216.136.204.115])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id BCF5443D1F; Fri, 9 Jul 2004 02:08:07 +0000 (GMT)
	(envelope-from tjr@FreeBSD.org)
Received: from repoman.freebsd.org (localhost [127.0.0.1])
	by repoman.freebsd.org (8.12.11/8.12.11) with ESMTP id i692879D035796;
	Fri, 9 Jul 2004 02:08:07 GMT
	(envelope-from tjr@repoman.freebsd.org)
Received: (from tjr@localhost)
	by repoman.freebsd.org (8.12.11/8.12.11/Submit) id i69287du035795;
	Fri, 9 Jul 2004 02:08:07 GMT
	(envelope-from tjr)
Message-Id: <200407090208.i69287du035795@repoman.freebsd.org>
From: "Tim J. Robbins"
Date: Fri, 9 Jul 2004 02:08:07 +0000 (UTC)
To: src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org
X-FreeBSD-CVS-Branch: HEAD
Subject: cvs commit: src/usr.bin/tr Makefile cmap.c cmap.h cset.c cset.h
	extern.h str.c tr.c
X-BeenThere: cvs-all@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: CVS commit messages for the entire tree
X-List-Received-Date: Fri, 09 Jul 2004 02:08:07 -0000

tjr         2004-07-09 02:08:07 UTC

  FreeBSD src repository

  Modified files:
    usr.bin/tr           Makefile extern.h str.c tr.c
  Added files:
    usr.bin/tr           cmap.c cmap.h cset.c cset.h
  Log:
  Add support for multibyte characters. The challenge here was to use
  data structures that scale better with large character sets, instead of
  arrays indexed by character value:
  - Sets of characters to delete/squeeze are stored in a new "cset"
    structure, which is implemented as a splay tree of extents. This
    structure has the ability to store character classes (ala wctype(3)),
    but this is not currently fully utilized.
  - Mappings between characters are stored in a new "cmap" structure, which
    is also a splay tree.
  - The parser no longer builds arrays containing all the characters in a
    particular class; instead, next() determines them on-the-fly using
    nextwctype(3).

  Revision  Changes      Path
  1.2       +2 -1        src/usr.bin/tr/Makefile
  1.1       +212 -0      src/usr.bin/tr/cmap.c (new)
  1.1       +83 -0       src/usr.bin/tr/cmap.h (new)
  1.1       +303 -0      src/usr.bin/tr/cset.c (new)
  1.1       +75 -0       src/usr.bin/tr/cset.h (new)
  1.9       +11 -10      src/usr.bin/tr/extern.h
  1.23      +78 -87      src/usr.bin/tr/str.c
  1.22      +116 -102    src/usr.bin/tr/tr.c
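
The log describes storing character sets as extents (contiguous ranges) rather than as arrays indexed by character value. The sketch below illustrates that idea only; it is not the committed cset.c. As a simplifying assumption it keeps the extents in a sorted array with binary-search lookup instead of a splay tree, and the names extent, extent_set, extent_set_add and extent_set_contains are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* One contiguous run of characters, inclusive on both ends. */
struct extent {
	wchar_t min;
	wchar_t max;
};

/* Hypothetical set type: a sorted array of disjoint, non-adjacent extents. */
struct extent_set {
	struct extent *v;
	size_t len;
	size_t cap;
};

/* Add the inclusive range [lo, hi]. Returns false if allocation fails. */
bool
extent_set_add(struct extent_set *s, wchar_t lo, wchar_t hi)
{
	size_t i, j;

	assert(lo <= hi);

	/* Skip extents that end before lo and do not abut it. */
	i = 0;
	while (i < s->len && (long long)s->v[i].max + 1 < (long long)lo)
		i++;

	/* Absorb every extent that overlaps or abuts [lo, hi]. */
	j = i;
	while (j < s->len && (long long)s->v[j].min <= (long long)hi + 1) {
		if (s->v[j].min < lo)
			lo = s->v[j].min;
		if (s->v[j].max > hi)
			hi = s->v[j].max;
		j++;
	}

	/* Nothing absorbed: a new slot is needed, so grow if the array is full. */
	if (j == i && s->len == s->cap) {
		size_t ncap = s->cap ? s->cap * 2 : 8;
		struct extent *nv = realloc(s->v, ncap * sizeof(*nv));

		if (nv == NULL)
			return false;
		s->v = nv;
		s->cap = ncap;
	}

	/* Replace extents i..j-1 with the single merged extent. */
	memmove(&s->v[i + 1], &s->v[j], (s->len - j) * sizeof(s->v[0]));
	s->v[i].min = lo;
	s->v[i].max = hi;
	s->len = s->len - (j - i) + 1;
	return true;
}

/* Membership test by binary search over the sorted extents. */
bool
extent_set_contains(const struct extent_set *s, wchar_t ch)
{
	size_t lo = 0, hi = s->len;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ch < s->v[mid].min)
			hi = mid;
		else if (ch > s->v[mid].max)
			lo = mid + 1;
		else
			return true;
	}
	return false;
}
```

Starting from a zero-initialized struct extent_set, adding L'a'..L'f', then 0x4E00..0x9FFF, then L'g'..L'z' leaves two extents, because the third range abuts the first and is merged into it. Storage grows with the number of ranges rather than with the size of the character set, which is the scaling property the log is after. A splay tree, as used in the commit, additionally moves recently accessed extents toward the root.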