From: Tim Kientzle
Date: Mon, 25 Aug 2008 21:37:41 -0700
To: Svavar Lúthersson
Cc: freebsd-current@freebsd.org
Subject: Re: Unicode-based FreeBSD

> Going to UTF-8 might fix some of the character issues
> but we would be in the same shoes when it comes to characters
> which are in -16 and -32 but not in -8.

You need to read the Unicode/ISO 10646 standards again; you do
not understand them.

There are no characters in UTF-32 that are not in UTF-8.
UTF-32, UTF-16, and UTF-8 all use exactly the same characters.

UTF-8 encodes Unicode characters from U+0000 to U+10FFFF,
using 1 to 4 bytes per character.

UTF-16 encodes Unicode characters from U+0000 to U+10FFFF,
using 2 or 4 bytes per character.

UTF-32 encodes Unicode characters from U+0000 to U+10FFFF,
using 4 bytes per character.

Practically speaking, UTF-8 is a bit more convenient for
file storage and transmission (including terminal support),
while UTF-16 or UTF-32 can be slightly more convenient for
internal string manipulation. But all three encodings use
exactly the same characters.

Tim Kientzle
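
The byte counts above can be checked with a minimal sketch, assuming Python 3 and its built-in codecs. The sample characters and the helper name encoded_lengths are arbitrary picks for illustration. The "-le" codec variants are used so that no byte-order mark is counted.

```python
# Illustrative sketch: one sample character per UTF-8 length class.
# The samples and the helper name are assumptions made for this example.
SAMPLES = ["A", "\u00fa", "\u20ac", "\U0001F600"]

# Little-endian variants avoid a byte-order mark inflating the counts.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_lengths(ch):
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    return tuple(len(ch.encode(enc)) for enc in ENCODINGS)


for ch in SAMPLES:
    u8, u16, u32 = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8={u8} UTF-16={u16} UTF-32={u32} bytes")
    # Every encoding round-trips the same character: the repertoire is
    # identical, only the byte representation differs.
    for enc in ENCODINGS:
        assert ch.encode(enc).decode(enc) == ch
```

The character outside the Basic Multilingual Plane (U+1F600) takes 4 bytes in all three encodings, and each encoding decodes back to the same character.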