From owner-freebsd-current@FreeBSD.ORG  Tue Aug 26 05:03:15 2008
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3F1121065689
	for <freebsd-current@freebsd.org>; Tue, 26 Aug 2008 05:03:15 +0000 (UTC)
	(envelope-from kientzle@freebsd.org)
Received: from kientzle.com (kientzle.com [66.166.149.50])
	by mx1.freebsd.org (Postfix) with ESMTP id BC3458FC14
	for <freebsd-current@freebsd.org>; Tue, 26 Aug 2008 05:03:14 +0000 (UTC)
	(envelope-from kientzle@freebsd.org)
Received: from [10.0.0.128] (p54.kientzle.com [66.166.149.54])
	by kientzle.com (8.12.9/8.12.9) with ESMTP id m7Q4aQtv008546;
	Mon, 25 Aug 2008 21:36:37 -0700 (PDT)
	(envelope-from kientzle@freebsd.org)
Message-ID: <48B38895.9040000@freebsd.org>
Date: Mon, 25 Aug 2008 21:37:41 -0700
From: Tim Kientzle <kientzle@freebsd.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.12) Gecko/20060422
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: =?UTF-8?B?U3ZhdmFyIEzDunRoZXJzc29u?= <svavar@kjarrval.is>
References: <3cb459ed0808221700w335b0906g6901d8b8bec4dad9@mail.gmail.com>		<200808241415.31812.mitchell@wyatt672earp.force9.co.uk>		<6a7033710808241239p1cbdc7adwd4f87814b428b10b@mail.gmail.com>		<3cb459ed0808241958v552eafejf7841f0f9993928e@mail.gmail.com>		<48B28B8D.9030305@kjarrval.is>		<3cb459ed0808250621s28a1b825u1cc16939951bb157@mail.gmail.com>		<48B336D8.2030300@kjarrval.is>	<3cb459ed0808251656l5716ee51y5bddf34fb8809b0c@mail.gmail.com>
	<48B3544B.4020601@kjarrval.is>
In-Reply-To: <48B3544B.4020601@kjarrval.is>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-current@freebsd.org
Subject: Re: Unicode-based FreeBSD

> Going to UTF-8 might fix some of the character issues
> but we would be in the same shoes when it comes to characters
> which are in -16 and -32 but not in -8.

You need to read the Unicode/ISO10646 standards again;
you do not understand them.

There are no characters in UTF-32 that are not in UTF-8.

UTF-32, UTF-16, and UTF-8 all use exactly the same characters.

UTF-8 encodes Unicode characters from U+000000 to U+10FFFF, using 1 to 4 
bytes per character.

UTF-16 encodes Unicode characters from U+000000 to U+10FFFF, using 2 or 
4 bytes per character (4 bytes, a surrogate pair, for anything above 
U+00FFFF).

UTF-32 encodes Unicode characters from U+000000 to U+10FFFF, using 4 
bytes per character.
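
The byte counts above are easy to check.  The following is a minimal
Python 3 sketch; the sample characters and the helper name
encoded_lengths are chosen here purely for illustration.  It uses the
"-le" codec variants so that Python does not prepend a byte-order
mark, which would otherwise inflate the UTF-16 and UTF-32 counts.

```python
# Sample code points: ASCII, Latin-1, elsewhere in the BMP, and above the BMP.
samples = ["A", "\u00fa", "\u20ac", "\U0001F600"]

def encoded_lengths(ch):
    """Return the byte length of ch in UTF-8, UTF-16, and UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # "-le" avoids a byte-order mark
        len(ch.encode("utf-32-le")),
    )

for ch in samples:
    # Print the code point followed by (utf8, utf16, utf32) byte counts.
    print("U+%06X" % ord(ch), encoded_lengths(ch))
```

UTF-8 grows from 1 to 4 bytes across these samples, UTF-16 is 2 bytes
until the code point passes U+00FFFF and 4 bytes after that, and
UTF-32 is always 4 bytes.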

Practically speaking, UTF-8 is a bit more convenient for file
storage and transmission (including terminal support), while UTF-16
or UTF-32 can be slightly more convenient for internal
string manipulation.  But all three encodings use exactly
the same characters.
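
To see that nothing is lost in any of the three, here is a small
Python 3 sketch that encodes one string in each encoding and decodes
it again.  The sample text and the helper name round_trips are
assumptions made for the example; the string deliberately includes
U+10FFFF, the highest code point.

```python
# Text mixing ASCII, an accented letter, a BMP symbol, a character
# above the BMP, and the last code point, U+10FFFF.
text = "L\u00fathersson \u20ac \U0001F600 \U0010FFFF"

def round_trips(s, codec):
    """Return True if s survives an encode/decode cycle in codec."""
    return s.encode(codec).decode(codec) == s

# Check each of the three encodings against the same string.
results = {codec: round_trips(text, codec)
           for codec in ("utf-8", "utf-16", "utf-32")}
print(results)
```

Every entry comes back True: the same string can be represented in
each of the three, and only the byte layout differs.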

Tim Kientzle