From owner-freebsd-current  Tue Jun 18 18:29:53 2002
Delivered-To: freebsd-current@freebsd.org
Received: from harrier.mail.pas.earthlink.net (harrier.mail.pas.earthlink.net [207.217.120.12])
	by hub.freebsd.org (Postfix) with ESMTP id B5B5837B403
	for <current@freebsd.org>; Tue, 18 Jun 2002 18:29:45 -0700 (PDT)
Received: from pool0336.cvx22-bradley.dialup.earthlink.net ([209.179.199.81] helo=mindspring.com)
	by harrier.mail.pas.earthlink.net with esmtp (Exim 3.33 #2)
	id 17KUI3-0006sf-00; Tue, 18 Jun 2002 18:29:43 -0700
Message-ID: <3D0FDE3C.681A3207@mindspring.com>
Date: Tue, 18 Jun 2002 18:28:28 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: "Peter S. Housel" <housel@acm.org>
Cc: current@FreeBSD.ORG, Thomas David Rivers <rivers@dignus.com>
Subject: Re: PATCH: wchar_t is already defined in libstd++
References: <200206181119.g5IBJX954922@lakes.dignus.com> <3D0F1D98.31B49358@mindspring.com> <004401c216ec$844088f0$6621010a@housel7352>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-current.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-current>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-current>
X-Loop: FreeBSD.ORG

"Peter S. Housel" wrote:
> > o Complete disdain for ISO-10646 being 32 bits, when 16
> > of them are never anything but 0, and were put there just
> > so that people could grep -v other people's languages out
> > of documents
> >
> > o I'll believe Hieroglyphics and Linear B when I see the
> > fonts and the programs that use them.  Dead languages
> > pretty much justify purpose-built linguistics software
> > anyway.
> 
> If you were a MathML user, or had a Chinese name using an obscure character,
> you would probably feel differently.

Why?  Have the Chinese sent representatives to an international
standards body to get planes other than 0 filled in with
these characters?  Have the MathML users?

Basically, it's not necessary to have bits to represent these
code points until they are part of a standard character set.
The entire point of Unicode was to provide round-trip capability
between character sets.
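
A minimal sketch of what "round-trip" means here, assuming Python and
its stock "koi8_r" codec; the sample string and the helper name
round_trip are made up for illustration and are not from this thread:

```python
# Round-trip sketch: legacy bytes -> Unicode code points -> legacy bytes.
# The sample text and the choice of KOI8-R are illustrative assumptions.

def round_trip(data: bytes, charset: str) -> bytes:
    """Decode legacy bytes to Unicode, then encode back to the same charset."""
    text = data.decode(charset)
    return text.encode(charset)

# Six Cyrillic letters, written as escapes so this file stays 7-bit clean.
sample = "\u041f\u0440\u0438\u0432\u0435\u0442"
legacy = sample.encode("koi8_r")           # one byte per letter in KOI8-R
restored = round_trip(legacy, "koi8_r")
code_points = [ord(ch) for ch in legacy.decode("koi8_r")]

lossless = restored == legacy              # nothing lost on the way through
fits_16_bits = max(code_points) <= 0xFFFF  # every code point is in plane 0

print(lossless, fits_16_bits)
```

This only demonstrates the property for one character set and one
string; it says nothing about characters that are in no legacy set.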

For MathML, you can actually unify the code points with Zapf or
other characters that don't exist simultaneously in any one
character set.  Alternately, you could use a "private use" area.


> > o A desire for raw storage of Unicode, rather than UTF-8 or
> > UTF-7 encoding.  This last one is:
> 
> You still need at least 21 bits to have "raw storage of Unicode".  With
> anything less, either UTF-16 surrogates or UTF-8 multi-byte encodings have
> to be used.  With a 16-bit wchar_t, even if I personally don't have any text
> that uses characters beyond the BMP, I still have to write my code to
> account for surrogates.

Unicode 3.2.0 is not an ISO/IEC standard.  It's a political thing.

You might have an argument for ISO/IEC 10646-2:2001; however "Klingon"
is not a script I'm really worried about.  8-).
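
For reference, the surrogate handling mentioned in the quoted text is
mostly bit arithmetic.  A rough sketch, assuming Python; the function
names to_utf16_units and from_surrogates are hypothetical, and the
constants are the standard UTF-16 surrogate ranges:

```python
# Sketch of UTF-16 surrogate-pair arithmetic for code points beyond the BMP.

def to_utf16_units(cp: int) -> list:
    """Return the 16-bit code units that represent code point cp."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if cp < 0x10000:
        return [cp]                    # BMP: a single 16-bit unit
    v = cp - 0x10000                   # 20 bits remain
    high = 0xD800 | (v >> 10)          # top 10 bits
    low = 0xDC00 | (v & 0x3FF)         # bottom 10 bits
    return [high, low]

def from_surrogates(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

units = to_utf16_units(0x1D11E)        # a plane 1 code point
print([hex(u) for u in units])         # ['0xd834', '0xdd1e']
print((0x10FFFF).bit_length())         # 21
```

Code that only ever sees plane 0 takes the single-unit branch every
time, which is the case being argued about above.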


> > o People might accept doubling data size for the benefit
> > of internationalization.  They aren't going to accept
> > a random multiplier between 1 and 5.
> 
> I suspect UTF-16 doesn't compress very well using standard tools, and it is
> subject to byte-order difficulties.  (That goes double for UTF-32, of
> course.)  wchar_t probably shouldn't be directly used for storage.

Anything larger than a byte has byte order problems; that was one
of the original rationales for UTF-8 encoding.
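
A small sketch of the byte order point, assuming Python's built-in
codecs; the three sample characters are arbitrary choices:

```python
# Same text, three serializations.  The 16-bit forms differ by byte
# order; the UTF-8 form is a plain byte sequence with no order to pick.

text = "A\u00e9\u20ac"                  # U+0041, U+00E9, U+20AC

utf16_be = text.encode("utf-16-be")     # 00 41 00 E9 20 AC
utf16_le = text.encode("utf-16-le")     # 41 00 E9 00 AC 20
utf8 = text.encode("utf-8")             # 41 C3 A9 E2 82 AC

# Bytes needed per character in UTF-8 for this sample.
utf8_lengths = [len(ch.encode("utf-8")) for ch in text]

print(utf16_be.hex(), utf16_le.hex(), utf8.hex())
print(utf8_lengths)                     # [1, 2, 3]
```

The per-character lengths also show where the variable size multiplier
in the quoted discussion comes from.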

-- Terry
