From owner-freebsd-current  Tue Jun 18  4:47:31 2002
Delivered-To: freebsd-current@freebsd.org
Received: from falcon.mail.pas.earthlink.net (falcon.mail.pas.earthlink.net [207.217.120.74])
	by hub.freebsd.org (Postfix) with ESMTP id 1E35C37B405
	for <current@freebsd.org>; Tue, 18 Jun 2002 04:47:22 -0700 (PDT)
Received: from pool0040.cvx21-bradley.dialup.earthlink.net ([209.179.192.40] helo=mindspring.com)
	by falcon.mail.pas.earthlink.net with esmtp (Exim 3.33 #2)
	id 17KHS2-0003k7-00; Tue, 18 Jun 2002 04:47:11 -0700
Message-ID: <3D0F1D98.31B49358@mindspring.com>
Date: Tue, 18 Jun 2002 04:46:32 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Thomas David Rivers <rivers@dignus.com>
Cc: mb@imp.ch, current@FreeBSD.ORG, wollman@lcs.mit.edu
Subject: Re: PATCH: wchar_t is already defined in libstd++
References: <200206181119.g5IBJX954922@lakes.dignus.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-current.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-current>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-current>
X-Loop: FreeBSD.ORG

Thomas David Rivers wrote:
> > Personally, I vote for u_int16_t... Unicode 16 bit, vs. ISO-10646
> > code page zero (other code pages aren't defined at all anyway, and
> > it matches Windows, in case you want to use an ELF library from a
> > Windows box, if you can figure out how).
> 
>  I noticed before that you mentioned you didn't want the
>  wchar_t to be int-sized (i.e. 32 bits.)  I was just wondering
>  why.
> 
>  If we "shrink" the size at this point, would that have some
>  impact on existing programs?  (Currently, the typedef
>  for `wchar_t' works down to an `int', if I'm not mistaken.)

My ulterior motives are:

o	Sloppily written code, ported from other platforms

o	Compatibility with Windows (e.g. NTFS, VFAT32FS)

o	Complete disdain for ISO-10646 being 32 bits, when 16
	of them are never anything but 0, and were put there just
	so that people could grep -v other people's languages out
	of documents

o	I'll believe Hieroglyphics and Linear B when I see the
	fonts and the programs that use them.  Dead languages
	pretty much justify purpose-built linguistics software
	anyway.

o	A desire for raw storage of Unicode, rather than UTF-8 or
	UTF-7 encoding.  My reasons for this last one are:

	o	UTF encoding exists mostly so that people using US-ASCII
		don't have to change their data (and to hell with
		the rest of the world).  ASCII-centrism is why we're
		having to invent a new type today.

	o	UTF encoding breaks fixed-field storage, where the
		size of a field has always been a measure of the
		number of characters you can put in it.

	o	UTF encoding breaks the historical (and really nice)
		identity "size_of_file/sizeof(struct) := number_of_records".

	o	Not knowing whether a character will take 1 byte or 5
		bytes means that fixed-length input fields in browsers
		must be limited to one fifth as many characters as
		there are bytes available to store the input result.

	o	People might accept doubling data size for the benefit
		of internationalization.  They aren't going to accept
		a random multiplier between 1 and 5.

	o	Storage encoding and processing encoding should be
		the same thing, and not require conversion (yeah, I
		know, I was there for the comp.std.internat arguments
		with Ohta-san about hating Unicode because it didn't
		use EUC encoding, used Chinese dictionary ordering,
		and wasn't "JIS-208 + extensions"; frankly, I think
		most Japanese don't care, as long as it works, which
		is why Windows hasn't suffered sales losses).

	I really, really hate doing field length conversions in code;
	I rather suspect they will lead to as many bugs as NUL-terminated
	strings, "strcpy()", and "sprintf()" have caused through buffer
	overflows.

More justification than I intended, but I think the GCC default on
most platforms was chosen *intentionally* to be incompatible with
Windows.  The decision should be made on technical merits, rather
than on blind hatred.

-- Terry
