Date:      Thu, 23 Apr 1998 00:43:24 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        nate@mt.sri.com (Nate Williams)
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: Euro key ?
Message-ID:  <199804230043.RAA15289@usr02.primenet.com>
In-Reply-To: <199804222054.OAA05690@mt.sri.com> from "Nate Williams" at Apr 22, 98 02:54:08 pm

> > FreeBSD will have to switch to Unicode sooner or later anyway,
> > I think.
> 
> I disagree.  We're trying to use Unicode in a commercial application,
> and we've come to the conclusion that Unicode is *NO* better than
> shifted wide-char support.

??????????????

Multibyte encodings instead of raw wchar_t mean:

o	I can't know how many characters will fit in an N character
	field: it's not sizeof(field)/sizeof(wchar_t)

o	I can't do fixed field input because field lengths are
	no longer fixed.

o	I can't do fixed field storage (all my COBOL programs quit
	working, and no one wrote any replacements for me).

o	I can't use sizeof(file)/sizeof(struct) to get a record count.

o	I can't know ahead of time whether or not I have enough
	disk space to store the document I created in memory
	(oops! -- better "store" it on the printer!).

o	Input buffer overrun is harder to prevent.

o	I have to translate between storage encoding and program
	internal (wchar_t) encoding.  Consider "cat a b | more".

o	I can't "attribute" a file system into a round trip
	character set at mount time: for example, a legacy
	file system that has not-7-bit data on it already... like
	ISO 8859-X or KOI-8 or KOI-8U, or ISO 2022 encoded JIS-208
	and JIS-212, which are not already multibyte encoded in
	UTF-8 (or -- gack! -- UTF-7).

o	Because I can't do that, all my CDROMs are now useless
	unless I twiddle my locale in-and-out, in-and-out.

o	I can't NFS mount a legacy system not in 7 bit US ASCII,
	even if I make a "magic" layer that applies only to NFS.

o	I can't round-trip between 8-bit ??? encoding and 16-bit
	Unicode encoding (ISO 10646 code page 0) and 32-bit
	ISO 10646 (of which only code page 0 is likely to be defined
	for the next 10 years, since others only exist as a nod to
	the language bigots) automatically, using page multiplication
	(where an on disk 4k page becomes 2 or 4, respectively, in
	core VM pages).

o	I can't use text data in mmap()'ed files without calling
	translation functions.

o	I can't support VFAT32 Unicode names directly.

o	I can't support LDAP and other ASN.1 encoded raw Unicode
	byte streams directly.

o	I can't support NTFS long file names directly.

o	I can't support NetWare client services, directly.

o	I can't support CIFS client services directly.

o	I can't know if the new directory entry will take more
	room than the previous directory entry (in the FS directory
	block).
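
To make the first few points concrete, here is a minimal sketch in C.
The record layout, NAME_CHARS, and the helper names are hypothetical
and exist only for illustration; UTF-8 is assumed as the multibyte
encoding.  With a wchar_t field, capacity and record count fall out of
sizeof arithmetic; with a multibyte string, the byte count tells you
nothing until you walk the data.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

#define NAME_CHARS 16	/* hypothetical field width, in characters */

/* Hypothetical fixed-width record: every name takes the same space. */
struct record {
	wchar_t name[NAME_CHARS];
};

/* Fixed-width field: capacity follows from the size alone. */
static size_t
wide_capacity(size_t field_bytes)
{
	return field_bytes / sizeof(wchar_t);
}

/* Fixed-width storage: record count follows from the file size alone. */
static size_t
record_count(size_t file_bytes)
{
	return file_bytes / sizeof(struct record);
}

/*
 * Multibyte (UTF-8 assumed): the only way to count characters is to
 * scan the bytes, skipping continuation bytes of the form 10xxxxxx.
 */
static size_t
utf8_chars(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}
```

For example, "cafe" and "caf\xC3\xA9" (the same word with an accented
final letter, UTF-8 encoded) both come out of utf8_chars() as four
characters, but strlen() reports four bytes for the first and five for
the second.  The wchar_t version of either occupies exactly four
elements of the name field.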


Basically, the only reason for them is so that 7 bit ASCII users
(read: English speakers) don't have to modify their legacy code
or US ASCII-centric data so that it will keep working.

ASCII bigotry: what a stupid excuse for all those limitations and
all that extra processing overhead.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
