Date: Thu, 23 Apr 1998 00:43:24 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: nate@mt.sri.com (Nate Williams)
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Euro key ?
Message-ID: <199804230043.RAA15289@usr02.primenet.com>
In-Reply-To: <199804222054.OAA05690@mt.sri.com> from "Nate Williams" at Apr 22, 98 02:54:08 pm

> > FreeBSD will have to switch to Unicode sooner or later anyway,
> > I think.
>
> I disagree.  We're trying to use Unicode in a commercial application,
> and we've come to the conclusion that Unicode is *NO* better than
> shifted wide-char support.

??????????????

Multibyte encodings instead of raw wchar_t mean:

o	I can't know how many characters will fit in an N character
	field: it's not sizeof(field)/sizeof(wchar_t)

o	I can't do fixed field input because field lengths are no
	longer fixed.

o	I can't do fixed field storage (all my COBOL programs quit
	working, and no one wrote any replacements for me).

o	I can't use sizeof(file)/sizeof(struct) to get a record count

o	I can't know ahead of time whether or not I have enough disk
	space to store the document I created in memory (oops! --
	better "store" it on the printer!).

o	Input buffer overrun is harder to prevent.

o	I have to translate between storage encoding and program
	internal (wchar_t) encoding.  Consider "cat a b | more".

o	I can't "attribute" a file system into a round trip character
	set at mount time: for example, a legacy file system that has
	not-7-bit data on it already... like ISO 8859-X or KOI-8 or
	KOI-8U, or ISO 2022 encoded JIS-208 and JIS-212, which are
	not already multibyte encoded in UTF-8 (or -- gack! -- UTF-7).

o	Because I can't do that, all my CDROM's are now useless unless
	I twiddle my locale in-and-out, in-and-out.

o	I can't NFS mount a legacy system not in 7 bit US ASCII, even
	if I make a "magic" layer that applies only to NFS.

o	I can't round-trip between 8-bit ??? encoding and 16-bit
	Unicode encoding (ISO 10646 code page 0) and 32-bit ISO 10646
	(of which only code page 0 is likely to be defined for the
	next 10 years, since others only exist as a nod to the
	language bigots) automatically, using page multiplication
	(where an on disk 4k page becomes 2 or 4, respectively, in
	core VM pages).

o	I can't use text data in mmap()'ed files without calling
	translation functions.

o	I can't support VFAT32 Unicode names directly.

o	I can't support LDAP and other ASN.1 encoded raw Unicode
	byte streams directly.

o	I can't support NTFS long file names directly.

o	I can't support NetWare client services directly.

o	I can't support CIFS client services directly.

o	I can't know if the new directory entry will take more room
	than the previous directory entry (in the FS directory block).

Basically, the only reason for them is so that 7 bit ASCII users
(read: English speakers) don't have to modify their legacy code
or US ASCII-centric data so that it will keep working.

ASCII bigotry: what a stupid excuse for all those limitations and
all that extra processing overhead.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
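
To make the first point in the list concrete (how many characters fit
in an N character field), here is a minimal sketch in C.  The 16-byte
field size and the names FIELD_BYTES, wide_capacity() and
utf8_chars_that_fit() are made up for illustration; they are not from
any FreeBSD source.  The sketch assumes the multibyte encoding is
well-formed UTF-8 and does no validation.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical fixed-size field, as in a fixed-width record on disk. */
enum { FIELD_BYTES = 16 };

/*
 * Fixed-width storage: the capacity is a compile-time constant.
 * sizeof(wchar_t) is platform dependent (commonly 4, sometimes 2),
 * but on any one platform every character costs the same.
 */
size_t
wide_capacity(void)
{
	return FIELD_BYTES / sizeof(wchar_t);
}

/*
 * Multibyte storage: count how many whole characters of the UTF-8
 * string s fit into limit bytes.  The answer depends on the data,
 * so it can only be found by walking the string.
 * Assumes s is well-formed, NUL-terminated UTF-8.
 */
size_t
utf8_chars_that_fit(const char *s, size_t limit)
{
	size_t used = 0, count = 0;

	while (s[used] != '\0') {
		unsigned char lead = (unsigned char)s[used];
		/* Sequence length is implied by the lead byte. */
		size_t len = lead < 0x80 ? 1 :
		    lead < 0xE0 ? 2 :
		    lead < 0xF0 ? 3 : 4;

		if (used + len > limit)
			break;		/* next character would not fit */
		used += len;
		count++;
	}
	return count;
}
```

With these definitions, the same 16-byte field holds 16 ASCII letters
but only 5 Euro signs (3 bytes each in UTF-8), while wide_capacity()
gives one answer for every string.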
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199804230043.RAA15289>