Date: Tue, 18 Jun 2002 04:46:32 -0700
From: Terry Lambert
To: Thomas David Rivers
Cc: mb@imp.ch, current@FreeBSD.ORG, wollman@lcs.mit.edu
Subject: Re: PATCH: wchar_t is already defined in libstd++
Message-ID: <3D0F1D98.31B49358@mindspring.com>
References: <200206181119.g5IBJX954922@lakes.dignus.com>

Thomas David Rivers wrote:
> > Personally, I vote for u_int16_t...  Unicode 16 bit, vs. ISO-10646
> > code page zero (other code pages aren't defined at all anyway, and
> > it matches Windows, in case you want to use an ELF library from a
> > Windows box, if you can figure out how).
>
>  I noticed before that you mentioned you didn't want the
> wchar_t to be int-sized (i.e. 32 bits.)  I was just wondering
> why.
>
>  If we "shrink" the size at this point, would that have some
> impact on existing programs?  (Currently, the typedef
> for `wchar_t' works down to an `int', if I'm not mistaken.)

My ulterior motives are:

o Sloppily written code, ported from other platforms

o Compatibility with Windows (e.g.
  NTFS, VFAT32FS)

o Complete disdain for ISO-10646 being 32 bits, when 16 of them
  are never anything but 0, and were put there just so that
  people could grep -v other people's languages out of documents

o I'll believe Hieroglyphics and Linear B when I see the fonts
  and the programs that use them.  Dead languages pretty much
  justify purpose-built linguistics software anyway.

o A desire for raw storage of Unicode, rather than UTF-8 or
  UTF-7 encoding.

This last one is:

o UTF encoding is mostly so people using US-ASCII don't have to
  change their data (and to hell with the rest of the world).
  ASCII centrism is why we're having to invent a new type today.

o UTF encoding breaks fixed field storage, which has always been
  a measure of the number of characters you can put in a field.

o UTF encoding breaks the historical (and really nice)
  "size_of_file/sizeof(struct) := number_of_records"

o Not knowing if a character will take 1 byte or 5 bytes means
  that your fixed length input fields in browsers have to be
  fixed at 1/5th the number of characters as bytes available
  to store the input result

o People might accept doubling data size for the benefit of
  internationalization.  They aren't going to accept a random
  multiplier between 1 and 5.

o Storage encoding and processing encoding should be the same
  thing, and not require conversion (yeah, I know, I was there
  for the comp.std.internat arguments with Ohta-san about hating
  Unicode because it didn't use EUC encoding, used Chinese
  dictionary ordering, and wasn't "JIS-208 + extensions";
  frankly, I think most Japanese don't care, as long as it
  works, which is why Windows hasn't suffered sales losses).

I really, really hate doing field length conversions in code; I
rather suspect it will lead to as many bugs as NUL terminated
strings and "strcpy()" and "sprintf()" have led to buffer
overflows.

More justification than I intended, but I think the GCC default
on most platforms was chosen to *intentionally* be incompatible
with Windows.
The decision should be made on technical merits, rather than
blind hatred.

-- Terry