From: Terry Lambert
Message-Id: <199509182034.NAA08492@phaeton.artisoft.com>
Subject: Re: Policy on printf format specifiers?
To: bakul@netcom.com (Bakul Shah)
Date: Mon, 18 Sep 1995 13:34:25 -0700 (MST)
Cc: phk@critter.tfs.com, terry@lambert.org, hackers@freefall.freebsd.org
In-Reply-To: <199509181727.KAA09594@netcom10.netcom.com> from "Bakul Shah" at Sep 18, 95 10:26:58 am

> > As far as I recall there is still some concern about Sanskrit and 10646,
> > isn't there?
>
> Last I looked, Unicode handled Sanskrit and other Indian
> languages fine. [Indian language support is dear to my
> heart so I looked into it back when Unicode-1 was being
> worked on -- AFAIK there have been no changes in this area
> since then]

Sanskrit is supported. So are Tamil, Devanagari, Hebrew, Arabic, etc.

The common factor among these is that they are ligatured languages,
meaning the glyph used for a character differs depending on its
position relative to other glyphs.

For English speakers, the best explanation is "cursive writing":
consider the number of ways you can connect the cursive letter 'e'
to other letters, depending on whether it's at the beginning of a
word, at the end of a word, or in the middle of a word before or
after a character like 'd', 'f', 'r', 'z', 'p', 'n', etc.
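Arabic, one of the scripts listed above, shows this positional behavior compactly. Unicode encodes the abstract letter once (ARABIC LETTER BEH, U+0628), and the renderer picks the glyph. The C sketch below is an illustration under a loud simplifying assumption, not how any particular shaping engine works: it treats every letter in a word as joining on both sides and returns the code point of the matching form from Unicode's Arabic Presentation Forms-B block. The names `position_in_word` and `beh_glyph` are hypothetical.

```c
#include <stddef.h>

/* Where a letter sits within a joined run of letters. */
enum position { ISOLATED, INITIAL, MEDIAL, FINAL };

/* Presentation-form code points for ARABIC LETTER BEH (U+0628),
 * indexed by enum position (Arabic Presentation Forms-B block). */
static const unsigned int beh_forms[] = {
    0xFE8F, /* ISOLATED */
    0xFE91, /* INITIAL  */
    0xFE92, /* MEDIAL   */
    0xFE90  /* FINAL    */
};

/* Position of the letter at index i in a word of n letters.
 * Simplifying assumption: every letter joins on both sides,
 * which real Arabic text does not guarantee. */
enum position position_in_word(size_t i, size_t n)
{
    if (n == 1)
        return ISOLATED;
    if (i == 0)
        return INITIAL;
    if (i == n - 1)
        return FINAL;
    return MEDIAL;
}

/* Hypothetical helper: glyph code point for a BEH at index i of n. */
unsigned int beh_glyph(size_t i, size_t n)
{
    return beh_forms[position_in_word(i, n)];
}
```

For a three-letter word, `beh_glyph(0, 3)`, `beh_glyph(1, 3)` and `beh_glyph(2, 3)` return U+FE91 (initial), U+FE92 (medial) and U+FE90 (final), while a one-letter word gets U+FE8F (isolated). Real shaping must also handle letters such as ALEF that join on one side only, which is exactly the kind of context dependence described above.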
> Presumably Terry wants Unicode support in the kernel so that
> one can print kernel messages in any language.

No, I want it for file names in a Unicode-aware file system.

I also want it for translation layers for remote mounts of
Unicode-unaware file systems (most NFS systems today).

Finally, I want it for path name parsing translation between local
user space applications and underlying file systems, either of which
may or may not be Unicode aware.

> While I agree with his sentiment IMHO we have a long way to go
> before that becomes critical. We need a filesystem that'll
> support Unicode file names,

Got one.

> common applications need support for Unicode input/output etc.

Wrote an xterm, and have a 1M(!) 14-point font. Barely ROMable. 8-)

> Hmm.... Support for reading/writing of Unicode filenames
> may be required in the kernel. How else can you deal with
> code like
>
>     sprintf(name, "%s.core", p->p_comm);
>
> where p_comm points to a Unicode filename?

Precisely. Also:

    #ifdef DIAGNOSTIC
        printf("entering '%S' into cache\n", cnp->cn_cnp->pc_data);
    #endif /* DIAGNOSTIC */

> Bruce writes:
> > I think wchar_t's were made 32 bits so that they are the same as rune_t's.
> > I don't know if this is important.
>
> I too think 16 bit is good enough. 10646 is a 32 bit
> standard but given that other than Unicode no other pages
> are populated and that Unicode supports all living and many
> (most?) dead languages and that except for scholars of dead
> languages (a tiny tiny percentage of people) no one else
> will benefit *even if* pages beyond Unicode are ever used,
> allowing for such extension now is IMHO a waste of space.
> rune_t can be made 16 bit, too.

There's no reason not to leave rune_t at 32 bits, and doing so avoids
throwing out dead language support altogether. I'd like to play around
with Egyptian hieroglyphics and Linear B at some point (neither is
supported by Unicode; most dead languages without modern antecedents
aren't).
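What a 16-bit type gives up can be shown directly. A minimal C sketch, using hypothetical type names `char16_sketch` and `char32_sketch` rather than FreeBSD's real `rune_t`/`wchar_t` definitions: any code point above 0xFFFF, which is where characters outside a 16-bit set would have to live, loses its high bits when stored in 16 bits and silently collides with an unrelated character.

```c
#include <stdint.h>

/* Hypothetical 16- and 32-bit character types, for illustration only. */
typedef uint16_t char16_sketch;
typedef uint32_t char32_sketch;

/* Nonzero if code point cp can be stored in 16 bits without loss. */
int fits_in_16_bits(char32_sketch cp)
{
    return cp <= 0xFFFF;
}

/* Store cp in a 16-bit unit, then widen it back to 32 bits.
 * Conversion to an unsigned 16-bit type keeps only the low 16 bits,
 * so any value above 0xFFFF comes back changed. */
char32_sketch round_trip_16(char32_sketch cp)
{
    char16_sketch narrow = (char16_sketch)cp;
    return (char32_sketch)narrow;
}
```

U+0915 (DEVANAGARI LETTER KA) survives the round trip unchanged, while 0x10041 comes back as 0x0041, the Latin letter 'A'. Keeping the type at 32 bits avoids that aliasing at the price of twice the storage per character.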
> Printf support for wchar_t (and wchar_t *) should really be
> specified by the standards people. If they haven't, maybe
> they should be petitioned.

I agree on that. But I think it is also being taken for granted that
the storage encoding will be distinct from the process encoding. I
think this is a *big* mistake, for reasons pointed out in other posts.

It implies either a content-based byte order translation (which I feel
is an unacceptable performance penalty) or a specified byte order for
the storage encoding, on the premise that the data will go over the
wire. That is what led me to propose network byte order in the first
place.

None of this would prevent switching from 16- to 32-bit wchar_t's at
some future date, should that prove desirable.

Regards,
Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
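Specifying network byte order for the storage encoding means every host writes and reads the same bytes, so no translation has to be inferred from the data. A minimal sketch, assuming 16-bit characters and using hypothetical helper names `encode_be16` and `decode_be16`: the byte order is fixed by shifting rather than by copying host memory, so the code behaves the same on big- and little-endian machines. The POSIX `htons`/`ntohs` functions perform the same conversion for a single 16-bit value.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode n 16-bit characters into out[] in network (big-endian)
 * byte order, high byte first. out must hold 2 * n bytes. */
void encode_be16(const uint16_t *chars, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (unsigned char)(chars[i] >> 8);
        out[2 * i + 1] = (unsigned char)(chars[i] & 0xFF);
    }
}

/* Decode 2 * n big-endian bytes back into host 16-bit characters. */
void decode_be16(const unsigned char *in, size_t n, uint16_t *chars)
{
    for (size_t i = 0; i < n; i++)
        chars[i] = (uint16_t)((in[2 * i] << 8) | in[2 * i + 1]);
}
```

Encoding U+0915 yields the bytes 0x09 0x15 on every host, and decoding them gives back U+0915. Under one reading of "content-based" translation, a reader would instead have to inspect the data (for example, for a byte order mark) before knowing whether to swap, which is the per-access cost objected to above.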