From owner-freebsd-hackers Wed Jun 10 19:33:22 1998
Return-Path:
Received: (from majordom@localhost)
    by hub.freebsd.org (8.8.8/8.8.8) id TAA23841
    for freebsd-hackers-outgoing; Wed, 10 Jun 1998 19:33:22 -0700 (PDT)
    (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from gershwin.tera.com (gershwin.tera.com [207.224.230.28])
    by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id TAA23773
    for ; Wed, 10 Jun 1998 19:33:03 -0700 (PDT)
    (envelope-from kline@tao.thought.org)
Received: from tao.thought.org (tao.tera.com [207.108.223.55])
    by gershwin.tera.com (8.8.8/8.8.8) with ESMTP id TAA03183;
    Wed, 10 Jun 1998 19:31:59 -0700 (PDT)
Received: (from kline@localhost)
    by tao.thought.org (8.8.8/8.7.3) id TAA09494;
    Wed, 10 Jun 1998 19:31:42 -0700 (PDT)
From: Gary Kline
Message-Id: <199806110231.TAA09494@tao.thought.org>
Subject: Re: internationalization
In-Reply-To: <6351.897526003@coconut.itojun.org> from Jun-ichiro itojun Itoh at "Jun 11, 98 09:46:43 am"
To: itojun@itojun.org (Jun-ichiro itojun Itoh)
Date: Wed, 10 Jun 1998 19:31:42 -0700 (PDT)
Cc: tlambert@primenet.com, hackers@FreeBSD.ORG
Organization: <> thought.org: public access uNix in service... <>
X-Mailer: ELM [version 2.4ME+ PL32 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

According to Jun-ichiro itojun Itoh:
>
> Hello,
>
> >> Another part of the problem is that XPG/4 is encoded multibyte, which
> >> is bad from a number of major perspectives, starting with ISO2022.
> > We've got v 2.0 of the xpg4 library in 2.2.6.
> > Do you know if any other flavor of BSD has more
> > complete support?
>
> I've been working on iso-2022 encoding support for runelocale (xpg4)
> library. At this moment I'm working on some specific packages
> (for example, nvi or scheduler software called "sch") but will be
> able to merge the modification into xpg4 library part.

Wonderful!
With the broadly international reach of FreeBSD I was hoping
that someone in China|Japan|Taiwan would be into this.
There may be a broader need for wide character support--say
Sanskrit and Thai. ...

>
> >> I would prefer going to a full-on Unicode implementation to support
> >> all known human languages.
> > This was my first leaning, but I'm increasingly
> > going toward the ISO families.
>
> Yes, iso-2022 families are quite important for supporting
> asian languages. Unicode is, for us Japanese, quite incomplete and
> unexpandable.

Is there a way of explaining (briefly :) how the iso-2022
character set is displayed? This point came up the other
day and I guessed that it was done by a ((large)) table-lookup
under X.

>
> >> I would suggest an initial 16 bit wchar_t with an assumption of a
> >> zero valued code page designator. If ISO ever gets around to adding
> >> other code pages, we can deal with that at that time using page
> >> selection. Meanwhile, we'll be able to interoperate with Microsoft
> >> and JAVA, which use 16 bit wchar_t encodings.
>
> I would like wchar_t to be 32bit, OR MORE. We will see more multiple
> 96x96 character pages at the same time so 16bit is really not enough.
> Modified xpg4 library assumes that wchar_t to be at least 32bit.
> Otherwise I cannot encode iso-2022 variant character sets into it.
>

Hm! In my world, our wchar_t is 32 bits. So your library
would work. Since wchar_t can be redefined, I ought to be
able to build it anywhere.

> > nvi/nex already have been tweaked for 8-bit international
> > support. I learned this accidentally. Was quite
> > surprised to see messages in French and German. :-)
> > Nonetheless, I see why you like the Unicode solution.
> > Someone said, ``Well, French support is great, but how
> > are you going to handle Japanese?''
>
> Do you mean the internationalization of messages displayed by nvi?
> or file content?
> If it is the latter one, please install nvi-m17n
> from /usr/ports/{japanese,korean,chinese}/nvi-* and see how it works.
> (I'm responsible for nvi-m17n...)
>

The messages. And probably the display, too. For the 8-bit
character set languages, they can be coded in standard
8859-1 with \hex and catalogued. If iso-2022 can be similarly
catalogued, then my initial idea is valid---however iso-2022
is displayed.

Thanks for the pointer: I'll ftp your port and see.

> >> I have had FS-based Unicode support working for a very long time,
> >> though it has failed to be committed. One big issue is that directory
> >> entry blocks must grow from 512b to 1k. This has a number of
> >> implications for the soft updates work currently in progress. This is
> >> because, in order to support a maximally sized path component, 512 + 24
> >> bytes is needed for Unicode, as opposed to 256 + 24 (which fits in 512b)
> >> for an 8 bit character set.
> > :-( !
> > How does the ISO2022 model work here? Isn't it the
> > same for Japanese and Chinese?
>
> Yes, for Japanese, Chinese and Korean iso-2022 based model (euc-xx
> falls into the category) is really important. However, I personally
> believe that filenames must be kept in C locale for simplicity...
>
> itojun
>

I'll check out iso-2022 further; if you know of any
English-language docs on this, please send me a pointer.

gary

--
Gary D. Kline   kline@tao.thought.org   Public service uNix
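
The two size arguments quoted in this message can be checked with a little arithmetic. The sketch below is only an illustration: the figures (512b and 1k blocks, 24 bytes of per-entry overhead, a 256-unit maximum path component, 96x96 character pages) come from the thread, while the function and constant names are made up for this example and do not correspond to any FreeBSD source. The page count is a rough upper bound that ignores how a real encoding would spend bits on a character set designator.

```python
# Figures quoted in the thread; all names here are hypothetical.
BLOCK_SIZE = 512        # current directory entry block size, in bytes
ENTRY_OVERHEAD = 24     # per-entry overhead, in bytes
MAX_NAME_UNITS = 256    # maximally sized path component, in code units
PAGE_CHARS = 96 * 96    # characters in one 96x96 page


def entry_bytes(bytes_per_unit):
    """Bytes needed for a maximally sized directory entry."""
    return MAX_NAME_UNITS * bytes_per_unit + ENTRY_OVERHEAD


def fits(bytes_per_unit, block_size=BLOCK_SIZE):
    """True if a maximally sized entry fits in one block."""
    return entry_bytes(bytes_per_unit) <= block_size


def whole_pages(bits):
    """How many complete 96x96 pages a code space of `bits` bits can hold."""
    return (1 << bits) // PAGE_CHARS


if __name__ == "__main__":
    print("8-bit names: ", entry_bytes(1), "bytes, fits in 512:", fits(1))
    print("16-bit names:", entry_bytes(2), "bytes, fits in 512:", fits(2))
    print("16-bit names fit in 1k:", fits(2, 1024))
    print("96x96 pages in 16 bits:", whole_pages(16))
    print("96x96 pages in 32 bits:", whole_pages(32))
```

With 16-bit code units the maximal entry no longer fits in a 512-byte block, which is the reason given for growing the block to 1k, and a 16-bit wchar_t has room for only a handful of complete 96x96 pages, which is the reason given for wanting 32 bits or more.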