Date: Thu, 11 Jun 1998 09:46:43 +0900
From: Jun-ichiro itojun Itoh <itojun@itojun.org>
To: Gary Kline <kline@tao.thought.org>
Cc: tlambert@primenet.com (Terry Lambert), hackers@FreeBSD.ORG
Subject: Re: internationalization
Message-ID: <6351.897526003@coconut.itojun.org>
In-Reply-To: kline's message of Wed, 10 Jun 1998 17:15:33 MST. <199806110015.RAA09151@tao.thought.org>
Hello,

>> Another part of the problem is that XPG/4 is encoded multibyte, which
>> is bad from a number of major perspectives, starting with ISO2022.

> We've got v 2.0 of the xpg4 library in 2.2.6.
> Do you know if any other flavor of BSD has more
> complete support?

I've been working on iso-2022 encoding support for the runelocale (xpg4)
library. At the moment I'm working on some specific packages (for
example, nvi or the scheduler software called "sch"), but I will be able
to merge the modifications into the xpg4 library itself.

>> I would prefer going to a full-on Unicode implementation to support
>> all known human languages.

> This was my first leaning, but I'm increasingly
> going toward the ISO families.

Yes, the iso-2022 families are quite important for supporting Asian
languages. Unicode is, for us Japanese, quite incomplete and
unexpandable.

>> I would suggest an initial 16 bit wchar_t with an assumption of a
>> zero valued code page designator. If ISO ever gets around to adding
>> other code pages, we can deal with that at that time using page
>> selection. Meanwhile, we'll be able to interoperate with Microsoft
>> and JAVA, which use 16 bit wchar_t encodings.

I would like wchar_t to be 32 bits, OR MORE. We will see multiple 96x96
character pages in use at the same time, so 16 bits is really not
enough. The modified xpg4 library assumes that wchar_t is at least 32
bits; otherwise I cannot encode the iso-2022 variant character sets
into it.

>> The last time I converted csh, this was absolute hell because the
>> code was badly organized for internationalization.
>>
>> The next hardest step is the editors, starting with "vi". They have
>> to be able to support Unicode.

> nvi/nex already have been tweaked for 8-bit international
> support. I learned this accidentally. Was quite
> surprised to see messages in French and German. :-)
> Nonetheless, I see why you like the Unicode solution.
> Someone said, ``Well, French support is great, but how
> are you going to handle Japanese?''

Do you mean the internationalization of messages displayed by nvi, or
of file content? If it is the latter, please install nvi-m17n from
/usr/ports/{japanese,korean,chinese}/nvi-* and see how it works. (I'm
responsible for nvi-m17n...)

>> I have had FS-based Unicode support working for a very long time,
>> though it has failed to be committed. One big issue is that directory
>> entry blocks must grow from 512b to 1k. This has a number of
>> implications for the soft updates work currently in progress. This is
>> because, in order to support a maximally sized path component, 512 + 24
>> bytes is needed for Unicode, as opposed to 256 + 24 (which fits in 512b)
>> for an 8 bit character set.

> :-( !
> How does the ISO2022 model work here? Isn't it the
> same for Japanese and Chinese?

Yes, for Japanese, Chinese and Korean, the iso-2022 based model (euc-xx
falls into this category) is really important. However, I personally
believe that filenames must be kept in the C locale for simplicity...

itojun

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message