Date: Wed, 10 Jun 1998 21:55:44 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: kline@tao.thought.org (Gary Kline)
Cc: hackers@FreeBSD.ORG
Subject: Re: internationalization
Message-ID: <199806102155.OAA13862@usr01.primenet.com>
In-Reply-To: <199806101930.MAA08334@tao.thought.org> from "Gary Kline" at Jun 10, 98 12:30:40 pm

> I've been into this twice. The first time, briefly, in '96, and
> for the past few weeks. Generating simple, efficient catalogues
> for each of the 200+ utilities is a first task.
>
> There is a related issue of the system errs (currently in sys_errlist[]).
> There are at least two rational ways to turn::
>
> $ ENOENT
> 2 No such file or directory
>
> into its French equiv::
>
> $ ENOENT
> 2 Fichier ou répertoire introuvable
>
> I'm going ahead with my current implementation and look forward to
> hearing from any other hackers who are interested in this.

I'm interested.

Part of the problem here is that FreeBSD doesn't fully support XPG/4.
Another part of the problem is that XPG/4 is encoded multibyte, which
is bad from a number of major perspectives, starting with ISO 2022.

I would prefer going to a full-on Unicode implementation to support
all known human languages.

I would suggest an initial 16-bit wchar_t with an assumption of a
zero-valued code page designator. If ISO ever gets around to adding
other code pages, we can deal with that at that time using page
selection. Meanwhile, we'll be able to interoperate with Microsoft
and Java, which use 16-bit wchar_t encodings.

I think the first (and hardest) step is the shells. The shells need
to be internationalized because they (can) interpret exit codes to
the user as error messages. The last time I converted csh, this was
absolute hell because the code was badly organized for
internationalization.

The next hardest step is the editors, starting with "vi". They have
to be able to support Unicode.

I have had FS-based Unicode support working for a very long time,
though it has failed to be committed. One big issue is that directory
entry blocks must grow from 512b to 1k. This has a number of
implications for the soft updates work currently in progress.
This is because, in order to support a maximally sized path component,
512 + 24 bytes are needed for Unicode, as opposed to 256 + 24 (which
fits in 512b) for an 8-bit character set.

If we were to do something stupid, like UTF-7 or UTF-8, it would have
to grow to 5 * 256 + 24, minimally, to support the 5:1 character
expansion possible, as opposed to the 2:1 of flat Unicode encoding.

For character set attributed FS's (like NFS v2/v3 will have to be),
you can do the translation in the kernel on the blocks on their way
out (a 2:1 expansion in memory of a 1:1 disk image for a given ISO
character set attribution for the filesystem).

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
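
The directory block arithmetic in the message can be checked with a
short C sketch. The 256-character component, the 24 bytes of overhead,
the 512b starting block, and the 1:1, 2:1 and 5:1 expansion ratios are
taken from the text; the function and macro names are hypothetical, and
rounding the block up to a power of two is an assumption of this sketch
(it is consistent with the 512b to 1k growth described, but the message
does not state it).

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the message. */
#define MAX_COMPONENT_CHARS	256	/* maximally sized path component */
#define DIRENT_OVERHEAD		24	/* fixed per-entry overhead, bytes */
#define MIN_DIRBLOCK		512	/* current directory entry block */

/*
 * Bytes needed for one maximally sized entry, given the worst-case
 * number of bytes a single character can occupy on disk.
 */
size_t
dirent_bytes(size_t bytes_per_char)
{
	assert(bytes_per_char > 0);
	return (bytes_per_char * MAX_COMPONENT_CHARS + DIRENT_OVERHEAD);
}

/*
 * Smallest block that holds such an entry.  Doubling from 512 is an
 * assumption of this sketch, not something the message specifies.
 */
size_t
dirblock_size(size_t bytes_per_char)
{
	size_t blk = MIN_DIRBLOCK;

	while (blk < dirent_bytes(bytes_per_char))
		blk *= 2;
	return (blk);
}
```

With these figures, dirent_bytes(1) is 280 and fits in 512, while
dirent_bytes(2) is 536 and pushes dirblock_size(2) to 1024. Under the
same rounding assumption, the 5:1 case (1304 bytes) would not fit in
1k either.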