Date: Fri, 12 Jun 1998 16:15:09 +0200 (MET DST)
From: Nick Hibma <nick.hibma@jrc.it>
To: Gary Kline <kline@tao.thought.org>
Cc: FreeBSD hackers mailing list <hackers@FreeBSD.ORG>
Subject: Re: internationalization
Message-ID: <Pine.GSO.3.95q.980612160028.19080W-100000@elect8>
In-Reply-To: <199806120451.VAA13595@tao.thought.org>

On Thu, 11 Jun 1998, Gary Kline wrote:

Could someone translate this into a language called English? Some people
on this list are non-native speakers of English, like myself, and have a
'syntax error' or 'decoding error' on the fragment below.

My compliments for your grasp of the English language, and my compliments
as well to the makers of the dictionary at

	http://www.leo.org/cgi-bin/dict-search
	http://dict.leo.org/dict/dictionaries.html

If people could restrict themselves to general English, with no slang or
uncommon words and no n-dimensional (with n big) sentences, that would be
appreciated.

Nick

P.S.:

English		German			Dutch
bigotry		die Engstirnigkeit	kortzichtigheid
ilk		die Sorte		vorm
premise		die Voraussetzung	de veronderstelling
grasp		die Auffassungsgabe	het begrip

> According to Terry Lambert:
> > > Let me pose the same question, a bit more broadly.
> > > Why cannot we support _both_ the ISO and Unicode
> > > paradigms? Are these absolutely incompatible systems?
> > > Is there some kind of ``religious-war''? Or is it
> > > simply too difficult?
> >
> > ISO 10646 code page 0 *is* Unicode, by definition.
> >
> > The religious aspects have to do with the old trade-offs the various
> > programmers are already used to, the new trade-offs the various
> > programmers would have to start putting up with, and the various
> > language bigotries people bring to the table.
> >
>
> I'm approaching this with relatively little bigotry or
> other baggage; my bias is against bias itself. That said,
> I've been around enough decades to realize that virtually
> everyone carries latent bigotries of some ilk. I'd just
> rather stay above as much of it as possible here.
>
> So far this discussion looks promising; and thanks here
> noted to everyone.
>
> >
> >
> > Major premise: everyone is going to have to put up with a non-8-bit
> > wchar_t internally in their applications. This is called the "raw"
> > or "process" representation.
> >
> This, I not only believe, but agree with. Memory is cheap;
> disk is cheap; so having character set be a wchar_t (either
> 16 or 32 bits) is no major obstacle.
>
> >
> > ----------------------------------------------------------------------
> > ----------------------------------------------------------------------
> > Are most of your files in ASCII?
> > ----------------------------------------------------------------------
> >
> > Then you want UTF7/UTF8/ISO2022 encoding, so you don't have to change
> > them. Unless you plan to export your software. Let the non-English
> > speaking world deal with the incompatibilities and storage bloat
> > problems. You'll deal with it in your software when Japan and Europe
> > "get their act together" and standardize on IBM-PC derived hardware
> > so that your software won't have to be ported to run.
> >
> > Besides, C code is in the "C" locale, and that's US-ASCII already.
> > GCC supports trigraphs, right?
> >
> > ----------------------------------------------------------------------
> > Are most of your files in ISO8859-X and/or KOI-8X?
> > ----------------------------------------------------------------------
> >
> > Then you don't want UTF7/UTF8, because if you get them, some
> > characters that currently take up one byte will take up between one
> > and three bytes (one if they are US ASCII, more if they are in the
> > 0x80-0xff range).
> >
> > You also don't want ISO2022, because instead of simply choosing a
> > locale for all your data, you will have to deal with character set
> > shift processing.
> >
> > You could put up with UTF2, because you could do kernel magic to
> > expand existing text files on existing filesystems by setting a
> > per FS attribute that tells how to get the data in and out of
> > Unicode representation. You still need a "magic doohickey" that
> > tells the filesystem to do this for text files, but not for other
> > files.
> >
> > ----------------------------------------------------------------------
> > Are most of your files in ISO2022-jp (JIS-208/JIS-212)?
> > ----------------------------------------------------------------------
> >
> > Then you don't want UTF7/UTF8/UTF2 encoding, because you don't
> > want to have to convert your data. You don't want Unicode because
> > it means you'll have to deal with the sorting problem all over
> > again because Unicode's collation sequence isn't the JIS-208/JIS-212
> > collation sequence.
> >
>
> I understand your point, Terry. Over the coming days,
> weeks, I'll experiment with 16- and 32-bit wide chars,
> and see how Ito-san's nvi's port works. If his iso-2022
> messages are catalogs, that's most of the battle.
>
> >
> > ----------------------------------------------------------------------
> > ----------------------------------------------------------------------
> >
> > What this boils down to is language bigotry, and whose language
> > you prefer. Generally, the preference is either driven by personal
> > or economic interests (like competitive advantage to your own locale
> > from having your locale's preferred method chosen).
> >
> > The short-sighted approach is to make the decision based on your own
> > personal bigotry.
> >
> > The longer-sighted approach is to make the decision which has the
> > best workarounds for backward compatibility and in-place conversion,
> > and the least impact in the future based on the assumption that the
> > software market is going to normalize all over the world at some point
> > in the future, and you just may be around still and have to deal with
> > it. Like the Y2K problem.
> >
>
> By the time the market normalizes we're likely to be dust.
> Eventually tho, sure.
>
> >
> >
> >
> > If the aliens land, and we end up needing more than 2^16 characters
> > in our wchar_t space, well, we can deal with that problem when it
> > happens.
> >
>
> I think we already need 32-bit wchar_t's now.
> For the sake of completeness. ... To be continued.
>
> gary
>
> --
> Gary D. Kline		kline@tao.thought.org		Public service uNix
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message
>

STA-ISIS, T.P.270, Joint Research Centre, Italy
building: 27A
tel.: +39 332 78 9549
fax.: +39 332 78 9185

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message