Date: Mon, 16 Oct 1995 19:00:48 -0700 (MST)
From: Terry Lambert <terry@lambert.org>
To: ache@astral.msk.su (Андрей Чернов)
Cc: terry@lambert.org, hackers@freefall.freebsd.org, joerg_wunsch@uriah.heep.sax.de, kaleb@x.org
Subject: Re: A couple problems in FreeBSD 2.1.0-950922-SNAP
Message-ID: <199510170200.TAA26131@phaeton.artisoft.com>
In-Reply-To: <WlS9mWmql2@ache.dialup.demos.ru> from "Андрей Чернов" at Oct 17, 95 04:09:48 am

> >For one, my "hack" meets the definition of the ISO ratification of X3J11
> >and at the same time conforms to ISO 8859-x character set rules.
>
> >It works for all ISO8859-x users, not just ISO8859-1.
>
> >The difference is wherein the character code points are set based on
> >columnar location.  This was, in fact, one of the stated design goals
> >of the 8859-x standards.
>
> Well, lets consider D7 char from 8859-1 exactly: is it
> ispunct() too f.e, in 8859-5?
> Lets consider DF char exactly, is it islower() too f.e. in 8859-5?

8859-5 is broken.  You admit this by using KOI-8 instead, yet you want
to use it as an example?

Let me point out that ispunct() is useful only in internationalized
code; make a choice: is the program to which your hack applies
internationalized or not?  If it is, then your hack doesn't apply.
If it isn't, then your ispunct() argument doesn't apply.

The 8859-5 character set violates several design principles inherent
in the 8859-x character set family.  I will not apologize for that;
the people who designed the character set were not my countrymen; they
only ratified what the accepted standards body in charge of that set
in their own country put forth.  I remember the long argument on whether
they had the right to make standards or not, which boiled down to the
decision at the time being "not very representative", to be polite.  I
can dig up the references, since I have been saving everything I've seen
of interest on internationalization on this list and on the net for
forever.

If this is the general consensus, then fine: lobby your standards body
and replace 8859-5.  But don't complain that compromises should be made
because both 8859-5 and KOI-8 violate 8859-x design principles and
because ISO arguably expanded ISO C to include 8859-1.

The point remains that this inconvenience is your incentive to fix
the code and properly internationalize it.

> BTW, why we even forced to be strictly in 8859 bounds? Why another
> charset with lower half equal to ASCII can't live too?

Because of precedent and because of ISO.  If you can give us a competing
standard to choose from, fine.  Right now, we have XPG3/XPG4 and POSIX
vs. ISO C locale definitions.

If you can show a standard that doesn't point at the 'C' locale, then
fine.  If you can show a standard that doesn't conflict with linear
indexing of Unicode or ISO10646 page 0 (both of which specify
0x0000-0x00ff as ISO 8859-1), fine.

> >The one real issue is the collating sequence.  This is a non-issue for
> >"7-bit-ASCII-first" sort orders.  They will be correct.  It *IS* an
> >issue for "non-internationalized code pretending to be internationalized".
> >
> >I have absolutely no sympathy for such code; it should be fixed.
>
> Well, it should be fixed by *WHOM* and *WHEN*? As you don't have sympathy,
> maybe you take this task as contacting to authors, fixing, etc.
> for each such program? Some of such programs needed right now,
> and I can't say to my users that they 'should be fixed', it means
> say nothing.

You're welcome to run with a non-standard extension to crt0.o, or better,
as a C library virtual base class initializer using CTOR/DTOR magic, to
put it in your own C library and not everywhere (you could even default
the locale to KOI-8 in that case).

It should be fixed by the people who are annoyed by it not being fixed.

That means the end users, the people the end users complain to, and
then (eventually) the authors of the code who get complained to.

Just like Sun Microsystems.  You annoy them until they fix it.

> >If you need to make code that isn't internationalized and you want a hack,
> >call the setlocale(,"") in main() if the desired program.
>
> It will be broken for locales wich char width > 8bits.
> Proper thing is to call non-standard startup_setlocale() which
> check char size not exceeds 8bit.

Or to specify XPG/3 instead of XPG/4.
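
For concreteness, the startup check proposed in the quoted text might be
sketched roughly as follows.  The name startup_setlocale() comes from
the quoted message, but it is not a standard function and the body here
is only an assumption about what such a helper could look like; it is
not code from either poster or from FreeBSD.  It relies only on the
ISO C setlocale() call and the MB_CUR_MAX macro.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/*
 * Hypothetical helper: adopt the environment's LC_CTYPE only when it is
 * a single-byte locale.  Returns 1 if the environment's locale was
 * adopted, 0 if the program was left in the "C" locale.
 */
int
startup_setlocale(void)
{
	/* "" asks for the locale named by LC_ALL, LC_CTYPE or LANG. */
	if (setlocale(LC_CTYPE, "") != NULL && MB_CUR_MAX == 1)
		return 1;

	/* Multibyte or unknown locale: fall back to the default. */
	setlocale(LC_CTYPE, "C");

	/* Assumption of this sketch: the "C" locale is single-byte. */
	assert(MB_CUR_MAX == 1);
	return 0;
}
```

The intended use would be a call as the first statement of main(),
before any of the is*() or to*() routines are used.  Only LC_CTYPE is
touched, so collation and number formatting stay as in the 'C' locale.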

XPG/4 marked the introduction of the heinously bogus runic encoding
methods and thus wide character process encoding.  If you specify
XPG/3, then you will be fine.

If you are worried about CJK and other "large glyph set" character
sets (ie: sets that won't fit in 0x00-0xff), they have ISO-2022 locales
and aren't very interested in XPG/4 and/or Unicode/ISO10646 anyway,
because of the inability to build multinationalized applications for
multilingual processing in the unified character sets.

That doesn't mean that Win95 and WinNT won't cause Unicode to take over
the world whether anyone likes it or not.  It will take over the world.
In the end, the end users, not the programmers, make the decisions.  All
the end user cares about is that it works, not about the amount of
effort programmers have to expend to make it work.

Arguing with large glyph set internationalization using XPG/4 mechanisms
as an example to the contrary is non-productive.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.