FreeBSD Mail Archives

Date:      Mon, 16 Oct 1995 19:00:48 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        ache@astral.msk.su (Андрей Чернов)
Cc:        terry@lambert.org, hackers@freefall.freebsd.org, joerg_wunsch@uriah.heep.sax.de, kaleb@x.org
Subject:   Re: A couple problems in FreeBSD 2.1.0-950922-SNAP
Message-ID:  <199510170200.TAA26131@phaeton.artisoft.com>
In-Reply-To: <WlS9mWmql2@ache.dialup.demos.ru> from "Андрей Чернов" at Oct 17, 95 04:09:48 am


> >For one, my "hack" meets the definition of the ISO ratification of X3J11
> >and at the same time conforms to ISO 8859-x character set rules.
> 
> >It works for all ISO8859-x users, not just ISO8859-1.
> 
> >The difference is wherein the character code points are set based on
> >columnar location.  This was, in fact, one of the stated design goals
> >of the 8859-x standards.
> 
> Well, lets consider D7 char from 8859-1 exactly: is it
> ispunct() too f.e, in 8859-5?
> Lets consider DF char exactly, is it islower() too f.e. in 8859-5?

8859-5 is broken.  You admit this by using KOI-8 instead, yet you want
to use it as an example?

Let me point out that ispunct() is useful only in internationalized code;
make a choice: is the program to which your hack applies internationalized
or not?  If it is, then your hack doesn't apply.  If it isn't, then your
ispunct() argument doesn't apply.

The 8859-5 character set violates several design principles inherent in
the 8859-x character set family.  I will not apologize for that; the
people who designed the character set were not my countrymen; they only
ratified what the accepted standards body in charge of that set in their
own country put forth.

I remember the long argument over whether they had the right to make
standards, which boiled down to the decision at the time being "not
very representative", to be polite.  I can dig up the references, since
I have been saving everything of interest I've seen on internationalization
on this list and on the net for forever.

If this is the general consensus, then fine: lobby your standards body
and replace 8859-5.

But don't complain that compromises should be made because both 8859-5
and KOI-8 violate 8859-x design principles and because ISO arguably
expanded ISO C to include 8859-1.

The point remains that this inconvenience is your incentive to fix the
code and properly internationalize it.

> BTW, why we even forced to be strictly in 8859 bounds? Why another
> charset with lower half equal to ASCII can't live too?

Because of precedent and because of ISO.  If you can give us a competing
standard to choose from, fine.  Right now, we have XPG3/XPG4 and POSIX
vs. ISO C locale definitions.

If you can show a standard that doesn't point at the 'C' locale, then
fine.  If you can show a standard that doesn't conflict with linear
indexing of Unicode or ISO10646 page 0 (both of which specify 0x0000-0x00ff
as ISO 8859-1), fine.
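
A minimal sketch of what that linear indexing buys, assuming 16-bit
code points are enough for the illustration.  The function names
latin1_to_ucs() and iso8859_5_to_ucs() are hypothetical, and the
8859-5 offsets are taken from the published ISO 8859-5 to Unicode
mapping rather than from anything in this thread.

```c
#include <stdint.h>

/* ISO 8859-1 occupies 0x0000-0x00ff of Unicode / ISO 10646, so
 * widening the byte is the whole conversion. */
uint16_t latin1_to_ucs(unsigned char b)
{
    return (uint16_t)b;
}

/* ISO 8859-5 shares only 0x00-0xa0 and 0xad with that range; the
 * Cyrillic letters need an offset and two code points are special. */
uint16_t iso8859_5_to_ucs(unsigned char b)
{
    if (b <= 0xa0 || b == 0xad)
        return (uint16_t)b;
    if (b == 0xf0)
        return 0x2116;              /* NUMERO SIGN */
    if (b == 0xfd)
        return 0x00a7;              /* SECTION SIGN */
    return (uint16_t)(b + 0x0360);  /* 0xa1 -> U+0401 ... 0xff -> U+045f */
}
```

Under 8859-1 the byte 0xd7 stays U+00D7 (MULTIPLICATION SIGN); under
8859-5 the same byte lands on U+0437, a lowercase Cyrillic letter, so
only the first mapping is a plain index into page 0.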

> >The one real issue is the collating sequence.  This is a non-issue for
> >"7-bit-ASCII-first" sort orders.  They will be correct.  It *IS* an
> >issue for "non-internationalized code pretending to be internationalized".
> >
> >I have absolutely no sympathy for such code; it should be fixed.
> 
> Well, it should be fixed by *WHOM* and *WHEN*? As you don't have sympathy,
> maybe you take this task as contacting to authors, fixing, etc.
> for each such program? Some of such programs needed right now,
> and I can't say to my users that they 'should be fixed', it means
> say nothing.

You're welcome to run with a non-standard extension to crt0.o, or better,
as a C library virtual base class initializer using CTOR/DTOR magic, to
put it in your own C library and not everywhere (you could even default
the locale to KOI-* in that case).
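
A rough sketch of the CTOR approach, assuming a GCC or Clang toolchain
where __attribute__((constructor)) runs a function before main().  The
names site_locale_init(), site_locale_init_ran() and
SITE_DEFAULT_LOCALE are hypothetical, and the locale name
"ru_RU.KOI8-R" is an assumption that may not exist on a given system.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical site default; this locale name is an assumption. */
#define SITE_DEFAULT_LOCALE "ru_RU.KOI8-R"

static int site_locale_ran;

/* Returns 1 if the environment names a locale, 0 otherwise. */
static int env_names_locale(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *v = getenv(vars[i]);
        if (v != NULL && *v != '\0')
            return 1;
    }
    return 0;
}

/* Runs before main() via the GCC/Clang constructor attribute.
 * If setlocale() fails, the process simply stays in the "C" locale. */
__attribute__((constructor))
static void site_locale_init(void)
{
    if (env_names_locale())
        (void)setlocale(LC_ALL, "");
    else
        (void)setlocale(LC_ALL, SITE_DEFAULT_LOCALE);
    site_locale_ran = 1;
}

/* Lets a program confirm that the startup hook executed. */
int site_locale_init_ran(void)
{
    return site_locale_ran;
}
```

Linking this object into a private copy of the C library, rather than
the system one, keeps the non-standard behaviour confined to the
programs that ask for it.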

It should be fixed by the people who are annoyed by it not being fixed.

That means the end users, the people the end users complain to, and then
(eventually) the authors of the code who get complained to.

Just like Sun Microsystems.  You annoy them until they fix it.

> >If you need to make code that isn't internationalized and you want a hack,
> >call the setlocale(,"") in main() if the desired program.
> 
> It will be broken for locales wich char width > 8bits.
> Proper thing is to call non-standard startup_setlocale() which
> check char size not exceeds 8bit.

Or to specify XPG/3 instead of XPG/4.  XPG/4 marked the introduction of
the heinously bogus runic encoding methods and thus wide character process
encoding.  If you specify XPG/3, then you will be fine.
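
For reference, a minimal sketch of the check proposed in the quoted
text.  startup_setlocale() is the name used there; it is not an
existing library function, and the return values are invented for the
illustration.  Treating MB_CUR_MAX > 1 as "characters wider than 8
bits" is also an assumption of this sketch.

```c
#include <locale.h>
#include <stdlib.h>

/* Adopt the environment's locale only if it is a single-byte one.
 * Returns  1 if the environment locale was adopted,
 *          0 if it was multibyte and "C" was restored,
 *         -1 if setlocale() failed and the locale was left unchanged. */
int startup_setlocale(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return -1;

    if (MB_CUR_MAX > 1) {
        /* Multibyte encoding: 8-bit-only code cannot cope, fall back. */
        (void)setlocale(LC_ALL, "C");
        return 0;
    }
    return 1;
}
```

A non-internationalized program would call startup_setlocale() first
thing in main() in place of the bare setlocale() call.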

If you are worried about CJK and other "large glyph set" character sets
(ie: won't fit in 0x00-0xff), they have ISO-2022 locales and aren't
very interested in XPG/4 and/or Unicode/ISO10646 anyway because of the
inability to build multinationalized applications for multilingual
processing in the unified character sets.  That doesn't mean that Win95
and WinNT won't cause Unicode to take over the world whether anyone
likes it or not.  It will take over the world.  In the end, the end
user, not the programmer, makes the decisions.  All the end user cares
about is that it works, not about the amount of effort programmers
have to expend to make it work.  Arguing to the contrary, with large
glyph set internationalization using XPG/4 mechanisms as the example,
is non-productive.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199510170200.TAA26131>
