Date:      Tue, 17 Oct 1995 05:05:20 +0300 (MSK)
From:      Андрей Чернов (aka Andrey A. Chernov, Black Mage) <ache@astral.msk.su>
To:        Terry Lambert <terry@lambert.org>
Cc:        hackers@freefall.freebsd.org, joerg_wunsch@uriah.heep.sax.de, kaleb@x.org
Subject:   Re: A couple problems in FreeBSD 2.1.0-950922-SNAP
Message-ID:  <ZlWzmWmyv3@ache.dialup.demos.ru>
In-Reply-To: <199510170115.SAA25982@phaeton.artisoft.com>; from Terry Lambert at Mon, 16 Oct 1995 18:15:15 -0700 (MST)
References:  <199510170115.SAA25982@phaeton.artisoft.com>

In message <199510170115.SAA25982@phaeton.artisoft.com> Terry Lambert
    writes:

>I can *potentially* see ispunct() (though I can't think of any
>concrete examples off my head; maybe in -9?), and the collating
>sequence is a problem.

Not only ispunct(). If you dig deeper, just put all the 8859-*
charsets in front of you and see how much they really differ.

>But this is a problem regardless.  If the code isn't internationalized,
>it isn't internationalized, and anything you do to pretend it is without
>actually fixing the code is a kludge.

An ASCII default table is the way to avoid pitfalls because it restricts
all operations to 7 bits (it disallows all 8-bit stuff). That is one of the
reasons why I vote for ASCII.
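
To make the 7-bit restriction concrete, here is a minimal sketch (the
helper name count_high_alpha is hypothetical, not something from libc)
that counts how many bytes in 0x80..0xFF isalpha() accepts under a given
LC_CTYPE. It assumes a libc whose "C" table is plain ASCII, in which
case the count for "C" is 0; a libc with an 8859-1 default table would
report a nonzero count.

```c
#include <ctype.h>
#include <locale.h>

/*
 * Count the bytes in 0x80..0xFF that isalpha() accepts under the
 * named LC_CTYPE locale.  Returns -1 if the locale cannot be set.
 * Note: this changes the process-wide LC_CTYPE setting.
 */
int count_high_alpha(const char *locale_name)
{
    int c, n = 0;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    for (c = 0x80; c <= 0xFF; c++)
        if (isalpha(c))   /* c is in unsigned char range, so this is valid */
            n++;
    return n;
}
```

Calling count_high_alpha("C") is a quick way to check which default
table a given system actually ships.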

>The correct thing to do is to call setlocale() in the source.  You could,
>if you wanted a "quick fix", use setlocale(,""), per your crt0.o hack.

Calling setlocale() is impossible for 8-bit clean programs if they
are not aware of multi-byte characters. I use a special version of
setlocale (startup_setlocale) in my crt0 hack which is restricted to
character sizes of <= 8 bits only.
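
As an illustration of the idea only (this is not the code from the crt0
hack, and set_ctype_single_byte is a hypothetical name), a restriction
to single-byte charsets could be sketched by checking MB_CUR_MAX after
setting LC_CTYPE and falling back to "C" when the locale turns out to
be multi-byte:

```c
#include <locale.h>
#include <stdlib.h>

/*
 * Try to select LC_CTYPE for the named locale, but keep it only if
 * every character fits in one byte.  Returns 1 if the locale was
 * kept, 0 if we fell back to the "C" locale.
 */
int set_ctype_single_byte(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        setlocale(LC_CTYPE, "C");   /* unknown locale: stay safe */
        return 0;
    }
    if (MB_CUR_MAX > 1) {
        /* Multi-byte charset: unsafe for code that assumes 1 byte = 1 char. */
        setlocale(LC_CTYPE, "C");
        return 0;
    }
    return 1;
}
```

Passing "" as the name would pick the locale up from the environment,
which is what an implicit startup call would presumably do.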

>If you care about collation sequence, then you'll internationalize your
>code.

Well, but what about strftime()? Isn't one supposed to call setlocale()
before it, or what? I saw several places in our sources where strftime() is
called without any setlocale(); do all of them need to be fixed?
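
For reference, the pattern in question looks roughly like the sketch
below: strftime() only produces national day and month names if LC_TIME
has been selected first. The wrapper name format_date is hypothetical,
and the format string is just an example.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/*
 * Format a date using the day and month names of the named locale.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or 0 if the locale cannot be set or the buffer is too small.
 * Note: this changes the process-wide LC_TIME setting.
 */
size_t format_date(const char *locale_name, const struct tm *when,
                   char *buf, size_t size)
{
    if (setlocale(LC_TIME, locale_name) == NULL)
        return 0;
    return strftime(buf, size, "%A, %d %B %Y", when);
}
```

Without the setlocale() call the program stays in the "C" locale, so
%A and %B always come out in English regardless of the user's settings.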

>Then use 8859-5 character encoding.  The only deficiency re: KOI8 is
>that it doesn't match existing data you already have on disk.

8859-5 is not an option in any case. It is not my decision but that of the
whole Russian user community (SUUG - Soviet Unix Users Group).

>> It means that
>> 1) all is*() macros must be correct for russian charset (LC_CTYPE).

>This will work for 8859-5.  Characters that are completely bogus will
>fail, but they'd fail anyway.

Characters that are completely bogus in 8859-1 are valid letters
in 8859-5. :-)
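
A small sketch of why one is*() table cannot serve every 8-bit charset:
the same byte maps to different characters, and even to different
letter cases, depending on the code table. The two sample bytes and
their code points below are taken from the published ISO 8859-1,
ISO 8859-5 and KOI8-R tables; the struct and function names are
hypothetical, and cp_is_upper covers only ASCII, Latin-1 and basic
Cyrillic.

```c
/* What one byte value means in three different 8-bit code tables. */
struct byte_meaning {
    unsigned char byte;
    unsigned int  latin1;     /* Unicode code point under ISO 8859-1 */
    unsigned int  iso8859_5;  /* Unicode code point under ISO 8859-5 */
    unsigned int  koi8r;      /* Unicode code point under KOI8-R */
};

static const struct byte_meaning samples[] = {
    { 0xC1, 0x00C1, 0x0421, 0x0430 },  /* A-acute, Cyrillic ES, Cyrillic a */
    { 0xE1, 0x00E1, 0x0441, 0x0410 },  /* a-acute, Cyrillic es, Cyrillic A */
};

/* Uppercase test limited to ASCII, Latin-1 and basic Cyrillic letters. */
int cp_is_upper(unsigned int cp)
{
    return (cp >= 'A' && cp <= 'Z') ||
           (cp >= 0x00C0 && cp <= 0x00DE && cp != 0x00D7) ||
           (cp >= 0x0410 && cp <= 0x042F);
}
```

Byte 0xC1 is uppercase under both 8859 tables but lowercase under
KOI8-R, and byte 0xE1 is the other way round, so isupper() has to
consult a table that matches the charset actually in use.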

>> 2) strftime must return national data (LC_TIME).

>Explicitly call setlocale().

See above.

>KOI8 is a peculiar locale in that it doesn't follow the 8859-x rules
>like it should.  Like EBCDIC, it needs to die in the long term.  On

And WHY SHOULD IT DO anything? It is an EXISTING CODE TABLE, and LOCALES
must be adapted to it and not vice versa. I promise you that
it will not die in the next 20-40 years; its population grows
with each new Internet user.

>> Maybe this functionality isn't kosher but you even can't imagine how
>> it is useful.

>This whole issue is very similar to the problems that were involved in
>going to an unmapped page 0, causing NULL dereferences to SIGSEGV.  In
>the short term, you lost functionality because you couldn't run some
>programs you used to be able to run.

It isn't a correct example for this case.

>In the locale case, you lose the ability to run 8 bit clean code as if
>it had been properly internationalized, while making other code plain
>miserable to use.

>Without the implied setlocale() call in crt0.o, there is an immediate
>benefit of ~1.1M of disk in static binaries (from Kaleb's numbers), and

It is less than this value; if you want, I'll tell you exactly.

>the code that isn't internationalized becomes readily apparent.  Just
>as the code that dereferenced NULL became readily apparent when page 0
>was unmapped.

*WHO* will internationalize such code?

>Setting an "undefined" equality with 8859-1 preserves 8 bit clean
>operability in the majority of cases, and in the others, the only
>way that they could have been able to get the functionality was to
>have partially internationalized their code (you can't get at the
>altered collation sequence without some knowledge of internationalization
>implicit in the code).
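
The point about collation can be sketched as follows: a program only
sees a locale's collation sequence if it both selects LC_COLLATE and
compares with strcoll() instead of strcmp(). The helper name
collate_in_locale is hypothetical.

```c
#include <locale.h>
#include <string.h>

/*
 * Compare two strings using the collation order of the named locale.
 * Falls back to plain byte comparison if the locale cannot be set.
 * Note: this changes the process-wide LC_COLLATE setting.
 */
int collate_in_locale(const char *locale_name, const char *a, const char *b)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return strcmp(a, b);
    return strcoll(a, b);
}
```

In the "C" locale strcoll() orders strings the same way strcmp() does,
so code that never calls either setlocale() or strcoll() cannot depend
on an altered collation sequence.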


>The net effect is that more code gets internationalized correctly, which
>is in everyone's best interests and increases the code portability instead
>of tying the users to FreeBSD.

Well, as I already said, the only thing that made me stand against
propagating was the non-matching is*() stuff. From your words
I assume that you simply do nothing about it and mark the
incompatibilities as 'improper i18n in any case'. Well,
that is one way :-)

BTW,

I intend to keep my hack as it is, because too many
Russian users already rely on it. I am considering ways to
reduce the bloat along the lines Bruce pointed out, i.e. a libc ctype
cleanup and two different startup_locale stubs, one for the real ctype
and one for the fake.

-- 
Andrey A. Chernov        : And I rest so composedly,  /Now, in my bed,
ache@astral.msk.su       : That any beholder  /Might fancy me dead -
FidoNet: 2:5020/230.3    : Might start at beholding me,  /Thinking me dead.
RELCOM Team,FreeBSD Team :         E.A.Poe         From "For Annie" 1849


