From owner-freebsd-hackers  Mon Oct 16 14:00:51 1995
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.6.12/8.6.6) id OAA21039
          for hackers-outgoing; Mon, 16 Oct 1995 14:00:51 -0700
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
          by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id OAA21031
          for <hackers@freefall.freebsd.org>; Mon, 16 Oct 1995 14:00:44 -0700
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id NAA25380; Mon, 16 Oct 1995 13:55:52 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199510162055.NAA25380@phaeton.artisoft.com>
Subject: Re: A couple problems in FreeBSD 2.1.0-950922-SNAP
To: kaleb@x.org (Kaleb S. KEITHLEY)
Date: Mon, 16 Oct 1995 13:55:52 -0700 (MST)
Cc: hackers@freefall.freebsd.org, joerg_wunsch@uriah.heep.sax.de
In-Reply-To: <199510160006.UAA06783@exalt.x.org> from "Kaleb S. KEITHLEY" at Oct 15, 95 08:06:30 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 3961      
Sender: owner-hackers@FreeBSD.org
Precedence: bulk

> > So SVR4 would still break on koi8-r, for example.  
> 
> No it wouldn't because SVR4 doesn't have a koi8-r locale. If it has
> anything it probably is based on ISO8859-5, which, if I'm not mistaken,
> uses ASCII on the left side and Cyrillic on the right side; thus a multi-
> byte string like a file name might look different in one locale than in
> another. 

Isn't -9 Cyrillic?  I think -5 is Greek?

If I have these reversed, change -9 to -5 in my previous posts re:
Coptic/Cyrillic.
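
A quick way to check the part numbers, assuming the codec tables bundled
with Python are a fair stand-in for the standards themselves: decode one
high-half byte under each candidate set and look at the script named in
the Unicode character name.  By that check 8859-5 is the Cyrillic part,
8859-7 is Greek, and 8859-9 is a Latin (Turkish) set.  The helper name
script_of and the probe byte 0xD0 are arbitrary choices for illustration.

```python
import unicodedata

def script_of(byte_value, codec):
    """Return the first word of the Unicode name of one decoded byte."""
    ch = bytes([byte_value]).decode(codec)
    return unicodedata.name(ch).split()[0]

PROBE = 0xD0  # an arbitrary byte from the right-hand (high) half

# Same byte, four character sets: the script differs from set to set.
scripts = {
    codec: script_of(PROBE, codec)
    for codec in ("iso8859_5", "iso8859_7", "iso8859_9", "koi8_r")
}
```

The same probe also shows the file name problem quoted above: 0xD0 is a
Cyrillic letter in both 8859-5 and KOI8-R, but not the same letter.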

> The only way to *really* solve this is to do something like use widechar 
> strings in the file system and declare that all filenames are encoded
> in something like Unicode. Unless I misunderstood him, this is what Terry 
> Lambert was lobbying for a couple of weeks ago, when he was asking for
> 16-bit wchar_t. This has all kinds of implications, but let's not go down
> that rathole right now. :-)

It's not really a rathole.  I had it running in November of 1993.  But
yes, that's *exactly* what and why I was lobbying.

> > Either make it right, or let it be.
> 
> Define right! I don't see it as wrong to populate the right half of the
> default chartype table with values that are useful in some particular
> locale -- in this case "C". No more wrong than leaving them blank. It 
> is merely a convenience so that simple programs can do something useful
> for the majority of the users. Is the customer always right? If a
> particular tool isn't very useful in the general case, a customer might
> choose another tool that is, in the general case, more useful.

Actually, I believe the ISO reformalization of the ANSI C standard defines
'C' as the default locale, and allows all characters not in 0x00-0x1f and
0x80-0x9f to be passed through unaltered.

Personally, I hate XPG3/XPG4 locale support.  If you must do it wrong,
I'd suggest ISO2022.  My personal preference is the allocated code
pages of ISO10646 (in other words, 16 bit Unicode).

> > isctype() is not necessarily related to message catalogs.  
> 
> ??? I didn't say it was. I said that changing programs to set the locale
> was not very interesting (or necessary) unless you were going to make
> them use message catalogs for their output.

I agree.  The use of an isctype table that does not follow the ISO
conventions for 8859-x fonts may be ANSI compliant, but it is *NOT* ISO
compliant.

And once compliance is there, it's only odd-ball character sets which
illegally use 0x80-0x9f as printed characters in violation of ANSI X3.64
(which is also formalized by ISO) and ASN.1 that will have problems with
non-internationalized code that doesn't call setlocale() properly.

And the right way to correct that is to use an international standard
8859-x set instead of the "de facto standard" KOI-8.

Or convert the programs.
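
To make the 0x80-0x9f point concrete, here is a minimal sketch that
counts how many bytes in that range a character set maps to something
other than a control character.  It assumes Python's bundled koi8_r and
iso8859_5 codecs reflect the respective character sets; the function
name printed_in_c1 is made up for this example.

```python
import unicodedata

C1_RANGE = range(0x80, 0xA0)  # bytes 0x80 through 0x9f inclusive

def printed_in_c1(codec):
    """Count bytes in 0x80-0x9f that decode to a non-control character."""
    count = 0
    for b in C1_RANGE:
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte is unassigned in this codec
        if unicodedata.category(ch) != "Cc":  # "Cc" = control character
            count += 1
    return count

koi8_count = printed_in_c1("koi8_r")
iso_count = printed_in_c1("iso8859_5")
```

KOI8-R fills the range with box-drawing and symbol glyphs, while the
8859-5 codec leaves every one of those bytes as a control code.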

Don't put crap in crt0.o, or if you *do* put crap there, damn well don't
turn it on and "crappify" everything by default.

> > very undesirable results, e.g. SMTP daemons throwing their error
> > messages in German. :-(
> 
> It's hard for me to know how something like smtpd would get its locale
> set to de_DE in order to do that, but I wonder if that wouldn't be what
> I'd want if I were in Germany.

It would be running in the German locale on a German machine and would send
"no such user" errors back to you in German.

The correct way to fix this is to encapsulate error representation so
that the encapsulated form is translated into the locale specific form
by the agent for the user receiving the error.
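
A minimal sketch of that encapsulation, under stated assumptions: the
sender ships a locale-neutral error code plus parameters, and the
receiving agent picks the wording.  The JSON wire form, the code
E_NO_SUCH_USER, the catalog contents, and both function names are
hypothetical, invented for illustration; no existing mail protocol is
being described.

```python
import json

# Per-locale wording lives with the receiving agent, not the sender.
CATALOGS = {
    "en": {"E_NO_SUCH_USER": "no such user: {user}"},
    "de": {"E_NO_SUCH_USER": "Benutzer unbekannt: {user}"},
}

def encode_error(code, **params):
    """Sender side: emit a locale-neutral, encapsulated error."""
    return json.dumps({"code": code, "params": params}, sort_keys=True)

def render_error(wire, locale):
    """Receiver side: translate the encapsulated error for the local user."""
    err = json.loads(wire)
    catalog = CATALOGS.get(locale, CATALOGS["en"])
    # Fall back to the bare code if this catalog has no wording for it.
    template = catalog.get(err["code"], err["code"])
    return template.format(**err["params"])

wire = encode_error("E_NO_SUCH_USER", user="kaleb")
english = render_error(wire, "en")
german = render_error(wire, "de")
```

The German daemon and the English reader never have to agree on a
language, only on the code.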

This is *precisely* why XPG3/XPG4 message catalog formalization sucks
out, since it does nothing to define a canonical form other than that
of the string in the source code prior to abstraction, and so the ID
for the message could vary from version to version, and each program
would have to have its own catalogs.
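
The version-to-version drift can be sketched as follows.  The numbering
scheme here (IDs handed out in order of appearance in the source) is an
assumption made for the example, not a description of any particular
catalog tool, and the message strings are invented.

```python
def number_messages(source_strings):
    """Assign catalog IDs 1..n in order of appearance (hypothetical scheme)."""
    return {text: i for i, text in enumerate(source_strings, start=1)}

v1 = ["no such user", "mailbox full"]
v2 = ["no such user", "relay denied", "mailbox full"]  # one message inserted

ids_v1 = number_messages(v1)
ids_v2 = number_messages(v2)

# A German catalog built against version 1 of the program.
catalog_de_v1 = {1: "Benutzer unbekannt", 2: "Postfach voll"}

def lookup(catalog, msg_id, default):
    """Return the catalog text for msg_id, or the source string if absent."""
    return catalog.get(msg_id, default)

# Version 2 running against the stale catalog gets the wrong message.
stale = lookup(catalog_de_v1, ids_v2["relay denied"], "relay denied")
```

With nothing canonical behind the number, the old catalog silently
answers "relay denied" with the text for "mailbox full".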


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.