From owner-freebsd-hackers Mon Oct 16 14:00:51 1995
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.6.12/8.6.6) id OAA21039 for hackers-outgoing; Mon, 16 Oct 1995 14:00:51 -0700
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id OAA21031 for ; Mon, 16 Oct 1995 14:00:44 -0700
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id NAA25380; Mon, 16 Oct 1995 13:55:52 -0700
From: Terry Lambert
Message-Id: <199510162055.NAA25380@phaeton.artisoft.com>
Subject: Re: A couple problems in FreeBSD 2.1.0-950922-SNAP
To: kaleb@x.org (Kaleb S. KEITHLEY)
Date: Mon, 16 Oct 1995 13:55:52 -0700 (MST)
Cc: hackers@freefall.freebsd.org, joerg_wunsch@uriah.heep.sax.de
In-Reply-To: <199510160006.UAA06783@exalt.x.org> from "Kaleb S. KEITHLEY" at Oct 15, 95 08:06:30 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 3961
Sender: owner-hackers@FreeBSD.org
Precedence: bulk

> > So SVR4 would still break on koi8-r, for example.
>
> No, it wouldn't, because SVR4 doesn't have a koi8-r locale. If it has
> anything it probably is based on ISO8859-5, which, if I'm not mistaken,
> uses ASCII on the left side and Cyrillic on the right side; thus a
> multi-byte string like a file name might look different in one locale
> than in another.

Isn't -9 Cyrillic?  I think -5 is Greek?  If I have these reversed,
change -9 to -5 in my previous posts re: Coptic/Cyrillic.

> The only way to *really* solve this is to do something like use widechar
> strings in the file system and declare that all filenames are encoded
> in something like Unicode. Unless I misunderstood him, this is what Terry
> Lambert was lobbying for a couple of weeks ago, when he was asking for
> 16-bit wchar_t. This has all kinds of implications, but let's not go down
> that rathole right now.
:-)  It's not really a rathole.  I had it running in November of 1993.
But yes, that's *exactly* what I was lobbying for, and why.

> > Either make it right, or let it be.
>
> Define right! I don't see it as wrong to populate the right half of the
> default chartype table with values that are useful in some particular
> locale -- in this case "C". No more wrong than leaving them blank. It
> is merely a convenience so that simple programs can do something useful
> for the majority of the users. Is the customer always right? If a
> particular tool isn't very useful in the general case, a customer might
> choose another tool that is, in the general case, more useful.

Actually, I believe the ISO reformalization of the ANSI C standard
defines 'C' as the default locale, and allows all characters not in
0x00-0x1f and 0x80-0x9f to be passed through unaltered.

Personally, I hate XPG3/XPG4 locale support.  If you must do it wrong,
I'd suggest ISO 2022.  My personal preference is the allocated code
pages of ISO 10646 (in other words, 16 bit Unicode).

> > isctype() is not necessarily related to message catalogs.
>
> ??? I didn't say it was. I said that changing programs to set the locale
> was not very interesting (or necessary) unless you were going to make
> them use message catalogs for their output.

I agree.

The use of an isctype table that does not follow the ISO conventions
for 8859-x fonts may be ANSI compliant, but it is *NOT* ISO compliant.
Once compliance is there, only oddball character sets that illegally
use 0x80-0x9f as printing characters, in violation of X3.64 (which is
also formalized by ISO) and ASN.1, will have problems with
non-internationalized code that doesn't call setlocale() properly.  The
right way to correct that is to use an international standard 8859-x
set instead of the "de facto standard" KOI-8.  Or convert the programs.
Don't put crap in crt0.o, or if you *do* put crap there, damn well don't
turn it on and "crappify" everything by default.

> > very undesirable results, e.g. SMTP daemons throwing their error
> > messages in German. :-(
>
> It's hard for me to know how something like smtpd would get its locale
> set to de_DE in order to do that, but I wonder if that wouldn't be what
> I'd want if I were in Germany.

It would be running in the German locale on a German machine and send
back "no such user" errors to you in German.

The correct way to fix this is to encapsulate the error representation,
so that the encapsulated form is translated into the locale-specific
form by the agent for the user receiving the error.

This is *precisely* why the XPG3/XPG4 message catalog formalization
sucks: it does nothing to define a canonical form other than the string
in the source code prior to abstraction, so the ID for a message could
vary from version to version, and each program would have to have its
own catalogs.


                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.