From owner-freebsd-hackers Mon Oct 16 14:42:54 1995
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.6.12/8.6.6) id OAA23146
          for hackers-outgoing; Mon, 16 Oct 1995 14:42:54 -0700
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
          by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id OAA23139
          for ; Mon, 16 Oct 1995 14:42:49 -0700
Received: (from terry@localhost)
          by phaeton.artisoft.com (8.6.11/8.6.9) id OAA25492;
          Mon, 16 Oct 1995 14:37:18 -0700
From: Terry Lambert
Message-Id: <199510162137.OAA25492@phaeton.artisoft.com>
Subject: Re: A couple problems in FreeBSD 2.1.0-950922-SNAP
To: ache@astral.msk.su (=?KOI8-R?Q?=E1=CE=C4=D2=C5=CA_=FE=C5=D2=CE=CF=D7?=)
Date: Mon, 16 Oct 1995 14:37:18 -0700 (MST)
Cc: terry@lambert.org, bde@zeta.org.au, hackers@freefall.freebsd.org,
    j@uriah.heep.sax.de, kaleb@x.org
In-Reply-To: from "=?KOI8-R?Q?=E1=CE=C4=D2=C5=CA_=FE=C5=D2=CE=CF=D7?=" at Jan 17, 95 00:00:12 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 3330
Sender: owner-hackers@FreeBSD.org
Precedence: bulk

> >> And what? Now too many pgms require proper locale support, even ls,
> >> so we can't avoid this thing. Code added regardles of
> >> ENABLE_STARTUP_LOCALE set or not, so 'hack' means this variable
> >> as I understand and not code added. As I already say,
> >> I can revert default case to pick ctype and use variable
> >> DISABLE_STARTUP_LOCALE to disable it for debugging purposes.
> >
> >aaaaaaaaaaaaaaauuuuuuuuuuuuuuuuuuuuuuuuuuggggggggggggggggggghhhhhhhhhhhhh!
> >
> >Why do we think ls requires this?
>
> It is simple: to display native filenames.

Excuse me.

All you need is the correct matching keyboard/font, an 8 bit clean
code path (which the current limited C locale and automatically
calling setlocale() in crt0.o screws up), and the guarantee that your
character encodings don't stomp on control sequence reserved areas,
like 0x00-0x1f,0x80-0x9f.

Except for the bogus C locale (which I agree is bogus), and the fact
that KOI-8 disrespectfully stomps on control areas with its data, you
already have all that.

To get around the stomping, you'll have to define a locale and make
the programs locale aware. Or get an encoding standard that respects
8859-x and ISO control encoding.

> >Because the default locale is 'C', doesn't mean that the default locale
> >should not be ISO 8 bit clean.
>
> It is already 8bit clean. You can safely call ctype(>127).

Excuse me. The C locale does not return the same values as 8859-1.
It is not ISO 8 bit encoding clean.

> >Also, programs whose output is limited in this fashion should be
> >explicitly calling setlocale(), or they are only half-assed in their
> >attempt to support internationalization.
>
> Correct ctype != half-assed.
> Correct ctype != full i18n
> Correct ctype is what user expects at least.

Read the ISO standardization of the ANSI C standard with respect to
the C locale. The specific wording is "is undefined". You can make it
return whatever you want it to for that. Including i18n.

> Majority of users use various 8bit charsets and >8bit charsets
> isn't commonly used. Why not make life easier for all 8bit charsets
> users, if this not affects at all >8bit users?

Exactly. Define the undefined portions of the C locale to act in an
implementation dependent fashion. That happens to look exactly like
8859-1.

> >In the case that it is explicitly called (ie: programs supposedly using
> >these features), then the hack is unnecessary.
>
> And what? Second call does no-op.

First call should not be made at all in a non-internationalized
program; the default behaviour should be i18n.

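For illustration, a minimal sketch in C of the explicit opt-in argued
for above: the program itself decides whether to call setlocale(),
rather than having startup code do it. The helper names
ctype_locale_is_c() and opt_in_to_user_ctype() are hypothetical and
not part of any library. The sketch assumes a standard-conforming
startup, where ISO C leaves the program in the "C" locale until
setlocale() is called.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: is the LC_CTYPE category still in the "C" locale? */
static int ctype_locale_is_c(void)
{
    /* A NULL locale argument queries the current setting without changing it. */
    const char *name = setlocale(LC_CTYPE, NULL);

    assert(name != NULL);
    return strcmp(name, "C") == 0;
}

/*
 * Hypothetical helper: an internationalized program opts in explicitly.
 * The empty string selects the locale named by the environment
 * (LC_ALL, LC_CTYPE, LANG).  Returns 1 on success, 0 if the requested
 * locale is not available, in which case the locale is left unchanged.
 */
static int opt_in_to_user_ctype(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}
```

A program that never calls opt_in_to_user_ctype() stays in the "C"
locale; one that does call it picks up the user's character
classification only for LC_CTYPE, leaving the other categories alone.
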
> >Likewise, if the program is *not* using theses features, then they
> >should stick their ugly noses into the tent uninvited.
>
> Users prefers to interact in native language with all programs
> which they have. It is hard to explain to user why tcsh reacts
> on LANG settings when ls does not.

Neither one should react, or at least the characters displayed should
not change. The conversion of the high bit set characters into '?'
in ls is broken. When a character in the 0x20-0x7f,0xa0-0xff range is
put out, it should not be translated or otherwise mutilated if you are
in a C or i18n locale. Period.


                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
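
For illustration, a minimal sketch in C of the pass-through rule
stated in the message: bytes in 0x20-0x7f and 0xa0-0xff are copied
unchanged, and only bytes in the reserved control areas 0x00-0x1f and
0x80-0x9f are replaced with '?'. This is not the actual ls code; the
names is_reserved_control() and sanitize_name() are hypothetical, and
the ranges are taken literally from the message (so 0x7f passes
through).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: does the byte fall in a reserved control area? */
static int is_reserved_control(unsigned char c)
{
    /* 0x00-0x1f and 0x80-0x9f, the ranges named in the message. */
    return c <= 0x1f || (c >= 0x80 && c <= 0x9f);
}

/*
 * Hypothetical helper: copy len bytes from src to dst, replacing only
 * reserved control bytes with '?'.  Everything else, including high
 * bit set bytes in 0xa0-0xff, is copied untouched.  dst must have room
 * for len + 1 bytes; the result is NUL terminated.
 */
static void sanitize_name(const unsigned char *src, size_t len,
                          unsigned char *dst)
{
    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < len; i++)
        dst[i] = is_reserved_control(src[i]) ? '?' : src[i];
    dst[len] = '\0';
}
```

The filter never consults the locale, so its output is the same
whatever LANG is set to.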