From owner-freebsd-hackers Mon Oct 16 18:25:49 1995
From: Terry Lambert
Message-Id: <199510170120.SAA26017@phaeton.artisoft.com>
Subject: Re: A couple problems in FreeBSD 2.1.0-950922-SNAP
To: ache@astral.msk.su (Андрей Чернов)
Date: Mon, 16 Oct 1995 18:20:35 -0700 (MST)
Cc: terry@lambert.org, hackers@freefall.freebsd.org, kaleb@x.org

> >This is valid for all 8859-x display/input systems, since the reuse of
> >the code points are not transformed by this (8859-x does not encode
> >characters in those locations).
>
> You consider one very simple case (isprint/iscontrol only) and think
> that it is a proof. What you can say about ispunct() f.e.?
> It is clearly differ into 8859-1 and 8859-5 f.e., islower/isupper differs
> too. tolower/toupper differs too. Even isalpha differs.

What did I say before about lobbying international standards bodies to
replace 8859-5?

I don't know if I buy the [is,to][upper,lower] distinctions. I think they
are mainly for undefined code points, and getting the wrong result in
an undefined area is not a problem.

> >The only potentially incorrect behaviour is on blanks not being interpreted
> >as blanks. If you want a blank, you shouldn't be using some wild code
> >point other than 0x20 anyway. You get what you deserve.
>
> Well, isspace differs too.

Space isn't 0x20 in 8859-5? Tab, LF, CR aren't the same?

> >The problems you will encounter in this circumstance are all *very*
> >specific to cases where a single file system is being used by multiple
> >nationalities of clients.
>
> No it is different problem. By setting LANG for something != 8859-1
> (for programs that understands it) I assume that programs which
> not understands it still works right.
> If they are strict ASCII, I automatically protected from any
> unwanted effects. If they are 8859-1 I need to classify
> various unwanted effects for each != 8859-1 charset as
> 'default undefined behaviour'.

I agree. And this is precisely the problem with the crt0.o/setlocale()
hack. You are implicitly removing the protection from unwanted effects.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
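
The isspace()/isalpha() disagreement above can be checked empirically rather than argued. What follows is only a rough sketch, not anything from the crt0.o change under discussion: count_class() and count_high_alpha() are hypothetical helper names, and the only library calls assumed are the standard setlocale() and the <ctype.h> predicates. It counts how many of the 256 code points a given LC_CTYPE locale puts in a class, so two locales can be compared.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Count how many of the 256 unsigned char values satisfy a <ctype.h>
 * predicate under the named LC_CTYPE locale.  Returns -1 if that
 * locale is not available.  Hypothetical helper; note that it leaves
 * LC_CTYPE set to `locale` when it succeeds.
 */
int count_class(const char *locale, int (*pred)(int))
{
    int c, n = 0;

    if (setlocale(LC_CTYPE, locale) == NULL)
        return -1;
    for (c = 0; c < 256; c++)
        if (pred(c))
            n++;
    return n;
}

/*
 * Count the code points in 0x80..0xff that the named locale treats as
 * alphabetic.  These are the positions where the 8859-x sets differ
 * from one another.  Returns -1 if the locale is not available.
 */
int count_high_alpha(const char *locale)
{
    int c, n = 0;

    if (setlocale(LC_CTYPE, locale) == NULL)
        return -1;
    for (c = 0x80; c < 256; c++)
        if (isalpha(c))
            n++;
    return n;
}
```

In the "C" locale the standard restricts isspace() to the six standard white-space characters (space, \t, \n, \v, \f, \r) and isalpha() to the 52 basic letters, so count_class("C", isspace) should be 6 and count_high_alpha("C") should be 0. Running the same calls with an 8859-1 or 8859-5 locale shows where the tables diverge; the locale names to pass (something like "ru_RU.ISO8859-5") are an assumption about what is installed, and the helpers return -1 when the name is unknown.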