From owner-freebsd-hackers  Tue Oct 17 11:27:47 1995
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.6.12/8.6.6) id LAA02752
          for hackers-outgoing; Tue, 17 Oct 1995 11:27:47 -0700
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
          by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id LAA02745
          for <freebsd-hackers@FreeBSD.ORG>; Tue, 17 Oct 1995 11:27:43 -0700
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id LAA27951; Tue, 17 Oct 1995 11:21:46 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199510171821.LAA27951@phaeton.artisoft.com>
Subject: Re: A couple problems in FreeBSD 2.1.0-950922-SNAP
To: joerg_wunsch@uriah.heep.sax.de
Date: Tue, 17 Oct 1995 11:21:46 -0700 (MST)
Cc: freebsd-hackers@FreeBSD.ORG
In-Reply-To: <199510162245.XAA27289@uriah.heep.sax.de> from "J Wunsch" at Oct 16, 95 11:45:15 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 3497      
Sender: owner-hackers@FreeBSD.ORG
Precedence: bulk

> > The problem with KOI-8 is that KOI-8 is a de facto standard, and is not
> > accepted by international standards bodies.  Mostly because the most
> > popular BBS software in the area picked it up instead of 8859-5.
> 
> The X Consortium finally agreed to accept koi8-r as a valid character
> set/encoding.
> 
> :-)

8-(.

> Well, if we would rely on things like ISO, we wouldn't use IP etc.
> and suffer from OSI/X.400 instead...

Or X.500.  Wait, that would be a good thing.

I agree that FTAM and all that cruft really, really sucks.  I'm only
pointing at ISO so that you Western Europeans (8-)) don't have to
live with a C locale that doesn't include high bits (collation, etc.
is your own problem -- fix the code if you want it).

> > The problem is not in the blank areas of the locale.
> > 
> > In point of fact, the ANSI standards for terminal control sequences
> > after ANSI 3.64 leave the codes in columns 0x80 and 0x90 to be used
> > to represent 8 bit command sequence introducers, which are the same
> > as an escape character followed by a character in columns 0x40 or 0x50.
> > Because of this, KOI-8 as a character set is not compatible with post
> > 3.64 ANSI terminal control sequence standardization.
> 
> Do you know KOI8-R?  It doesn't even touch those areas.  This is NOT
> IBM's code page crime.  KOI8-R does basically use the same printable
> characters as ISO-8859-*.  The most notable difference from the
> ISO-8859-* fonts is that KOI has the upper/lower case reversed for
> some obscure reason.

According to the Taligent published translation tables (yes, I know,
they swear they don't own them) for KOI8<->Unicode round trip conversion,
those code points are allocated but optional.

I really think you are looking at a font that doesn't implement all the
code points in the character set.
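One way to see which code points KOI8-R allocates, and the case ordering mentioned above, is to ask a codec. This is a minimal sketch that assumes Python's standard-library `koi8_r` and `iso8859_5` codecs as stand-ins for the Taligent tables (the mappings are not guaranteed identical); `describe` is a hypothetical helper. It says nothing about which glyphs a particular font actually draws.

```python
import unicodedata

def describe(byte_value: int, encoding: str) -> str:
    """Name the character one byte decodes to; unnamed controls report their category."""
    ch = bytes([byte_value]).decode(encoding)
    return unicodedata.name(ch, f"<control {unicodedata.category(ch)}>")

SMALL_A = "\u0430"    # CYRILLIC SMALL LETTER A
CAPITAL_A = "\u0410"  # CYRILLIC CAPITAL LETTER A

for enc in ("koi8_r", "iso8859_5"):
    # 0x80 is the first byte of the C1 range.
    print(enc, "0x80 ->", describe(0x80, enc))
    # Where each codec puts small and capital Cyrillic A.
    print(enc, "small a ->", hex(SMALL_A.encode(enc)[0]),
          "capital A ->", hex(CAPITAL_A.encode(enc)[0]))

# Clearing the high bit of KOI8-R small a leaves an ASCII letter.
print(chr(SMALL_A.encode("koi8_r")[0] & 0x7F))
```

In these codec tables byte 0x80 is a box-drawing character in KOI8-R but a C1 control in ISO-8859-5, and KOI8-R puts small letters below capitals (0xC1 vs 0xE1) where ISO-8859-5 does the opposite (0xD0 vs 0xB0). The commonly cited reason for the reversal is that stripping the high bit of KOI8 Cyrillic text leaves a readable Latin transliteration in the opposite case (0xC1 & 0x7F is 0x41, "A").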

> > Really, they should be using the 8859 character set instead of KOI-8,
> > but there is understood to be a large historical investment in the
> > non-standard KOI-8 representation (unfortunately).
> 
> You're sounding like the OSI protagonists when they started the German
> educational network project (WiN) here. :-)

I'm unfamiliar with the project, but if we sound alike, then they are
probably right.  8-).  Just kidding; OSI protocols rot.

The "historical investment issue" with ISO 2022 and JIS X 0208 (with or
without JIS X 0212) is the reason for the Japanese opposition to Unicode.
Well, that and the use of Chinese dictionary sort order for the ideographic characters
in the CJK unification part of the standard, and the fact that you can't
mask out all non-Japanese characters using a bit test.  8-).

The question is whether a "round trip" translation can be done
transparently, and the storage encoding varied for new systems with
little impact.  I think the answer is "yes".
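The "round trip" question can be checked mechanically: every byte should survive decoding to Unicode and re-encoding, and the storage encoding can then be changed by pivoting through Unicode text. A minimal sketch, assuming Python's built-in codecs stand in for real conversion tables; `round_trips` and `transcode` are hypothetical helper names.

```python
def round_trips(encoding: str) -> bool:
    """True if every byte 0x00-0xFF survives bytes -> str -> bytes unchanged."""
    all_bytes = bytes(range(256))
    try:
        return all_bytes.decode(encoding).encode(encoding) == all_bytes
    except UnicodeError:
        # A byte with no mapping in this codec cannot round-trip.
        return False

def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Change the storage encoding by pivoting through Unicode text."""
    return data.decode(src).encode(dst)

print(round_trips("koi8_r"), round_trips("iso8859_5"))  # True True

# Cyrillic letters move from KOI8-R to ISO-8859-5 without loss.
print(transcode(b"\xe1\xc2\xd7", "koi8_r", "iso8859_5"))

# A KOI8-R box-drawing byte has no ISO-8859-5 code point to land on.
try:
    transcode(b"\x80", "koi8_r", "iso8859_5")
except UnicodeEncodeError:
    print("0x80 (box drawing) has no ISO-8859-5 equivalent")
```

Both codecs round-trip through Unicode byte for byte, but a direct KOI8-R to ISO-8859-5 conversion is lossy for the box-drawing range, which is why Unicode works better as the pivot than either 8-bit set.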

I'd be happy to ditch ASCII ordering for this as well, if it were
necessary.  ASCII happens to have columnar separation for case
conversion that is probably wrong, actually, and the "(", "[", "{",
"<", and "`" characters ought to have their corresponding
pairings a shift width apart.  Maybe "\" and "/" and "-" and "+" should
as well.
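The case-conversion columns and the current bracket spacing are easy to measure. This sketch (the helper `toggle_case` is hypothetical) flips case with the 0x20 bit and prints how far each closing character sits from its opening partner; it does not attempt the proposed reordering.

```python
CASE_BIT = 0x20  # ASCII upper and lower case letters differ only in this bit

def toggle_case(ch: str) -> str:
    """Flip the case of one ASCII letter by XOR-ing the 0x20 bit."""
    if not ch.isascii() or not ch.isalpha():
        raise ValueError("only ASCII letters are handled")
    return chr(ord(ch) ^ CASE_BIT)

PAIRS = [("(", ")"), ("[", "]"), ("{", "}"), ("<", ">")]

# Distance from each opening character to its closing partner.
distances = {open_ch: ord(close_ch) - ord(open_ch) for open_ch, close_ch in PAIRS}

print(toggle_case("a"), toggle_case("Q"))  # A q
print(distances)  # {'(': 1, '[': 2, '{': 2, '<': 2}
```

Only "(" and ")" are adjacent; the other three pairs are two code points apart, with "\", "|", and "=" sitting between them.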

Though this might play hell with bit test based command sequence
recognition for 3.64 terminals.
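For reference, the bit test in question: 8-bit C1 controls occupy 0x80-0x9F, which a terminal can detect by checking the top three bits, and ECMA-48/ISO 6429 give each one a 7-bit form as ESC followed by the byte minus 0x40 (so CSI, 0x9B, becomes ESC "["). A minimal sketch with hypothetical helper names; it also shows that the KOI8-R box-drawing byte 0x80 trips the same test.

```python
ESC = 0x1B

def is_c1(byte: int) -> bool:
    """Bit test for the C1 range 0x80-0x9F: the top three bits are 100."""
    return (byte & 0xE0) == 0x80

def c1_to_7bit(byte: int) -> bytes:
    """Rewrite one 8-bit C1 control as its 7-bit ESC Fe equivalent."""
    if not is_c1(byte):
        raise ValueError(f"{byte:#04x} is not a C1 control")
    return bytes([ESC, byte - 0x40])

def to_7bit_controls(data: bytes) -> bytes:
    """Replace every C1 byte in a stream with its two-byte ESC form."""
    out = bytearray()
    for b in data:
        out += c1_to_7bit(b) if is_c1(b) else bytes([b])
    return bytes(out)

# CSI 2 J (erase display) in 8-bit form becomes ESC [ 2 J.
print(to_7bit_controls(b"\x9b2J"))  # b'\x1b[2J'

# KOI8-R encodes U+2500 as 0x80, which a C1-aware terminal treats as a control.
print(is_c1("\u2500".encode("koi8_r")[0]))  # True
```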

The point is that no one has proposed a better standard to which ASCII
isn't already conformant, for better or for worse.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.