From owner-freebsd-hackers  Thu May  4 13:01:10 1995
Return-Path: hackers-owner
Received: (from majordom@localhost)
          by freefall.cdrom.com (8.6.10/8.6.6) id NAA20830
          for hackers-outgoing; Thu, 4 May 1995 13:01:10 -0700
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16])
          by freefall.cdrom.com (8.6.10/8.6.6) with SMTP id NAA20822
          ; Thu, 4 May 1995 13:01:00 -0700
Received: by cs.weber.edu (4.1/SMI-4.1.1)
	id AA10533; Thu, 4 May 95 13:54:06 MDT
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9505041954.AA10533@cs.weber.edu>
Subject: Re: Can someone explain the various forms of Japanese text encoding?
To: asami@cs.berkeley.edu (Satoshi Asami | =?ISO-2022-JP?B?GyRCQHUbKEI=?= =?ISO-2022-JP?B?GyRCOCsbKEIgGyRCOC0bKEI=?=)
Date: Thu, 4 May 95 13:54:06 MDT
Cc: jkh@time.cdrom.com, ache@FreeBSD.org, hackers@FreeBSD.org
In-Reply-To: <199505041146.EAA01328@silvia.HIP.Berkeley.EDU> from "Satoshi Asami | =?ISO-2022-JP?B?GyRCQHUbKEI=?= =?ISO-2022-JP?B?GyRCOCsbKEIgGyRCOC0bKEI=?=" at May 4, 95 04:46:20 am
X-Mailer: ELM [version 2.4dev PL52]
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

>  * So far I've seen "romanji", which appears to be a romanized form of
>                      ^^^^^^^
>           this should be "romaji"

Actually, most of my English-person-studying-Japanese books have it as
"romanji"; this could be a locale difference.  8-).

> JIS (short for "Japan Industrial Standard", which is the Japanese

I always confuse "Industrial" with "International" in this.  8-(.

> Other than the Esc for the bookmarks, it uses only the printable ASCII
> chars (a subrange of 0x20 - 0x7e).  Note that even in a purely
> Japanese document, the end of line is still represented by 0x0a so
> there is at least one Esc-$-B and Esc-(-B per line (unless the line
> contains no Japanese at all).

Yep.  The only method of resynchronizing in the event of line noise
when using this method is looking for an end-of-line or one of the
escape sequences.  8-(.
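
To make the resynchronization point concrete, here is a rough Python
sketch of a scanner that tracks the shift state and falls back to ASCII
at a newline, so a lost Esc-(-B costs at most the rest of one line.  The
names jis_segments and ESCAPES are made up for illustration, and only the
two escape sequences quoted above are handled; this is not a full JIS
decoder.

```python
# Escape sequences named in the quoted text; the real encoding has a few more.
ESCAPES = {
    b"\x1b$B": "jis",    # Esc-$-B: switch to 2-byte Japanese
    b"\x1b(B": "ascii",  # Esc-(-B: switch back to ASCII
}

def jis_segments(data: bytes):
    """Split data into (mode, payload) runs, resetting to ASCII at newline."""
    mode, run, out, i = "ascii", bytearray(), [], 0
    while i < len(data):
        new_mode = ESCAPES.get(data[i:i + 3])
        at_newline = data[i] == 0x0A
        if new_mode is not None or (at_newline and mode != "ascii"):
            # Close the current run before the state changes.
            if run:
                out.append((mode, bytes(run)))
                run = bytearray()
            if new_mode is not None:
                mode = new_mode
                i += 3
                continue
            mode = "ascii"  # newline seen in 2-byte mode: the shift-out was lost
        run.append(data[i])
        i += 1
    if run:
        out.append((mode, bytes(run)))
    return out

intact = b"abc \x1b$B@u\x1b(B def\n"
damaged = b"\x1b$B@u\nabc"  # the closing Esc-(-B was eaten by line noise
print(jis_segments(intact))
print(jis_segments(damaged))
```

In the damaged case the text after the newline comes back as ASCII,
which is about the best that can be done without the escape.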

> In both JIS and EUC, the Japanese part is 2 bytes per char, and
> usually, the Japanese fonts are twice the width of their ASCII
> counterpart.  Thus, on a 80-character wide screen, you can display 40
> Japanese letters (for 80 bytes).  Of course JIS is a little longer due
> to the escape sequences.
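
The arithmetic for a full line can be checked with a few lines of
Python.  This is only a sketch: it assumes Python's bundled euc_jp and
iso2022_jp codecs stand in for "EUC" and "JIS" as used here, and
encoded_sizes is a made-up helper name.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes of text under each encoding."""
    return {name: len(text.encode(name)) for name in ("euc_jp", "iso2022_jp")}

line = "\u65e5" * 40  # 40 copies of one kanji: a full 80-column line
sizes = encoded_sizes(line)
# EUC: 40 * 2 bytes.  JIS: the same 80 bytes plus one 3-byte escape
# sequence going in and one coming out.
print(sizes)  # {'euc_jp': 80, 'iso2022_jp': 86}
```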

The use of "half-character" fonts is one of the reasons that Unicode
is not well liked in Japan; with only 16 bits for an encoding, you
have the problem of associating two characters and of running out of
bits for a prerendered (ie: ROMable) font.

On the plus side, you can resynchronize EUC by looking for a high bit.
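
As a sketch of what that looks like in practice (Python, with a made-up
function name euc_chars, and assuming only ASCII plus the two-byte
Japanese set, so the 0x8E/0x8F single-shift forms of EUC are ignored):
a byte with the high bit clear is always a complete ASCII character, so
a scanner that starts in the middle of a character is back in step no
later than the next such byte.

```python
def euc_chars(data: bytes):
    """Split EUC data into characters; None marks an orphaned high byte."""
    out, i = [], 0
    while i < len(data):
        if data[i] < 0x80:
            # High bit clear: a complete 1-byte ASCII character.
            out.append(data[i:i + 1])
            i += 1
        elif i + 1 < len(data) and data[i + 1] >= 0x80:
            # Two high-bit bytes in a row: treat as one 2-byte character.
            out.append(data[i:i + 2])
            i += 2
        else:
            # High-bit byte with no partner: half of a damaged character.
            out.append(None)
            i += 1
    return out

intact = b"\xc6\xfc\xcb\xdcok"  # two 2-byte characters, then ASCII "ok"
damaged = intact[1:]            # first byte lost to line noise
print(euc_chars(intact))
print(euc_chars(damaged))
```

With the first byte dropped, the high-bit run is mispaired and one
orphan is reported, but "o" and "k" come out correctly because the
scanner is back in step at the first low byte.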

> If we are planning to support NEC's popular (in Japan) PC-9801 series
> of computers (with Japanese support built-in in its console), we'll
> need to go to the third standard, called "Shift-JIS" (meaning "shifted
> JIS") or "MS-Kanji".  This is a truly kludgy format, I don't even want
> to try to explain it here and let's not worry about it for now. ;)

Support for the NEC and the DOS V boxes is probably important in the
long run.  Install is a real hard nut to crack for internationalization;
typically, commercial vendors pass on it, either trusting that the
technical people will know enough English to do the install, or making
a totally different install distribution for the non-English markets.
In other words, it's not simply a US/Japan issue.

>  * 		      Given that I also have *no* Japanese fonts for
>  * syscons, I'm also somewhat limited in that dept. anyway.  There is a
>  * format I can display with the ISO8859-1 font, according to Satoshi,
>  * though I'm still a little unclear on how it works.
> 
> According to ME?!?  When did I say that? ;)  I don't think that's
> possible.... :<

Me neither.  The only thing I can think of is a 7x1 block of # signs,
and even English would be better than that.  8-P.

> Anyway, since we don't have fonts, I think we are pretty much stuck
> with romaji for now.  Oh well. :<

I thought that the console hardware in the NEC machines was capable
of Kanji display (it has ROM fonts)... Is that not true?

I think an ASCII with Kana (Katakana/Hiragana) in the high end (sort of
an ugly non-ISO pseudo 8859 font) would be superior to Romanji, wouldn't
it?  At least a lot more Japanese would be able to read it.
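
For what it's worth, a layout along these lines is easy to sketch.  The
Python below assumes the JIS X 0201 arrangement (halfwidth Katakana at
0xA1-0xDF) with plain ASCII in the low half, which is my choice for
illustration rather than anything decided in this thread; it has no
Hiragana, and kana8_to_unicode and decode_kana8 are made-up names.

```python
KANA_FIRST, KANA_LAST = 0xA1, 0xDF  # halfwidth Katakana range in JIS X 0201
KANA_OFFSET = 0xFF61 - 0xA1         # maps that range onto U+FF61..U+FF9F

def kana8_to_unicode(byte: int) -> str:
    """Map one byte of the assumed 8-bit ASCII+Kana layout to a character."""
    if byte < 0x80:
        return chr(byte)                # low half: plain ASCII (assumed)
    if KANA_FIRST <= byte <= KANA_LAST:
        return chr(byte + KANA_OFFSET)  # high half: halfwidth Katakana
    raise ValueError(f"byte 0x{byte:02X} is unassigned in this layout")

def decode_kana8(data: bytes) -> str:
    """Decode a whole byte string, one character per byte."""
    return "".join(kana8_to_unicode(b) for b in data)
```

Every character is one byte and one cell wide, so such a font would fit
the same glyph slots as an 8859-style font.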


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.