Date: Thu, 4 May 95 13:54:06 MDT
From: terry@cs.weber.edu (Terry Lambert)
To: asami@cs.berkeley.edu (Satoshi Asami | =?ISO-2022-JP?B?GyRCQHUbKEI=?= =?ISO-2022-JP?B?GyRCOCsbKEIgGyRCOC0bKEI=?=)
Cc: jkh@time.cdrom.com, ache@FreeBSD.org, hackers@FreeBSD.org
Subject: Re: Can someone explain the various forms of Japanese text encoding?
Message-ID: <9505041954.AA10533@cs.weber.edu>
In-Reply-To: <199505041146.EAA01328@silvia.HIP.Berkeley.EDU> from "Satoshi Asami | =?ISO-2022-JP?B?GyRCQHUbKEI=?= =?ISO-2022-JP?B?GyRCOCsbKEIgGyRCOC0bKEI=?=" at May 4, 95 04:46:20 am
> * So far I've seen "romanji", which appears to be a romanized form of
>                     ^^^^^^^
> this should be "romaji"

Actually, most of my English-person-studying-Japanese books have it as
"romanji"; this could be a locale difference. 8-).

> JIS (short for "Japan Industrial Standard", which is the Japanese

I always confuse "Industrial" with "International" in this. 8-(.

> Other than the Esc for the bookmarks, it uses only the printable ASCII
> chars (a subrange of 0x20 - 0x7e). Note that even in a purely
> Japanese document, the end of line is still represented by 0x0a so
> there is at least one Esc-$-B and Esc-(-B per line (unless the line
> contains no Japanese at all).

Yep. With this method, the only way to resynchronize after line noise
is to look for an end-of-line or one of the escape sequences. 8-(.

> In both JIS and EUC, the Japanese part is 2 bytes per char, and
> usually, the Japanese fonts are twice the width of their ASCII
> counterpart. Thus, on a 80-character wide screen, you can display 40
> Japanese letters (for 80 bytes). Of course JIS is a little longer due
> to the escape sequences.

The use of "half-character" fonts is one of the reasons that Unicode is
not well liked in Japan; with only 16 bits for an encoding, you have the
problem of associating two characters, and of running out of bits for a
prerendered (i.e. ROMable) font.

On the plus side, you can resynchronize EUC by looking at the high bit:
both bytes of a Japanese character have it set, and ASCII bytes never do.

> If we are planning to support NEC's popular (in Japan) PC-9801 series
> of computers (with Japanese support built-in in its console), we'll
> need to go to the third standard, called "Shift-JIS" (meaning "shifted
> JIS") or "MS-Kanji". This is a truly kludgy format, I don't even want
> to try to explain it here and let's not worry about it for now. ;)

Support for the NEC and the DOS/V boxes is probably important in the
long run.
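To make the JIS/EUC/Shift-JIS relationship concrete, here is a minimal Python sketch; it is not code from this thread, and the function names (jis_chars, jis_to_euc, sjis_pair, jis_to_sjis, euc_resync) are hypothetical. It assumes well-formed 7-bit JIS input that uses only the ESC $ B and ESC ( B shifts, and EUC text made only of ASCII plus two-byte JIS X 0208 characters (no 0x8E half-width kana or 0x8F JIS X 0212 prefixes). EUC is simply JIS with the high bit set on both bytes of each Japanese character; Shift-JIS repacks the same 94x94 code space into lead bytes 0x81-0x9F and 0xE0-0xEF, leaving 0xA1-0xDF for single-byte half-width katakana, which is where the kludginess comes from. The self-checks compare against Python's built-in iso2022_jp, euc_jp and shift_jis codecs.

```python
ESC_KANJI = b"\x1b$B"  # ESC $ B: shift into two-byte JIS X 0208
ESC_ASCII = b"\x1b(B"  # ESC ( B: shift back to ASCII


def jis_chars(data):
    """Yield (b,) for each ASCII byte and (j1, j2) for each JIS X 0208 char.

    Assumes well-formed input using only the two escapes above. A newline
    also forces ASCII mode, i.e. the end-of-line resynchronization point.
    """
    kanji = False
    i = 0
    while i < len(data):
        if data.startswith(ESC_KANJI, i):
            kanji, i = True, i + 3
        elif data.startswith(ESC_ASCII, i):
            kanji, i = False, i + 3
        elif data[i] == 0x0A:
            kanji, i = False, i + 1
            yield (0x0A,)
        elif kanji:
            yield (data[i], data[i + 1])
            i += 2
        else:
            yield (data[i],)
            i += 1


def jis_to_euc(data):
    """EUC-JP: the same two JIS bytes, each with the high bit set."""
    return bytes(b | 0x80 if len(ch) == 2 else b
                 for ch in jis_chars(data) for b in ch)


def sjis_pair(j1, j2):
    """Map one JIS X 0208 code (each byte 0x21-0x7E) to Shift-JIS bytes."""
    # Two JIS rows share one lead byte; rows 0x5F and up jump past the
    # 0xA0-0xDF range (Shift-JIS keeps half-width katakana at 0xA1-0xDF).
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 & 1:  # odd row: trail bytes 0x40-0x7E, 0x80-0x9E (0x7F skipped)
        s2 = j2 + (0x1F if j2 <= 0x5F else 0x20)
    else:       # even row: trail bytes 0x9F-0xFC
        s2 = j2 + 0x7E
    return s1, s2


def jis_to_sjis(data):
    """Convert 7-bit JIS to Shift-JIS; ASCII bytes pass through unchanged."""
    out = bytearray()
    for ch in jis_chars(data):
        out += bytes(sjis_pair(*ch)) if len(ch) == 2 else bytes(ch)
    return bytes(out)


def euc_resync(data, pos):
    """Return the first character boundary at or after pos in EUC-JP text.

    Assumes only ASCII and two-byte JIS X 0208 characters. Back up to the
    last byte with the high bit clear (ASCII); the two-byte characters
    after it start at even offsets from there.
    """
    start = pos
    while start > 0 and data[start - 1] & 0x80:
        start -= 1
    return pos + (pos - start) % 2


if __name__ == "__main__":
    text = "JIS\u3068EUC\n"  # "JIS", hiragana "to" (U+3068), "EUC"
    jis = text.encode("iso2022_jp")
    assert jis_to_euc(jis) == text.encode("euc_jp")
    assert jis_to_sjis(jis) == text.encode("shift_jis")
    euc = text.encode("euc_jp")  # b"JIS\xa4\xc8EUC\n"
    assert euc_resync(euc, 4) == 5  # byte 4 is the second half of U+3068
    print("ok")
```

jis_chars treats a newline as an implicit return to ASCII, mirroring the end-of-line resynchronization point discussed above; euc_resync implements the high-bit resynchronization by backing up to the last ASCII byte and counting byte pairs from there.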
Install is a really hard nut to crack for internationalization; typically,
commercial vendors pass on it, either trusting that the technical people
will know enough English to do the install, or making a totally different
install distribution for the non-English markets. In other words, it's
not simply a US/Japan issue.

> * Given that I also have *no* Japanese fonts for
> * syscons, I'm also somewhat limited in that dept. anyway. There is a
> * format I can display with the ISO8859-1 font, according to Satoshi,
> * though I'm still a little unclear on how it works.
>
> According to ME?!? When did I say that? ;) I don't think that's
> possible.... :<

Me neither. The only thing I can think of is a 7x1 block of # signs, and
even English would be better than that. 8-P.

> Anyway, since we don't have fonts, I think we are pretty much stuck
> with romaji for now. Oh well. :<

I thought that the console hardware in the NEC machines was capable of
Kanji display (it has ROM fonts)... Is that not true?

I think an ASCII with Kana (Katakana/Hiragana) in the high end (sort of an
ugly non-ISO pseudo-8859 font) would be superior to romanji, wouldn't it?
At least a lot more Japanese users would be able to read it.

                                        Terry Lambert
                                        terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.