Date: Sun, 27 Jun 1999 23:17:19 +0900
From: Motoyuki Konno <motoyuki@snipe.rim.or.jp>
To: "Jordan K. Hubbard" <jkh@zippy.cdrom.com>
Cc: Motoyuki Konno <motoyuki@snipe.rim.or.jp>, Nik Clayton <nclayton@lehman.com>, Jun Kuriyama <kuriyama@sky.rim.or.jp>, doc@FreeBSD.ORG, freebsd-translate@ngo.org.uk, jdp@FreeBSD.ORG
Subject: Re: Resolution: FDP reorganisation
Message-ID: <199906271417.XAA06581@rei.snipe.rim.or.jp>
References: <67622.930333696@zippy.cdrom.com>
Hi,
"Jordan K. Hubbard" <jkh@zippy.cdrom.com> wrote:
> OK, so the Japanese folks have some sort of auto-conversion. That
> takes care of strictly the Japanese language, but what about the
> Chinese folks or the others that Nik pointed out? It seemed to me
> that he was looking for a much wider convention here, not just a
> solution to the ja problem.
If you want to know more about this, please read Ken Lunde's book
"CJKV Information Processing", from O'Reilly.
# CJKV means Chinese, Japanese, Korean & Vietnamese.
--------------------
For General:
ISO-2022: ISO-2022 is a '7 bit encoding method': no byte in the
encoded text has its 8th (high) bit set.
So, ISO-2022 encoding is very useful for e-mail and netnews.
EUC: EUC is short for 'Extended UNIX Code'.
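
The 7 bit property can be checked with a few lines of Python. This is only a sketch: it assumes Python's bundled 'iso2022_jp' and 'euc_jp' codecs, and the sample string and the helper name is_seven_bit are made up for illustration.

```python
# Compare a 7-bit ISO-2022 encoding with an 8-bit EUC encoding.
# Assumes the standard 'iso2022_jp' and 'euc_jp' codecs shipped with Python.

def is_seven_bit(data: bytes) -> bool:
    """Return True if no byte in data has its high (8th) bit set."""
    return all(b < 0x80 for b in data)

text = "日本語 text"               # hypothetical sample string

iso = text.encode("iso2022_jp")   # escape sequences + 7-bit bytes
euc = text.encode("euc_jp")       # kanji bytes have the high bit set

print(is_seven_bit(iso))
print(is_seven_bit(euc))
```

The first check should hold for the ISO-2022 form and fail for the EUC form, which is why the former passes through 7-bit-only mail and news transports.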
Japanese
--------
character set : JIS X 0208
encoding system: JIS, SJIS, EUC-JP
o JIS : also known as 'ISO-2022-JP', used for e-mail and
netnews. ISO-2022-JP is defined in RFC 1468.
o SJIS : short for 'Shift JIS'. DOS/Windows computers
and Macintosh use SJIS as internal code.
o EUC-JP : most UNIX computers use EUC-JP as internal code.
Conversion between JIS, SJIS and EUC-JP is very easy.
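
To show why the conversion is easy, here is a rough sketch for JIS X 0208 two-byte characters only. It assumes Python's 'iso2022_jp', 'euc_jp' and 'shift_jis' codecs for cross-checking; the function name jis_to_sjis and the sample string are hypothetical, and the arithmetic is the commonly used conversion formula, not something taken from this thread.

```python
# Convert JIS X 0208 two-byte codes (as carried in ISO-2022-JP) to EUC-JP
# and Shift JIS by simple arithmetic. Only two-byte kanji/kana are handled.

def jis_to_sjis(j1: int, j2: int) -> tuple[int, int]:
    """Map one JIS X 0208 byte pair (0x21..0x7E each) to Shift JIS."""
    s1 = ((j1 + 1) >> 1) + 0x70
    if s1 > 0x9F:            # skip the 0xA0..0xDF single-byte kana range
        s1 += 0x40
    if j1 & 1:               # odd row: first half of the trail-byte range
        s2 = j2 + 0x1F
        if j2 >= 0x60:       # skip 0x7F
            s2 += 1
    else:                    # even row: second half of the trail-byte range
        s2 = j2 + 0x7E
    return s1, s2

text = "日本語"                            # hypothetical sample, all JIS X 0208
jis = text.encode("iso2022_jp")
assert jis.startswith(b"\x1b$B") and jis.endswith(b"\x1b(B")
payload = jis[3:-3]                        # drop the escape sequences

# EUC-JP: set the high bit on every JIS byte.
euc = bytes(b | 0x80 for b in payload)

# Shift JIS: apply the pairwise formula.
pairs = zip(payload[0::2], payload[1::2])
sjis = bytes(b for j1, j2 in pairs for b in jis_to_sjis(j1, j2))

print(euc == text.encode("euc_jp"))
print(sjis == text.encode("shift_jis"))
```

Because all three encodings carry the same JIS X 0208 character set, no lookup table is needed; that is the sense in which the conversion is easy.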
Korean
------
character set : KS X 1001
encoding system : ISO-2022-KR, EUC-KR
o ISO-2022-KR : defined in RFC 1557. similar to ISO-2022-JP for
Japanese.
o EUC-KR : similar to EUC-JP for Japanese.
I have heard that many Korean people use EUC-KR
for e-mail, not ISO-2022-KR.
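
The same relationship can be sketched for Korean. This assumes Python's 'iso2022_kr' and 'euc_kr' codecs; the sample string and variable names are made up for illustration.

```python
# ISO-2022-KR carries KS X 1001 bytes in 7-bit form between SO (0x0E) and
# SI (0x0F); EUC-KR carries the same bytes with the high bit set.

text = "한글"                          # hypothetical sample string

iso_kr = text.encode("iso2022_kr")
euc_kr = text.encode("euc_kr")

# Pick out the shifted-out run between SO and SI.
start = iso_kr.index(b"\x0e") + 1
end = iso_kr.index(b"\x0f", start)
shifted = iso_kr[start:end]

# Setting the high bit on each 7-bit byte should give the EUC-KR form.
rebuilt = bytes(b | 0x80 for b in shifted)

print(all(b < 0x80 for b in iso_kr))
print(rebuilt == euc_kr)
```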
Chinese Taiwan
--------------
character set : CNS 11643 (traditional Chinese characters)
also known as 'Big5' (*1).
encoding system : ISO-2022-CN (*2), EUC-TW, Big5
o ISO-2022-CN : defined in RFC 1922.
o EUC-TW : similar to EUC-JP for Japanese.
o Big5 : Big5 encoding supports more characters than EUC-TW.
Ken Lunde says 'It seems a bit silly to compare
Big Five and EUC-TW encodings because they are
so different from one another' in his 'CJKV' book.
Chinese Mainland
----------------
character set : GB 2312 (simplified Chinese characters)
encoding system : ISO-2022-CN (*2), EUC-CN, GBK
o ISO-2022-CN : see the section 'Chinese Taiwan'.
o EUC-CN : similar to EUC-JP for Japanese.
o GBK : Windows computers use GBK as internal code.
EUC-CN is a subset of GBK.
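
The subset relation can be illustrated as follows. This is a sketch assuming Python's 'gb2312' codec (which implements EUC-CN) and 'gbk' codec; the helper name encodable and the sample strings are hypothetical.

```python
# EUC-CN (Python codec name 'gb2312') versus GBK.

def encodable(text: str, codec: str) -> bool:
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

simplified = "简体中文"   # characters that are in GB 2312
traditional = "體"        # traditional form, not in GB 2312

# Text within GB 2312 gets identical bytes in both encodings.
same_bytes = simplified.encode("gb2312") == simplified.encode("gbk")

print(same_bytes)
print(encodable(traditional, "gb2312"))
print(encodable(traditional, "gbk"))
```

Text that stays within GB 2312 is byte-for-byte identical in both encodings, so a GBK decoder can read EUC-CN data unchanged, but not the other way round.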
*1: To be exact, CNS 11643 is a corrected and supplemented version
of 'Big5'.
*2: ISO-2022-CN supports both CNS (Taiwan) and GB (Chinese Mainland)
character sets.
--
------------------------------------------------------------------------
Motoyuki Konno mkonno@res.yamanashi-med.ac.jp (Univ)
motoyuki@snipe.rim.or.jp (Home)
motoyuki@FreeBSD.ORG (FreeBSD Project)
Yamanashi Medical University http://www.freebsd.org/~motoyuki/ (WWW)
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-doc" in the body of the message
