Date: Sun, 27 Jun 1999 23:17:19 +0900
From: Motoyuki Konno <motoyuki@snipe.rim.or.jp>
To: "Jordan K. Hubbard" <jkh@zippy.cdrom.com>
Cc: Motoyuki Konno <motoyuki@snipe.rim.or.jp>, Nik Clayton <nclayton@lehman.com>, Jun Kuriyama <kuriyama@sky.rim.or.jp>, doc@FreeBSD.ORG, freebsd-translate@ngo.org.uk, jdp@FreeBSD.ORG
Subject: Re: Resolution: FDP reorganisation
Message-ID: <199906271417.XAA06581@rei.snipe.rim.or.jp>
References: <67622.930333696@zippy.cdrom.com>
Hi,

"Jordan K. Hubbard" <jkh@zippy.cdrom.com> wrote:
> OK, so the Japanese folks have some sort of auto-conversion.  That
> takes care of strictly the Japanese language, but what about the
> Chinese folks or the others that Nik pointed out?  It seemed to me
> that he was looking for a much wider convention here, not just a
> solution to the ja problem.

If you want to know more about this, please read Ken Lunde's book
"CJKV Information Processing", from O'Reilly.
# CJKV means Chinese, Japanese, Korean & Vietnamese.

--------------------

In general:

ISO-2022:
  ISO-2022 is a '7 bit encoding method': no byte of the encoded text
  has its 8th bit set.  So ISO-2022 encodings are very useful for
  e-mail and netnews, where 8-bit data may not pass through safely.

EUC:
  EUC is short for 'Extended UNIX Code'.

Japanese
--------

character set   : JIS X 0208
encoding system : JIS, SJIS, EUC-JP

  o JIS    : also known as 'ISO-2022-JP'; used for e-mail and netnews.
             ISO-2022-JP is defined in RFC 1468.
  o SJIS   : short for 'Shift JIS'.  DOS/Windows computers and
             Macintosh use SJIS as their internal code.
  o EUC-JP : most UNIX computers use EUC-JP as their internal code.

Conversion between JIS, SJIS and EUC-JP is very easy, because all
three encode the same character set.

Korean
------

character set   : KS X 1001
encoding system : ISO-2022-KR, EUC-KR

  o ISO-2022-KR : defined in RFC 1557.  Similar to ISO-2022-JP for
                  Japanese.
  o EUC-KR      : similar to EUC-JP for Japanese.

I have heard that many Korean people use EUC-KR for e-mail, not
ISO-2022-KR.

Chinese Taiwan
--------------

character set   : CNS 11643 (traditional Chinese characters),
                  also known as 'Big5' (*1).
encoding system : ISO-2022-CN (*2), EUC-TW, Big5

  o ISO-2022-CN : defined in RFC 1922.
  o EUC-TW      : similar to EUC-JP for Japanese.
  o Big5        : the Big5 encoding supports more characters than
                  EUC-TW.

Ken Lunde says 'It seems a bit silly to compare Big Five and EUC-TW
encodings because they are so different from one another' in his
'CJKV' book.
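To make the '7 bit' remark and the 'conversion is very easy' remark above a bit more concrete, here is a small sketch using the codecs that ship with Python. It is only an illustration under some assumptions: the helper names (is_seven_bit, euc_jp_to_jis, convert) and the sample strings are made up for this example, and the bit-clearing shortcut covers only two-byte JIS X 0208 characters.

```python
# Illustrative sketch only; the helper names below are hypothetical.

def is_seven_bit(data: bytes) -> bool:
    """Return True when no byte in data has its 8th (high) bit set."""
    return all(b < 0x80 for b in data)


def euc_jp_to_jis(data: bytes) -> bytes:
    """Clear the high bit of every byte of two-byte EUC-JP text.

    Assumption: the input holds only JIS X 0208 characters. EUC-JP
    sequences that start with 0x8E or 0x8F (other character sets) are
    not handled, and the ISO-2022-JP escape sequences are not added.
    """
    return bytes(b & 0x7F for b in data)


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Convert between two encodings by decoding and re-encoding."""
    return data.decode(src).encode(dst)


if __name__ == "__main__":
    ja, ko = "日本語", "한국어"
    samples = [
        ("iso2022_jp", ja), ("shift_jis", ja), ("euc_jp", ja),
        ("iso2022_kr", ko), ("euc_kr", ko),
    ]
    for codec, text in samples:
        encoded = text.encode(codec)
        print(f"{codec:11} 7-bit={is_seven_bit(encoded)} {encoded!r}")

    # Strip the 3-byte escape sequences that open and close the
    # ISO-2022-JP text, then compare with the bit-cleared EUC-JP bytes.
    jis_body = ja.encode("iso2022_jp")[3:-3]
    print(euc_jp_to_jis(ja.encode("euc_jp")) == jis_body)
```

The convert helper goes through Python's internal Unicode strings, which is simply the easiest route in Python; it is not the direct table-free arithmetic that converters of the time would typically use. The euc_jp_to_jis helper hints at that arithmetic for the EUC-JP to JIS direction only.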

Chinese Mainland
----------------

character set   : GB 2312 (simplified Chinese characters)
encoding system : ISO-2022-CN (*2), EUC-CN, GBK

  o ISO-2022-CN : see the section 'Chinese Taiwan'.
  o EUC-CN      : similar to EUC-JP for Japanese.
  o GBK         : Windows computers use GBK as their internal code.
                  EUC-CN is a subset of GBK.

*1: To be exact, CNS 11643 is a corrected and supplemented version of
    'Big5'.
*2: ISO-2022-CN supports both the CNS (Taiwan) and GB (Chinese
    Mainland) character sets.

-- 
------------------------------------------------------------------------
Motoyuki Konno                mkonno@res.yamanashi-med.ac.jp  (Univ)
                              motoyuki@snipe.rim.or.jp        (Home)
                              motoyuki@FreeBSD.ORG (FreeBSD Project)
Yamanashi Medical University  http://www.freebsd.org/~motoyuki/ (WWW)


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-doc" in the body of the message