From owner-freebsd-i18n@FreeBSD.ORG Wed Aug 6 22:55:42 2003
Return-Path:
Delivered-To: freebsd-i18n@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A5E5F37B401
	for ; Wed, 6 Aug 2003 22:55:42 -0700 (PDT)
Received: from smtp02.syd.iprimus.net.au (smtp02.syd.iprimus.net.au [210.50.76.52])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 251F343FA3
	for ; Wed, 6 Aug 2003 22:55:42 -0700 (PDT)
	(envelope-from tim@robbins.dropbear.id.au)
Received: from mail.robbins.dropbear.id.au (210.50.81.62)
	by smtp02.syd.iprimus.net.au (7.0.018) id 3F13130D004593B9
	for freebsd-i18n@freebsd.org; Thu, 7 Aug 2003 15:55:40 +1000
Received: by mail.robbins.dropbear.id.au (Postfix, from userid 1000)
	id 77A49C90F; Thu, 7 Aug 2003 15:55:38 +1000 (EST)
Date: Thu, 7 Aug 2003 15:55:38 +1000
From: Tim Robbins
To: freebsd-i18n@freebsd.org
Message-ID: <20030807055538.GA1428@dilbert.robbins.dropbear.id.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.1i
Subject: gb18030(5) manual page for review
X-BeenThere: freebsd-i18n@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: FreeBSD Internationalization Effort
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 07 Aug 2003 05:55:43 -0000

I noticed that support for the GB18030 encoding was recently committed.
I had already implemented it in a Perforce branch, along with the rest
of my planned overhaul of the character encoding functions in libc for
FreeBSD 6. The only thing that my implementation has that Robin Hu's
doesn't is a manual page :-)

I've attached my manual page, which I plan to commit in the next week
or so. I'd appreciate comments from Chinese speakers or anyone who's
generally clueful when it comes to character encodings.

BTW, just to save duplication of effort in the future: I've already
implemented the ISO-2022-CN and ISO-2022-JP encodings and all the
related state-dependent encoding support, and will probably be
committing it when 6.0-current is created.

Thanks,

Tim

.\" [copyright header trimmed for mail]
.\"
.\" $FreeBSD$
.Dd March 30, 2003
.Dt GB18030 5
.Os
.Sh NAME
.Nm gb18030
.Nd "GB 18030 encoding method for Chinese text"
.Sh SYNOPSIS
.Nm ENCODING
.Qq GB18030
.Sh DESCRIPTION
The
.Nm GB18030
encoding implements GB 18030-2000, a PRC National Standard for the encoding of
Chinese characters.
It is a superset of the older GB 2312-80 and GBK encodings.
.Pp
Multibyte characters in the GB18030 encoding can be one byte, two bytes, or
four bytes long.
There is a total of over 1.5 million code positions.
.Pp
The
.Tn ASCII
character set is represented by a single byte in the range 0x00 to 0x7F.
.Pp
Chinese characters are represented as either two bytes or four bytes.
Characters which are represented by two bytes begin with a byte in the range
0x81-0xFE and end with a byte either in the range 0x40-0x7E or 0x80-0xFE.
.Pp
Characters which are represented by four bytes begin with a byte in the range
0x81-0xFE, have a second byte in the range 0x30-0x39, a third byte in the range
0x81-0xFE and a fourth byte in the range 0x30-0x39.
.Sh SEE ALSO
.Xr euc 4 ,
.Xr utf8 5
.Rs
.%T "PRC National Standard GB 18030-2000"
.%D "March 2000"
.Re
.Sh STANDARDS
The
.Nm
encoding is believed to be compatible with GB 18030-2000.
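
For reviewers who want to check the byte ranges in the DESCRIPTION
section, here is a rough Python sketch of them. It is not the libc
code; the helper name gb18030_char_len and its return convention
(0 for "need more bytes", -1 for "invalid") are made up for this
illustration. It looks only at the byte ranges the page lists and says
nothing about which code positions are actually assigned. The last few
lines redo the arithmetic behind the "over 1.5 million code positions"
figure.

```python
# Sketch only: gb18030_char_len is a hypothetical helper, not a libc
# interface. It classifies bytes purely by the ranges in the manual page.

def gb18030_char_len(buf: bytes) -> int:
    """Length (1, 2 or 4) of the character at the start of buf.

    Returns 0 if buf is empty or stops partway through a character,
    and -1 if the bytes do not fit any of the documented ranges.
    """
    if not buf:
        return 0
    b0 = buf[0]
    if b0 <= 0x7F:                      # single byte: ASCII
        return 1
    if not 0x81 <= b0 <= 0xFE:          # 0x80 and 0xFE+1 are not lead bytes
        return -1
    if len(buf) < 2:
        return 0
    b1 = buf[1]
    if 0x40 <= b1 <= 0x7E or 0x80 <= b1 <= 0xFE:
        return 2                        # two-byte character
    if not 0x30 <= b1 <= 0x39:
        return -1
    # Second byte is 0x30-0x39, so this must be a four-byte character.
    if len(buf) >= 3 and not 0x81 <= buf[2] <= 0xFE:
        return -1
    if len(buf) < 4:
        return 0
    if not 0x30 <= buf[3] <= 0x39:
        return -1
    return 4


# Code positions implied by the ranges above.
SINGLE = 0x7F - 0x00 + 1                          # 128
LEAD = 0xFE - 0x81 + 1                            # 126
TRAIL2 = (0x7E - 0x40 + 1) + (0xFE - 0x80 + 1)    # 63 + 127 = 190
DIGIT = 0x39 - 0x30 + 1                           # 10
TWO_BYTE = LEAD * TRAIL2                          # 23,940
FOUR_BYTE = LEAD * DIGIT * LEAD * DIGIT           # 1,587,600
TOTAL = SINGLE + TWO_BYTE + FOUR_BYTE             # 1,611,668
```

Because the second byte of a four-byte character (0x30-0x39) never
overlaps the second-byte ranges of a two-byte character, the length is
decided by the first two bytes alone, which is what the sketch relies on.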