Date: Sat, 16 Dec 2000 03:06:04 -0600
From: "Michael C . Wu"
To: doc@freebsd.org, i18n@freebsd.org
Subject: Docbook and CJK languages
Message-ID: <20001216030604.B46336@peorth.iteration.net>

While working on some freebsd-taiwan DocBook documents, we discovered a
problem: DocBook/SGML does not handle 2-byte characters correctly.

For example, I have this line of text ("AA" and "BB" are two examples of
2-byte characters):

AABBAABBAABBAABB

When I compile this with the output set to plain text, the correct way to
break it into two lines would be:

AABBAABBAABB\n
AABB\n

However, sometimes the output comes out looking like this:

AABBAABBAABBA\n
ABB\n

(Note the broken AA character at the end of the first line.)

This causes the whole doc to be broken and unreadable, since
subsequent encoding/decoding is off by one. And the problem
can repeat several times in the documentation.

Is there any way to fix this? Is there an SGML tag that I can
specify? Or is this a missing feature of DocBook?

-- 
+------------------------------------------------------------------+
| keichii@peorth.iteration.net         | keichii@bsdconspiracy.net |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+------------------------------------------------------------------+

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-i18n" in the body of the message

Date: Sat, 16 Dec 2000 18:26:16 +0900
From: Jun Kuriyama
To: "Michael C . Wu"
Cc: doc@FreeBSD.ORG, i18n@FreeBSD.ORG
Subject: Re: Docbook and CJK languages
Message-ID: <7mae9wemyv.wl@waterblue.imgsrc.co.jp>
In-Reply-To: <20001216030604.B46336@peorth.iteration.net>

At 16 Dec 2000 09:05:49 GMT,
Michael C . Wu wrote:
> This causes the whole doc to be broken and unreadable, since
> subsequent encoding/decoding is off by one. And the problem
> can repeat several times in the documentation.
>
> Is there any way to fix this? Is there an SGML tag that I can
> specify? Or is this a missing feature of DocBook?

What encoding did you use? In a Japanese environment, we choose EUC-JP
because Jade can handle Japanese correctly only with that encoding.

-- 
Jun Kuriyama // IMG SRC, Inc.
             // FreeBSD Project

Date: Sat, 16 Dec 2000 03:32:48 -0600
From: "Michael C . Wu"
To: Jun Kuriyama
Cc: doc@FreeBSD.ORG, i18n@FreeBSD.ORG
Subject: Re: Docbook and CJK languages
Message-ID: <20001216033248.A46685@peorth.iteration.net>
In-Reply-To: <7mae9wemyv.wl@waterblue.imgsrc.co.jp>

On Sat, Dec 16, 2000 at 06:26:16PM +0900, Jun Kuriyama scribbled:
| At 16 Dec 2000 09:05:49 GMT,
| Michael C . Wu wrote:
| > This causes the whole doc to be broken and unreadable, since
| > subsequent encoding/decoding is off by one. And the problem
| > can repeat several times in the documentation.
| >
| > Is there any way to fix this? Is there an SGML tag that I can
| > specify? Or is this a missing feature of DocBook?
|
| What encoding did you use? In a Japanese environment, we choose EUC-JP
| because Jade can handle Japanese correctly only with that encoding.

Lame sucky zh_TW.Big5 ...:)
Should we specify zh_TW.EUC?
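The off-by-one breakage reported in the first message can be reproduced in a few lines. This is a minimal Python sketch, not Jade's or the DocBook stylesheets' actual code: it assumes the plain-text formatter wraps output at a fixed byte count, and the helper names `wrap_bytes` and `wrap_chars` are hypothetical. It encodes eight double-byte characters (the "AABBAABBAABBAABB" line) as Big5 and wraps them at 13 bytes, once by raw byte count and once on character boundaries.

```python
from __future__ import annotations

# Minimal sketch (hypothetical helpers, not Jade's code): compare wrapping
# a Big5 byte string by raw byte count with wrapping on character boundaries.


def wrap_bytes(data: bytes, width: int) -> list[bytes]:
    """Naive wrap: cut every `width` bytes, ignoring character boundaries."""
    return [data[i:i + width] for i in range(0, len(data), width)]


def wrap_chars(data: bytes, width: int, encoding: str = "big5") -> list[bytes]:
    """Decode first, then fill each line with whole characters so that no
    line exceeds `width` bytes (a single character wider than `width`
    still gets a line of its own)."""
    text = data.decode(encoding)
    lines: list[bytes] = []
    current = b""
    for ch in text:
        encoded = ch.encode(encoding)
        # Start a new line if this character would not fit on the current one.
        if current and len(current) + len(encoded) > width:
            lines.append(current)
            current = b""
        current += encoded
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    # Eight double-byte characters, like "AABBAABBAABBAABB" in the report.
    data = ("中文" * 4).encode("big5")  # 16 bytes in Big5

    print("byte-count wrap:")
    for chunk in wrap_bytes(data, 13):
        # errors="replace" shows the split character instead of raising.
        print(" ", chunk.decode("big5", errors="replace"))

    print("character-aware wrap:")
    for chunk in wrap_chars(data, 13):
        print(" ", chunk.decode("big5"))
```

Wrapping at an odd byte width gives 13-byte and 3-byte chunks, the same "AABBAABBAABBA" / "ABB" split described in the report, and the first chunk no longer decodes as valid Big5. The character-aware version yields the expected 12-byte and 4-byte lines. A real formatter would also need to count display columns (CJK characters usually occupy two), which this sketch does not attempt.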