Date: Thu, 15 Nov 2001 16:05:32 +0200
From: Alexey Zelkin <phantom@FreeBSD.ORG>
To: Hiroki Sato <hrs@eos.ocn.ne.jp>
Cc: horcicka@FreeBSD.cz, freebsd-doc@FreeBSD.ORG
Subject: Re: Why TIDY can never work correctly with ISO-8859-2 and others
Message-ID: <20011115160532.A61351@ark.cris.net>
In-Reply-To: <20011115.214017.71143189.hrs@sekine00.ee.noda.sut.ac.jp>

hi,

On Thu, Nov 15, 2001 at 09:40:17PM +0900, Hiroki Sato wrote:
> horcicka> And if you use char-encoding: raw - character entities with values above 255
> horcicka> are not printed as entities - this is really bad in 8-bit encodings.
>
> Yes, Japanese docs also suffer from it. The input routine of tidy expands
> any entities first, even if -raw flag is specified.
>
> horcicka> In my opinion Tidy cannot be used for encodings it does not natively support
> horcicka> (i.e. for Russian and Czech (- still not in main CVS) translations of pages
> horcicka> and docs).
>
> I think so, too.
>
> As a workaround, we can apply a patch and use the modified
> version of tidy that can suppress to interpret given entities
> as entities themselves, but I do not know if it will be a good solution.

The most noticeable problem in the -raw case is the conversion of &nbsp;
to the character with code 160. As a good-enough workaround for the
Russian translation we've used the -latin1 case, but expanding all
entities other than the markup-escaping ones such as &amp; is bad anyway.

I am working on a patch for tidy(1) that adds a new option to suppress
all entity -> character recoding. I hope that will be enough.
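
To make the code-160 problem concrete, here is a minimal Python sketch. It is not tidy's code, only an imitation of the behaviour described above. It assumes the Russian pages are stored in KOI8-R (the message does not name the encoding), and `expand_entities_raw` and `ENTITY_CODES` are hypothetical names made up for the illustration.

```python
import re

# Assumption: the 8-bit encoding of the page is KOI8-R.
ENCODING = "koi8_r"

# Tiny illustrative subset of named entities and their character codes.
ENTITY_CODES = {b"nbsp": 160}
ENTITY_RE = re.compile(rb"&(\w+);")


def expand_entities_raw(data: bytes) -> bytes:
    """Imitate the problematic behaviour: replace a known entity with the
    single byte whose value equals the entity's character code, without
    regard to the encoding of the surrounding bytes."""
    def repl(match):
        code = ENTITY_CODES.get(match.group(1))
        # Unknown entities are left exactly as written.
        return bytes([code]) if code is not None else match.group(0)
    return ENTITY_RE.sub(repl, data)


page = "Том&nbsp;1".encode(ENCODING)

# Byte 160 (0xA0) is a no-break space in Latin-1, but in KOI8-R it is a
# box-drawing character, so the expanded page no longer shows a space.
expanded = expand_entities_raw(page)
broken_text = expanded.decode(ENCODING)
print(repr(broken_text))

# A no-break space encoded properly for KOI8-R is a different byte.
proper_nbsp = "\u00a0".encode(ENCODING)
print(proper_nbsp)

# Leaving the entity untouched keeps the page correct in any encoding.
print(page.decode(ENCODING))
```

With entity recoding suppressed, the `&nbsp;` reference stays in the output as plain ASCII and the browser resolves it, so the document's 8-bit encoding never comes into play.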