Date: Thu, 15 Nov 2001 16:05:32 +0200
From: Alexey Zelkin <phantom@FreeBSD.ORG>
To: Hiroki Sato <hrs@eos.ocn.ne.jp>
Cc: horcicka@FreeBSD.cz, freebsd-doc@FreeBSD.ORG
Subject: Re: Why TIDY can never work correctly with ISO-8859-2 and others
Message-ID: <20011115160532.A61351@ark.cris.net>
In-Reply-To: <20011115.214017.71143189.hrs@sekine00.ee.noda.sut.ac.jp>; from hrs@eos.ocn.ne.jp on Thu, Nov 15, 2001 at 09:40:17PM +0900
References: <20011115105650.W57038-100000@dual.ms.mff.cuni.cz> <20011115.214017.71143189.hrs@sekine00.ee.noda.sut.ac.jp>

Hi,

On Thu, Nov 15, 2001 at 09:40:17PM +0900, Hiroki Sato wrote:

> horcicka> And if you use char-encoding: raw - character entities with values above 255
> horcicka> are not printed as entities - this is really bad in 8-bit encodings.
>
> Yes, Japanese docs also suffer from it. The input routine of tidy expands
> any entities first, even if -raw flag is specified.
>
> horcicka> In my opinion Tidy cannot be used for encodings it does not natively support
> horcicka> (i.e. for Russian and Czech (- still not in main CVS) translations of pages
> horcicka> and docs).
>
> I think so, too.
>
> As a workaround, we can apply a patch and use the modified
> version of tidy that can suppress to interpret given entities
> as entities themselves, but I do not know if it will be a good solution.

The most noticeable problem in the -raw case is the conversion of &nbsp;
to the character with code 160. As a good-enough workaround for the Russian
translation we have used the -latin1 case, but expanding all entities
except &nbsp; and &amp; is bad anyway.

I am working on a patch for tidy(1) that adds a new option to suppress
all entity -> character recoding. I hope that will be enough.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-doc" in the body of the message
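
The two problems discussed in this message can be reproduced with a short sketch. This is not tidy's code and not the patch mentioned above. It assumes Python's standard `html` module and codecs, uses a hypothetical helper `expand_and_encode`, and picks KOI8-R as the Russian encoding, which the thread does not name.

```python
import html

# Hypothetical helper, not part of tidy: expand entities first (as the
# thread says tidy's input routine does), then write an 8-bit encoding.
def expand_and_encode(markup: str, encoding: str) -> bytes:
    text = html.unescape(markup)   # "&nbsp;" -> U+00A0, "&#345;" -> U+0159
    return text.encode(encoding)

# Problem 1: the no-break space entity becomes the single byte 160.
raw_byte = expand_and_encode("&nbsp;", "latin-1")

# Read back as KOI8-R (assumed encoding), byte 160 is not a space any more.
as_koi8r = raw_byte.decode("koi8_r")

# Problem 2: an entity above 255 fits ISO-8859-2 but has no Latin-1 byte.
in_latin2 = expand_and_encode("&#345;", "iso8859_2")
try:
    expand_and_encode("&#345;", "latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(raw_byte, repr(as_koi8r), in_latin2, latin1_ok)
```

In KOI8-R the byte 160 falls in the box-drawing range, so an expanded no-break space shows up as a stray graphic character. Leaving entities unexpanded, as the proposed option would, avoids both cases.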