From owner-freebsd-doc  Thu Nov 15  6: 6:18 2001
Delivered-To: freebsd-doc@freebsd.org
Received: from columbus.cris.net (ns.cris.net [212.110.128.65])
	by hub.freebsd.org (Postfix) with ESMTP id 6146F37B405
	for <freebsd-doc@FreeBSD.ORG>; Thu, 15 Nov 2001 06:06:12 -0800 (PST)
Received: from ark.cris.net (ns2.cris.net [212.110.128.68])
	by columbus.cris.net (8.9.3/8.9.3) with ESMTP id QAA62267;
	Thu, 15 Nov 2001 16:06:04 +0200 (EET)
Received: (from phantom@localhost)
	by ark.cris.net (8.11.1/8.11.1) id fAFE5Wk61697;
	Thu, 15 Nov 2001 16:05:32 +0200 (EET)
Date: Thu, 15 Nov 2001 16:05:32 +0200
From: Alexey Zelkin <phantom@FreeBSD.ORG>
To: Hiroki Sato <hrs@eos.ocn.ne.jp>
Cc: horcicka@FreeBSD.cz, freebsd-doc@FreeBSD.ORG
Subject: Re: Why TIDY can never work correctly with ISO-8859-2 and others
Message-ID: <20011115160532.A61351@ark.cris.net>
References: <20011115105650.W57038-100000@dual.ms.mff.cuni.cz> <20011115.214017.71143189.hrs@sekine00.ee.noda.sut.ac.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <20011115.214017.71143189.hrs@sekine00.ee.noda.sut.ac.jp>; from hrs@eos.ocn.ne.jp on Thu, Nov 15, 2001 at 09:40:17PM +0900
X-Operating-System: FreeBSD 3.5-STABLE i386
Sender: owner-freebsd-doc@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-doc.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-doc>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-doc>
X-Loop: FreeBSD.org

hi,

On Thu, Nov 15, 2001 at 09:40:17PM +0900, Hiroki Sato wrote:

> horcicka> And if you use char-encoding: raw - character entities with values above 255
> horcicka> are not printed as entities - this is really bad in 8-bit encodings.
> 
>  Yes, Japanese docs also suffer from it.  The input routine of tidy expands
>  any entities first, even if -raw flag is specified.
> 
> horcicka> In my opinion Tidy cannot be used for encodings it does not natively support
> horcicka> (i.e. for Russian and Czech (- still not in main CVS) translations of pages
> horcicka> and docs).
> 
>  I think so, too.
> 
>  As a workaround, we can apply a patch and use the modified
>  version of tidy that can suppress to interpret given entities
>  as entities themselves, but I do not know if it will be a good solution.

The most noticeable problem in the -raw case is that &nbsp; is
converted to the character with code 160. As a good-enough
workaround for the Russian translation we have used the -latin1 case,
but expanding all entities except &nbsp; and &amp; is still bad.
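
To make the code 160 problem concrete, here is a small Python sketch.
It is not tidy's code; it only assumes that the expanded character is
written out as a single raw byte and then read back as KOI8-R, which
is my assumption about how a Russian page would be viewed.

```python
import html

# Expanding the entity yields the no-break space, code point 160.
expanded = html.unescape("&nbsp;")
code = ord(expanded)

# Assumption: the expanded character is emitted as one raw byte.
raw_byte = bytes([code])

# In ISO-8859-1 that byte is still a no-break space ...
as_latin1 = raw_byte.decode("iso-8859-1")
# ... but a KOI8-R reader interprets the same byte as another character.
as_koi8r = raw_byte.decode("koi8_r")

print(code)
print(as_latin1 == "\u00a0")
print(as_koi8r == "\u00a0")
```

So the byte is harmless for a Latin-1 reader, while the same byte means
something else once the page is treated as KOI8-R.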

I am working on a patch for tidy(1) that adds a new option to
suppress all entity -> character recoding. I hope that will be enough.
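
Roughly, the behaviour I have in mind looks like the Python sketch
below. The function recode() and its expand_entities parameter are
hypothetical names made up for illustration; they are not the patch and
not anything that exists in tidy.

```python
import html
import re

# Matches named (&copy;), decimal (&#169;) and hex (&#xA9;) references.
ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def recode(text, expand_entities=True):
    """Hypothetical stand-in for the recoding step; not tidy's code."""
    if not expand_entities:
        # Proposed behaviour: pass every entity through untouched.
        return text
    # Current behaviour: each recognised entity becomes a character.
    return ENTITY.sub(lambda m: html.unescape(m.group(0)), text)


sample = "a&nbsp;b &copy; &#1078;"
expanded = recode(sample)
kept = recode(sample, expand_entities=False)

print(kept)
```

With the option on, the text comes out exactly as it went in, so no
entity can turn into a byte that the target encoding does not have.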

