Date:      Fri, 06 Feb 2004 01:49:40 +0900 (JST)
From:      Hiroki Sato <hrs@FreeBSD.org>
To:        ale@FreeBSD.org
Cc:        phantom@FreeBSD.org.ua
Subject:   Re: tidy flag
Message-ID:  <20040206.014940.23072599.hrs@eos.ocn.ne.jp>
In-Reply-To: <20040205063847.GA13136@phantom.cris.net>
References:  <20040204.171343.23008681.hrs@eos.ocn.ne.jp> <402171B7.7020205@FreeBSD.org> <20040205063847.GA13136@phantom.cris.net>

Alexey Zelkin <phantom@FreeBSD.org.ua> wrote
  in <20040205063847.GA13136@phantom.cris.net>:

phantom> On Wed, Feb 04, 2004 at 11:27:03PM +0100, Alex Dupre wrote:
phantom> > Ok, the question then becomes: is it possible to replace the -preserve 
phantom> > tidy-stable flag with the -numeric tidy-devel flag? Otherwise, can you 
phantom> > send me a practical example where -preserve is needed? We (Thierry Thomas 
phantom> > and I) will try it ourselves.
phantom> 
phantom> Well.  Try the HTML code below with and without -preserve.  You'll see a
phantom> difference.  Actually, the most annoying thing was 'entity expansion', but
phantom> there were also some problems with non-ASCII symbol processing under
phantom> some conditions (but unfortunately I don't remember the details).
phantom> 
phantom> <html>
phantom>   <body>
phantom>     NBSP - &nbsp;
phantom>     COPY - &copy;
phantom>   </body>
phantom> </html>

 The problem is that the result of the expansion should depend
 on the HTML document's charset/encoding.  For example, in EUC-JP, &copy;
 should be {0x8f, 0xa2, 0xed}, but tidy always treats it as 0xa9.
 And many browsers interpret &#169; as code 169 in the HTML
 document's charset (EUC-JP, in this case).  &#160;, &#169;, &#183;, and
 other characters above 159 are encoded differently in EUC-JP than in
 ISO-8859-*.
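
 A minimal Python sketch of the byte-level difference, assuming Python's
 built-in `iso-8859-1` and `euc_jp` codecs as stand-ins for the two
 charsets.  `entity_bytes` is a hypothetical helper for illustration, not
 anything in tidy:

```python
from html.entities import name2codepoint


def entity_bytes(name: str, charset: str) -> bytes:
    """Hypothetical helper: bytes for the character an HTML entity names."""
    # The entity always names the same Unicode code point (U+00A9 for "copy");
    # only the charset decides which bytes represent it.
    return chr(name2codepoint[name]).encode(charset)


latin1 = entity_bytes("copy", "iso-8859-1")
eucjp = entity_bytes("copy", "euc_jp")
print(latin1.hex(), eucjp.hex())

# A lone 0xa9 byte, as a Latin-1-minded expansion writes it, is not a
# complete character in EUC-JP: bytes 0xa1-0xfe start multibyte sequences.
try:
    latin1.decode("euc_jp")
except UnicodeDecodeError as exc:
    print("invalid in EUC-JP:", exc.reason)
```

 The same code point thus needs one byte in ISO-8859-1 but three in
 EUC-JP (via JIS X 0212), which is why a single hard-wired expansion
 cannot be right for both.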

 Although the XML specification makes this unambiguous (&#xxx; is
 always interpreted as a Unicode character), I think it is better,
 for now, to preserve entities as they are.  Tidy does not know the
 mapping between EUC-JP and Unicode, so many Japanese documents will
 be broken without -preserve.
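
 For contrast, a sketch of what a charset-aware expansion would have to
 do, assuming the document's charset is known and that Python's codec
 tables are an acceptable EUC-JP/Unicode mapping.  `expand_for_charset`
 is hypothetical and is not how tidy behaves: it reads references the XML
 way (as Unicode code points), re-encodes them in the document's charset,
 and falls back to a numeric reference for anything that charset lacks.

```python
import html


def expand_for_charset(fragment: str, charset: str) -> bytes:
    """Hypothetical: expand entity references into the document's own charset."""
    # html.unescape reads &copy; and &#169; alike as the code point U+00A9.
    text = html.unescape(fragment)
    # Encode in the target charset; characters it cannot represent are
    # written back as &#NNN; instead of being silently corrupted.
    return text.encode(charset, errors="xmlcharrefreplace")


print(expand_for_charset("COPY - &copy;", "euc_jp"))
print(expand_for_charset("COPY - &#169;", "iso-8859-1"))
print(expand_for_charset("SMILE - &#128512;", "euc_jp"))
```

 Without such a mapping table, leaving the references as written is the
 safer option.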

-- 
| Hiroki SATO

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20040206.014940.23072599.hrs>