Date: Fri, 06 Feb 2004 01:49:40 +0900 (JST)
From: Hiroki Sato <hrs@FreeBSD.org>
To: ale@FreeBSD.org
Cc: phantom@FreeBSD.org.ua
Subject: Re: tidy flag
Message-ID: <20040206.014940.23072599.hrs@eos.ocn.ne.jp>
In-Reply-To: <20040205063847.GA13136@phantom.cris.net>
References: <20040204.171343.23008681.hrs@eos.ocn.ne.jp> <402171B7.7020205@FreeBSD.org> <20040205063847.GA13136@phantom.cris.net>
Alexey Zelkin <phantom@FreeBSD.org.ua> wrote
in <20040205063847.GA13136@phantom.cris.net>:
phantom> On Wed, Feb 04, 2004 at 11:27:03PM +0100, Alex Dupre wrote:
phantom> > Ok, the question then becomes: is it possible to replace the -preserve
phantom> > tidy-stable flag with the -numeric tidy-devel flag? Otherwise, can you
phantom> > send me a practical example where -preserve is needed? We (Thierry Thomas
phantom> > and I) will try it ourselves.
phantom>
phantom> Well. Try the html code below with -preserve and without. You'll see a
phantom> difference. Actually the most annoying thing was 'entity expansion', but
phantom> there were also some problems with non-ASCII symbol processing under
phantom> some conditions (but unfortunately I don't remember the details).
phantom>
phantom> <html>
phantom> <body>
phantom> NBSP - &nbsp;
phantom> COPY - &copy;
phantom> </body>
phantom> </html>
The problem is that the result of the expansion should depend
on the html doc's charset/encoding. For example, in euc-jp, &copy;
should be {0x8f, 0xa2, 0xed}, but tidy always treats it as 0xa9.
Many browsers also interpret &#169; as a raw character in the html
doc's charset (euc-jp, in this case). &nbsp;, &copy;, &middot;, and
other characters above 159 are encoded differently in euc-jp than
in iso-8859-*.

While the XML specification makes this unambiguous (&#xxx; is always
interpreted as a Unicode character), I think it is better, for now,
that entities are preserved as they are. Tidy does not know the
mapping between euc-jp and Unicode, so a lot of Japanese docs will
be broken without -preserve.
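
To make the byte-level difference concrete, here is a small sketch in
Python (not something tidy itself does). It assumes Python's built-in
"euc_jp" and "iso-8859-1" codecs as stand-ins for the two charsets, and
the helper name expand_naively is hypothetical. It shows that the
character behind &copy; has different byte sequences in the two
encodings, and that the single byte 0xa9 is not a complete character in
euc-jp.

```python
import html

# U+00A9 is the Unicode character that &copy; and &#169; both refer to.
copyright_sign = html.unescape("&copy;")

# The same character serialized in the two charsets discussed above.
latin1_bytes = copyright_sign.encode("iso-8859-1")
eucjp_bytes = copyright_sign.encode("euc_jp")

print("iso-8859-1:", latin1_bytes.hex())
print("euc-jp:    ", eucjp_bytes.hex())


def expand_naively(byte_value: int, charset: str) -> str:
    """Hypothetical helper: emit one fixed byte for an entity, ignoring
    the document charset, then decode it the way a reader of that
    charset would."""
    try:
        return bytes([byte_value]).decode(charset)
    except UnicodeDecodeError:
        return "<invalid in %s>" % charset


# Writing the Latin-1 byte into an euc-jp document yields a broken sequence.
print(expand_naively(0xA9, "iso-8859-1"))
print(expand_naively(0xA9, "euc_jp"))
```

A charset-aware expansion would need the euc-jp to Unicode mapping that
the codec supplies here; without it, leaving the entity untouched is the
safer choice.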
--
| Hiroki SATO
