Date: Fri, 06 Feb 2004 01:49:40 +0900 (JST)
From: Hiroki Sato <hrs@FreeBSD.org>
To: ale@FreeBSD.org
Cc: phantom@FreeBSD.org.ua
Subject: Re: tidy flag
Message-ID: <20040206.014940.23072599.hrs@eos.ocn.ne.jp>
In-Reply-To: <20040205063847.GA13136@phantom.cris.net>
References: <20040204.171343.23008681.hrs@eos.ocn.ne.jp> <402171B7.7020205@FreeBSD.org> <20040205063847.GA13136@phantom.cris.net>

Alexey Zelkin <phantom@FreeBSD.org.ua> wrote
  in <20040205063847.GA13136@phantom.cris.net>:

phantom> On Wed, Feb 04, 2004 at 11:27:03PM +0100, Alex Dupre wrote:
phantom> > Ok, the question then becomes: is it possible to replace the -preserve
phantom> > tidy-stable flag with the -numeric tidy-devel flag? Otherwise can you
phantom> > send me a practical example where -preserve is needed? We (Thierry Thomas
phantom> > and me) will try it ourselves.
phantom>
phantom> Well. Try the html code below with -preserve and without. You'll see a
phantom> difference. Actually the most annoying thing was 'entity expansion', but
phantom> there were also some problems with non-ASCII symbol processing under
phantom> some conditions (but unfortunately I don't remember the details).
phantom>
phantom> <html>
phantom> <body>
phantom> NBSP - &#160;
phantom> COPY - &#169;
phantom> </body>
phantom> </html>

 The problem is that the result of the expansion should depend on the
 html doc's charset/encoding.  For example, in euc-jp, &#169; should be
 {0x8f, 0xa2, 0xed}, but tidy always treats it as 0xa9.  And many
 browsers interpret &#169; as a raw character in the html doc's charset
 (euc-jp, in this case).  &#160;, &#169;, &#183;, and other >159
 characters in euc-jp are different from iso-8859-*.

 While according to the XML specification it is unambiguous (&#xxx; is
 always interpreted as a Unicode character), I think it is better that
 the entity is preserved as it is for the time being.  Tidy does not
 know the relationship between euc-jp and Unicode, so a lot of Japanese
 docs will be broken without -preserve.

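 To make the charset dependence concrete, here is a rough sketch in
 Python (not part of tidy; the helper name decodes_in is made up for
 this illustration).  It assumes Python's standard "euc_jp" and
 "iso-8859-1" codecs and only shows that the copyright sign has a
 different byte representation in each charset, and that a bare 0xa9
 byte is not a complete character in euc-jp.

```python
import html

# U+00A9 COPYRIGHT SIGN, i.e. what the numeric reference &#169; denotes
# when it is read as a Unicode code point.
COPYRIGHT = html.unescape("&#169;")

# The same character encoded in two document charsets.
latin1_bytes = COPYRIGHT.encode("iso-8859-1")  # single byte
eucjp_bytes = COPYRIGHT.encode("euc_jp")       # JIS X 0212 sequence with 0x8f prefix


def decodes_in(raw: bytes, charset: str) -> bool:
    """Hypothetical helper: True if raw is valid text in the given charset."""
    try:
        raw.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print("iso-8859-1:", [hex(b) for b in latin1_bytes])
    print("euc-jp:    ", [hex(b) for b in eucjp_bytes])
    # Writing the raw byte 0xa9 into a euc-jp document does not yield a
    # valid character there.
    print("0xa9 valid in euc-jp?", decodes_in(b"\xa9", "euc_jp"))
```

 So expanding the entity to the single byte 0xa9 is only safe when the
 document charset is iso-8859-1 (or similar); for a euc-jp document the
 expansion would have to go through a Unicode to euc-jp mapping.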
-- 
| Hiroki SATO