Date:      Thu, 15 Nov 2001 12:34:00 +0100 (CET)
From:      Martin Horcicka <horcicka@FreeBSD.cz>
To:        <freebsd-doc@FreeBSD.org>
Subject:   Why TIDY can never work correctly with ISO-8859-2 and others
Message-ID:  <20011115105650.W57038-100000@dual.ms.mff.cuni.cz>

Hi,

Tidy cannot be used correctly with 8-bit character sets other than Latin 1
(ISO-8859-1), because it does not support them.

Consider an HTML document in, say, ISO-8859-2 encoding that contains some
Central European characters and a &copy; entity. The default behavior of Tidy
(char-encoding: ascii) is to replace all non-ASCII characters with character
entities. It takes a Central European character and encodes it as the entity
with the same numeric value, but that value is interpreted (as defined by the
HTML specification) in ISO-8859-1 (or rather Unicode), so the entity denotes
a different character!
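
A minimal Python sketch of this effect. The naive_entity() helper is
hypothetical: it only imitates the behavior described above and is not
Tidy's actual code.

```python
import html

# "č" is byte 0xE8 in ISO-8859-2; the same value in ISO-8859-1/Unicode is "è".
raw = "č".encode("iso-8859-2")

def naive_entity(data: bytes) -> str:
    """Hypothetical stand-in: turn every non-ASCII byte into a numeric
    entity that reuses the raw byte value, ignoring the source encoding."""
    return "".join(chr(b) if b < 128 else f"&#{b};" for b in data)

entity = naive_entity(raw)          # numeric entity built from the byte value
rendered = html.unescape(entity)    # how an HTML parser resolves that entity

print(raw, entity, rendered)
```

The entity resolves to U+00E8 ("è"), not to the original U+010D ("č").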

If you use char-encoding: latin1, the &copy; entity is converted to a plain
8-bit character with the same value, but in a document that is read as
ISO-8859-2 that value is a different character!
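
The reverse problem can be sketched the same way. This only illustrates the
byte values involved, assuming the output byte is produced as Latin 1 and the
document is later read as ISO-8859-2; it does not call Tidy.

```python
import html

copyright_sign = html.unescape("&copy;")         # U+00A9
byte = copyright_sign.encode("iso-8859-1")       # the byte a Latin 1 writer emits
seen = byte.decode("iso-8859-2")                 # what an ISO-8859-2 reader sees

print(byte, seen)
```

Byte 0xA9 is the copyright sign in ISO-8859-1 but "Š" (U+0160) in ISO-8859-2.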

And if you use char-encoding: raw, character entities with values above 255
are not printed as entities, which is really bad in 8-bit encodings.
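
To see why such characters must stay entities in an 8-bit encoding, here is
a small Python sketch. The euro sign is just an example character chosen for
illustration, and the code shows the general encoding constraint, not Tidy's
output.

```python
# U+20AC (euro sign) has a value above 255 and no byte in ISO-8859-2.
euro = "\u20ac"

try:
    euro.encode("iso-8859-2")
    representable = True
except UnicodeEncodeError:
    representable = False

# The only safe form in an ISO-8859-2 document is a numeric entity.
fallback = ("č" + euro).encode("iso-8859-2", errors="xmlcharrefreplace")

print(representable, fallback)
```

The "č" is written as a single byte, while the euro sign survives only as
a numeric entity.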

In my opinion Tidy cannot be used for encodings it does not natively support
(i.e. for the Russian and Czech (still not in the main CVS) translations of
pages and docs).

Martin





