Date: Thu, 15 Nov 2001 12:34:00 +0100 (CET)
From: Martin Horcicka <horcicka@FreeBSD.cz>
To: <freebsd-doc@FreeBSD.org>
Subject: Why TIDY can never work correctly with ISO-8859-2 and others
Message-ID: <20011115105650.W57038-100000@dual.ms.mff.cuni.cz>
Hi,

Tidy simply cannot be used correctly with 8-bit character sets other than Latin 1 (e.g. ISO-8859-2), because it does not support them.

Consider an HTML document in ISO-8859-2 encoding that contains some Central European characters and a &copy; entity.

The default behavior of Tidy (char-encoding: ascii) is to replace all non-ASCII characters with character entities. It takes a Central European character and encodes it as an entity with the same numeric value, but that value is then interpreted (as defined by the HTML specification) in ISO-8859-1 (or rather Unicode)!

If you use char-encoding: latin1, the &copy; entity is converted to a normal character with the same value, but that byte is then read as ISO-8859-2, where it is a different character!

And if you use char-encoding: raw, character entities with values above 255 are not printed as entities, which is really bad in 8-bit encodings.

In my opinion Tidy cannot be used for encodings it does not natively support (i.e. for the Russian and Czech (still not in main CVS) translations of pages and docs).

Martin

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-doc" in the body of the message
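
The three cases described in the message can be illustrated without running Tidy at all. What follows is only a minimal sketch in Python that imitates the described behavior with the standard codecs. It does not call Tidy, and the sample characters (the letter c with caron and the euro sign) as well as all variable names are assumptions chosen for illustration, not taken from the original message.

```python
# Illustrative sketch only: imitates the behavior described above with
# Python's standard codecs. It does not invoke Tidy.
import html

# A Czech letter (c with caron, U+010D) as stored in an ISO-8859-2 document.
raw_byte = "\u010d".encode("iso-8859-2")            # single byte 0xE8

# Case 1 (char-encoding: ascii): the byte value is kept, but it is written
# as a numeric entity, which HTML interprets as a Unicode code point.
entity = "&#{};".format(raw_byte[0])                # "&#232;"
shown_case1 = html.unescape(entity)                 # U+00E8, e with grave

# Case 2 (char-encoding: latin1): the copyright entity becomes the Latin 1
# byte 0xA9, which a reader of an ISO-8859-2 page decodes differently.
copy_byte = html.unescape("&copy;").encode("iso-8859-1")
shown_case2 = copy_byte.decode("iso-8859-2")        # U+0160, S with caron

# Case 3 (char-encoding: raw): a character above 255 has no byte in an
# 8-bit set, so an entity is the only way to keep it in such a document.
euro = html.unescape("&euro;")                      # U+20AC
try:
    euro.encode("iso-8859-2")
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False

print(entity, hex(ord(shown_case1)), hex(ord(shown_case2)), euro_fits)
```

In the first two cases the numeric value survives while the character it stands for changes, which is the mismatch the message describes.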
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20011115105650.W57038-100000>