Date: Thu, 19 Jan 2012 08:44:34 +0900 (JST)
From: Hiroki Sato <hrs@FreeBSD.org>
To: wblock@wonkity.com
Cc: freebsd-doc@FreeBSD.org
Subject: Re: Tidy and HTML tab spacing
Message-ID: <20120119.084434.926306642968660094.hrs@allbsd.org>
In-Reply-To: <alpine.BSF.2.00.1201181520140.40712@wonkity.com>
References: <alpine.BSF.2.00.1201181255210.39534@wonkity.com> <alpine.BSF.2.00.1201181520140.40712@wonkity.com>

Warren Block <wblock@wonkity.com> wrote
  in <alpine.BSF.2.00.1201181520140.40712@wonkity.com>:

wb> HTML versions of FreeBSD documents are fed through tidy (www/tidy or
wb> www/tidy-devel) for cleanup.  There's a bug in tidy[1] that can cause
wb> tab stops to be wrong:
wb> http://www.freebsd.org/doc/en_US.ISO8859-1/books/porters-handbook/makefile-distfiles.html#AEN1623
wb>
wb> Note how DISTNAME and EXTRACT_SUFX do not line up.  They are correct
wb> in the source book.sgml.
wb>
wb> So what to do?  I lean to fixing Tidy if possible.

 We use Tidy to fix the markup in the rendered output of various
 tools such as Jade, not (only) for human readability.  Tidy's output
 is still not perfect from the viewpoint of standards conformance,
 but it is better than nothing, even if most modern web browsers can
 handle the rendered HTML directly.  It is known to have some
 problems with entity dereferencing and white-space handling, as you
 also pointed out.

wb> 3. Tidy could be replaced with some other tool.  However, the others

 I tried xmlindent, xmlformat, and xmllint as replacements in the
 past, but they were intended for well-formed XML documents and were
 not sufficient for fixing malformed (sometimes broken) markup.

wb> 4. Add newlines to the HTML in the build process before it gets to
wb> tidy:
wb> s/CLASS="PROGRAMLISTING"\n>/CLASS="PROGRAMLISTING">\n/

 I think this will break the results because a newline just after ">"
 is recognized as CDATA.

wb> 5. Don't tidy HTML files at all (suggested as an option by Benedict
wb> Reuschling).  The unprocessed HTML is ugly, but few people are going
wb> to look at it directly.  Files that haven't been through tidy are a
wb> little larger, about 4% in the case of the Porter's Handbook.

 To eliminate Tidy we would have to improve the standards conformance
 of the rendered results.  I do not know the current situation
 precisely because I last investigated it seven years ago, but I
 think it still has some glitches.

-- Hiroki
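
To make the concern about option 4 concrete, here is a minimal Python sketch of that substitution applied to a made-up PRE fragment. The fragment and the function names are assumptions for illustration, not taken from the FreeBSD doc build. It only shows that the newline moves from inside the start tag into the element's character data; whether a particular parser or browser later discards a leading newline is not modelled.

```python
import re

# The substitution quoted in option 4: the attribute, a newline, then ">".
TAG_END = re.compile(r'CLASS="PROGRAMLISTING"\n>')

# Hypothetical helper: capture everything between the PRE start tag and "</PRE".
PRE_BODY = re.compile(r'<PRE[^>]*>(.*?)</PRE', re.DOTALL)


def move_newline_after_tag(html: str) -> str:
    """Apply the option 4 rewrite: put the newline after ">" instead of before it."""
    return TAG_END.sub('CLASS="PROGRAMLISTING">\n', html)


def pre_content(html: str) -> str:
    """Return the raw character data of the first PRE element."""
    match = PRE_BODY.search(html)
    if match is None:
        raise ValueError("no PRE element found")
    return match.group(1)


# Made-up fragment in the Jade-like style, with line breaks inside the tags.
before = '<PRE\nCLASS="PROGRAMLISTING"\n>DISTNAME=\tfoo\n</PRE\n>'
after = move_newline_after_tag(before)

print(repr(pre_content(before)))
print(repr(pre_content(after)))
```

In the original fragment the line break sits inside the tag and is not part of the listing; after the rewrite it is the first character of the listing text.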