Date: Wed, 18 Jan 2012 15:49:48 -0700 (MST)
From: Warren Block <wblock@wonkity.com>
To: freebsd-doc@freebsd.org
Subject: Re: Tidy and HTML tab spacing
Message-ID: <alpine.BSF.2.00.1201181520140.40712@wonkity.com>
In-Reply-To: <alpine.BSF.2.00.1201181255210.39534@wonkity.com>
References: <alpine.BSF.2.00.1201181255210.39534@wonkity.com>
HTML versions of FreeBSD documents are fed through tidy (www/tidy or
www/tidy-devel) for cleanup.  There's a bug in tidy[1] that can cause
tab stops to be wrong:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/porters-handbook/makefile-distfiles.html#AEN1623

Note how DISTNAME and EXTRACT_SUFX do not line up.  They are correct in
the source book.sgml.

So what to do?

1. It might be possible to fix tidy.  This would be the neatest.
   (See [1]).

2. An option could be added to tidy to ignore tabs.  The HTML standard
   "strongly discourages" tabs in PRE elements[2], but does not
   disallow them.  Using actual tabs has an added benefit to the user
   in that they could cut-and-paste or just drag-select Makefile
   examples to see embedded tabs.

3. Tidy could be replaced with some other tool.  However, the others
   I've found have additional dependencies on either PHP or Java, so I
   did not test them for correct handling of tabs[3],[4].  Either one
   adds some overhead not just for doc build machines but anyone who
   wants to work on FreeBSD documentation.

4. Add newlines to the HTML in the build process before it gets to
   tidy:

   s/CLASS="PROGRAMLISTING"\n>/CLASS="PROGRAMLISTING">\n/

5. Don't tidy HTML files at all (suggested as an option by Benedict
   Reuschling).  The unprocessed HTML is ugly, but few people are going
   to look at it directly.  Files that haven't been through tidy are a
   little larger, about 4% in the case of the Porter's Handbook.

Footnotes:

[1] In www/tidy-devel, line 355 of streamio.c does not realize that
characters at the beginning of the line may be inside a tag and should
not count as visible.  The pre-tidy HTML output of the example above is

----
<PRE
CLASS="PROGRAMLISTING"
>DISTNAME=	foo
EXTRACT_SUFX=	.tgz</PRE
>
----

The '>' before DISTNAME is being wrongly counted toward the tab stop.
See http://www.wonkity.com/~wblock/tidy/ for a slightly more detailed
example.
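The miscount described above can be illustrated with a minimal Python sketch.  This is not tidy's code; the function name expand_tabs, the tab width of 8, and the single tab after each "=" are assumptions made for the illustration.  It expands tabs twice: once treating the line as starting at column 0, and once with the stray '>' counted as one visible column on the first line only.

```python
def expand_tabs(line, start_col=0, tab_width=8):
    """Expand tabs to spaces, assuming the line starts at visible column start_col."""
    out = []
    col = start_col
    for ch in line:
        if ch == "\t":
            # Pad up to the next multiple of tab_width.
            pad = tab_width - (col % tab_width)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


# Assumed content of the listing: one tab after each "=".
lines = ["DISTNAME=\tfoo", "EXTRACT_SUFX=\t.tgz"]

# Expected behavior: the '>' closing the tag is not visible text,
# so both lines start at column 0 and the values line up.
correct = [expand_tabs(line) for line in lines]

# Miscount: the '>' before DISTNAME is counted as one visible column,
# so the first line gets one space fewer than it needs.
buggy = [expand_tabs(lines[0], start_col=1), expand_tabs(lines[1])]

for line in correct + buggy:
    print(line)
```

With these assumptions the values line up in the first pair of lines and are off by one column in the second pair, which is the kind of misalignment visible in the rendered page.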
Tidy is mature software, and there's been a bug report for this problem
in the bug database since 2008:

https://sourceforge.net/tracker/?func=detail&aid=1885471&group_id=27659&atid=390963

So bug fixes in this area from the tidy project are unlikely.

[2] http://www.w3.org/TR/html401/struct/text.html#edef-PRE
[3] http://htmlpurifier.org/
[4] http://htmlcleaner.sourceforge.net/index.php
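For reference, the substitution in option 4 can be sketched in Python as a plain string replacement.  The function name move_tag_close and the sample input are hypothetical; the input assumes the pre-tidy HTML puts the closing '>' of the PRE tag at the start of the first content line, as in the example in [1].  How this would be wired into the doc build is not shown.

```python
def move_tag_close(html):
    """Move the '>' that closes <PRE CLASS="PROGRAMLISTING" up onto the
    attribute line, so the first content line no longer starts with '>'."""
    return html.replace(
        'CLASS="PROGRAMLISTING"\n>',
        'CLASS="PROGRAMLISTING">\n',
    )


# Hypothetical pre-tidy input, with an assumed tab after each "=".
raw = (
    '<PRE\n'
    'CLASS="PROGRAMLISTING"\n'
    '>DISTNAME=\tfoo\n'
    'EXTRACT_SUFX=\t.tgz</PRE\n'
    '>'
)

fixed = move_tag_close(raw)
print(fixed)
```

After the replacement, DISTNAME starts at the beginning of its own line, so there is no stray '>' to count toward the tab stop.  HTML 4.01 says a line break immediately after a start tag is ignored, so the extra newline should not add a blank line to the rendered listing.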