Date: Mon, 23 Jan 2012 12:39:25 -0700 (MST)
From: Warren Block <wblock@wonkity.com>
To: Gabor Kovesdan <gabor@FreeBSD.org>
Cc: freebsd-doc@FreeBSD.org
Subject: Re: Tidy and HTML tab spacing
Message-ID: <alpine.BSF.2.00.1201231145380.90760@wonkity.com>
In-Reply-To: <4F1D93E0.2050709@FreeBSD.org>
References: <alpine.BSF.2.00.1201181255210.39534@wonkity.com>
 <alpine.BSF.2.00.1201181520140.40712@wonkity.com>
 <4F1B4767.5070105@FreeBSD.org>
 <alpine.BSF.2.00.1201211648030.72083@wonkity.com>
 <4F1D93E0.2050709@FreeBSD.org>
This message is in MIME format.  The first part should be readable text,
while the remaining parts are likely unreadable without MIME-aware tools.

---902635197-2098338272-1327347565=:90760
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Mon, 23 Jan 2012, Gabor Kovesdan wrote:

> On 2012.01.22. 1:30, Warren Block wrote:
>> On Sun, 22 Jan 2012, Gabor Kovesdan wrote:
>>
>>> On 2012.01.18. 23:49, Warren Block wrote:
>>>> 5. Don't tidy HTML files at all (suggested as an option by Benedict
>>>> Reuschling).  The unprocessed HTML is ugly, but few people are going
>>>> to look at it directly.  Files that haven't been through tidy are a
>>>> little larger, about 4% in the case of the Porter's Handbook.
>>> I also think tidy should be removed. As hrs wrote, new standards should be
>>> evaluated and probably they are much better. (I think they are.) If there
>>> are some nits, then we should process it with a custom script or
>>> something, instead of this crapware.
>>
>> Tidy does a lot; it would be a lot of work to recreate.
> Tidy is also the reason that our webpages are not valid HTML.

A new version of Tidy is supposed to be out soonish.  Whether it will
solve the problems, I don't know.

What about lxml?  Available in ports (devel/py-lxml), reputed to be good
at parsing problem HTML and creating good XHTML.  A quick test showed
that it seems to do okay with <pre> elements.

A quick script to generate a test is attached.  The W3C validator says
this version of the Porter's Handbook has eight errors, versus the six
errors and five warnings of the Tidy version.  (The ugly special-case in
line 12 drops the lxml version to five errors.)
---902635197-2098338272-1327347565=:90760
Content-Type: TEXT/PLAIN; charset=US-ASCII; name=tester.py
Content-Transfer-Encoding: 7BIT
Content-ID: <alpine.BSF.2.00.1201231239250.90760@wonkity.com>
Content-Description:
Content-Disposition: attachment; filename=tester.py

#!/usr/bin/env python

from lxml import etree
import re

inhtml = open('book.html', 'r').read()

tree = etree.HTML(inhtml.replace('\r', ''))
outxhtml = '\n'.join([ etree.tostring(stree, pretty_print=True, method="xml")
		for stree in tree ])

outxhtml = outxhtml.replace('compact="COMPACT"', 'compact="compact"')

f = open('lxml.html', 'w')
f.write('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n')
f.write('<html xmlns="http://www.w3.org/1999/xhtml">\n')
f.write(outxhtml)
f.write('</html>\n')
f.close()

---902635197-2098338272-1327347565=:90760--