From: Warren Block
To: Gabor Kovesdan
Cc: freebsd-doc@FreeBSD.org
Date: Mon, 23 Jan 2012 12:39:25 -0700 (MST)
Subject: Re: Tidy and HTML tab spacing

On Mon, 23 Jan 2012, Gabor Kovesdan wrote:

> On 2012.01.22. 1:30, Warren Block wrote:
>> On Sun, 22 Jan 2012, Gabor Kovesdan wrote:
>>
>>> On 2012.01.18. 23:49, Warren Block wrote:
>>>> 5. Don't tidy HTML files at all (suggested as an option by Benedict
>>>> Reuschling). The unprocessed HTML is ugly, but few people are going
>>>> to look at it directly. Files that haven't been through tidy are a
>>>> little larger, about 4% in the case of the Porter's Handbook.
>>> I also think tidy should be removed. As hrs wrote, new standards should be
>>> evaluated and probably they are much better. (I think they are.) If there
>>> are some nits, then we should process it with a custom script or
>>> something, instead of this crapware.
>>
>> Tidy does a lot; it would be a lot of work to recreate.
> Tidy is also the reason that our webpages are not valid HTML.

A new version of Tidy is supposed to be out soonish.  Whether it will
solve the problems, I don't know.

What about lxml?  It's available in ports (devel/py-lxml) and is reputed
to be good at parsing problem HTML and creating good XHTML.  A quick
test showed that it seems to do okay with
 elements.

A quick script to generate a test is attached.  The W3C validator
reports eight errors for the lxml-generated version of the Porter's
Handbook, versus six errors and five warnings for the Tidy version.
(The ugly special case in line 12 of the script, which rewrites
compact="COMPACT" as compact="compact", drops the lxml version to five
errors.)
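A possibly less ugly alternative to the string replace in line 12 of the
attached script is to fix the attribute on the parsed tree before
serializing.  This is only a sketch under stated assumptions, not part of
the attached script: it relies on XHTML 1.0 requiring a boolean attribute
to be written as name="name" (compact="compact") and on XML attribute
values being case-sensitive.  The helper name normalize_boolean_attrs is
hypothetical, and the BOOLEAN_ATTRS tuple is an assumed, non-exhaustive
list.  It only uses the ElementTree interface (iter(), attrib, set()),
which lxml elements also provide, so it should be callable on the lxml
tree as well, though that is untested.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper, not part of tester.py.  XHTML 1.0 accepts a boolean
# attribute only as name="name" (e.g. compact="compact"), and XML attribute
# values are case-sensitive, so compact="COMPACT" fails validation.
# ASSUMPTION: this tuple is a non-exhaustive list of such attributes.
BOOLEAN_ATTRS = ("compact", "nowrap", "noshade", "checked", "selected",
                 "disabled", "readonly", "multiple", "ismap", "declare",
                 "defer")


def normalize_boolean_attrs(root):
    """Set every listed boolean attribute under root to its own name.

    Walks root and all descendants; attributes not in BOOLEAN_ATTRS
    are left untouched.  Returns root for convenience.
    """
    for el in root.iter():
        for name in BOOLEAN_ATTRS:
            if name in el.attrib:
                el.set(name, name)
    return root


if __name__ == "__main__":
    doc = ET.fromstring('<dl compact="COMPACT"><dt>term</dt></dl>')
    normalize_boolean_attrs(doc)
    print(ET.tostring(doc, encoding="unicode"))
    # prints <dl compact="compact"><dt>term</dt></dl>
```

In tester.py the call would presumably go right after the etree.HTML()
line, replacing the string replace; whether it changes the validator
counts above has not been checked.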
tester.py (attachment):

```python
#!/usr/bin/env python

from lxml import etree
# Output is kept as bytes so the script runs under Python 2 and Python 3.

inhtml = open('book.html', 'r').read()
# Parse leniently, then serialize each child of <html> (head, body) as XML.
tree = etree.HTML(inhtml.replace('\r', ''))
outxhtml = b'\n'.join([ etree.tostring(stree, pretty_print=True, method="xml")
        for stree in tree ])
# Special case: XHTML only accepts compact="compact" (values are case-sensitive).
outxhtml = outxhtml.replace(b'compact="COMPACT"', b'compact="compact"')

f = open('lxml.html', 'wb')
f.write(b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n')
f.write(b'<html xmlns="http://www.w3.org/1999/xhtml">\n')
f.write(outxhtml)
f.write(b'</html>\n')
f.close()
```