From: Warren Block
To: Hiroki Sato
Cc: freebsd-doc@FreeBSD.org
Date: Fri, 20 Jan 2012 13:05:51 -0700 (MST)
Subject: Re: Tidy and HTML tab spacing
On Thu, 19 Jan 2012, Hiroki Sato wrote:

> It is difficult to solve this issue completely because the result
> text can be obtained only by a complete HTML processor such as www
> browsers.  I don't have a good idea, but I think it is not a bad idea
> to use a tab character (or replacing it to &#9;) in the result text
> by modifying Tidy and leave the processing to www browsers.

The suggestion of &#9; is interesting.  The problem is that tidy changes
tabs to spaces while it is still reading the file, when it really should
treat the tab as a special entity while processing tags.

So preprocessing might still be a way to preserve tabs: convert them all
to &#9; before tidy has a chance to change them.

  perl -i -pe 's/\t/&#9;/g' book.html

sed works also, but beware of the binary tab necessary for sed's
paleolithic regexes (<TAB> below stands for a literal tab character, and
the backslash keeps sed from treating & as "the matched text"):

  sed -i -e 's/<TAB>/\&#9;/g' book.html

Initial testing shows this does seem to work.  The output has genuine
tabs, which display fine in Firefox but ought to be tested in other
browsers.

Note that there are some invisible tabs in the HTML output that come
from the SGML source.  For example, the copyright notice in the Porter's
Handbook:

  The FreeBSD Documentation Project

That leading tab on the second line is in the HTML file.  Still,
replacing those invisible tabs with &#9; instead of spaces should render
the same.

Finally, before picking up on the idea of tab-as-an-entity, I worked up
a patch to www/tidy-devel which uses the magic value of --tab-size 255
to mean "don't replace tabs".  Attached, but I think the &#9; approach is
better.
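For anyone who would rather do the preprocessing step in C than in perl or sed, here is a minimal sketch of the same substitution. The function name tabs_to_entity and its buffer interface are made up for illustration and are not part of tidy; the spelling of the tab character reference as &#9; is also an assumption.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of tidy): copy `in` to `out`, writing the
 * character reference "&#9;" wherever the input has a tab.
 * Returns the number of bytes written, not counting the terminating NUL,
 * or -1 if `out` is too small to hold the result.
 */
long tabs_to_entity(const char *in, char *out, size_t outsize)
{
    static const char ref[] = "&#9;";        /* assumed spelling of the reference */
    const size_t reflen = sizeof(ref) - 1;
    size_t n = 0;

    for (; *in != '\0'; in++) {
        size_t need = (*in == '\t') ? reflen : 1;

        if (n + need + 1 > outsize)           /* +1 keeps room for the NUL */
            return -1;
        if (*in == '\t')
            memcpy(out + n, ref, reflen);
        else
            out[n] = *in;
        n += need;
    }
    if (outsize == 0)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

Run over each line of book.html before tidy sees it, this should have the same effect as the one-liners: tidy never encounters a raw tab, so it has nothing to expand.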
Attachment: patch-src-streamio.c

--- src/streamio.c.orig	2008-03-22 15:00:18.000000000 -0600
+++ src/streamio.c	2012-01-20 12:25:58.000000000 -0700
@@ -351,11 +351,18 @@
             added = yes;
             TY_(AddCharToOriginalText)(in, (tchar)c);
 #endif
-            in->tabs = tabsize > 0 ?
-                tabsize - ((in->curcol - 1) % tabsize) - 1
-                : 0;
-            in->curcol++;
-            c = ' ';
+            if (tabsize == 255) {
+                in->curcol++;
+                c = '\t';
+            }
+            else
+            {
+                in->tabs = tabsize > 0 ?
+                    tabsize - ((in->curcol - 1) % tabsize) - 1
+                    : 0;
+                in->curcol++;
+                c = ' ';
+            }
             break;
         }
 
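To make the arithmetic in the patch easier to follow, here is a standalone sketch of the same decision. It is not tidy code: struct tab_result and expand_tab are hypothetical names, and it assumes that curcol counts columns from 1 and that in->tabs holds the number of spaces still owed after the first one. Unlike the patch, the sketch sets the pending count to 0 explicitly when the tab is kept.

```c
/* Hypothetical stand-in for the two things the patch changes. */
struct tab_result {
    int c;        /* character handed back in place of the tab */
    int pending;  /* extra spaces still owed, playing the role of in->tabs */
};

/*
 * Mirror of the patched branch: with the magic tab size of 255 the tab is
 * passed through untouched; otherwise the tab becomes one space plus enough
 * pending spaces to reach the next tab stop.
 */
struct tab_result expand_tab(int curcol, int tabsize)
{
    struct tab_result r;

    if (tabsize == 255) {
        r.c = '\t';
        r.pending = 0;   /* assumption: nothing is owed when the tab is kept */
    } else {
        r.c = ' ';
        r.pending = tabsize > 0
            ? tabsize - ((curcol - 1) % tabsize) - 1
            : 0;
    }
    return r;
}
```

With a tab size of 8, a tab in column 1 yields one space and 7 pending, and a tab in column 5 yields one space and 3 pending; under the 1-based assumption both land the next character in column 9.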