Date:      Fri, 20 Jan 2012 13:05:51 -0700 (MST)
From:      Warren Block <wblock@wonkity.com>
To:        Hiroki Sato <hrs@FreeBSD.org>
Cc:        freebsd-doc@FreeBSD.org
Subject:   Re: Tidy and HTML tab spacing
Message-ID:  <alpine.BSF.2.00.1201201231090.61386@wonkity.com>
In-Reply-To: <20120119.155736.1127622096127250170.hrs@allbsd.org>
References:  <alpine.BSF.2.00.1201181520140.40712@wonkity.com> <20120119.084434.926306642968660094.hrs@allbsd.org> <alpine.BSF.2.00.1201181748230.42380@wonkity.com> <20120119.155736.1127622096127250170.hrs@allbsd.org>

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

---902635197-1133617426-1327089951=:61386
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII

On Thu, 19 Jan 2012, Hiroki Sato wrote:

> It is difficult to solve this issue completely because the result
> text can be obtained only by a complete HTML processor such as www
> browsers.  I don't have a good idea, but I think it is not a bad idea
> to use a tab character (or replacing it to &#09;) in the result text
> by modifying Tidy and leave the processing to www browsers.

The suggestion of &#09; is interesting.  The problem is that tidy is 
changing tabs to spaces while still reading the file, when it really 
should be treating the tab as a special entity while processing tags.

So preprocessing still might be a way to preserve tabs.  Just convert 
them all to &#09; before tidy has a chance to change them.

   perl -i -pe 's/\t/&#09;/g' book.html

sed works also, but beware of the literal tab character (typed between 
the first two slashes) that sed's paleolithic regexes require:

   sed -i -e 's/	/\&#09;/g' book.html
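
For anyone who would rather not paste a literal tab into a command line, the same substitution can be sketched as a small shell filter.  This is only an illustration: the function name tabs_to_entity is made up here, and it writes to standard output instead of editing in place, because the -i option takes its argument differently in BSD and GNU sed.

```shell
# Hypothetical helper (not part of the original message): replace every
# tab on standard input with the character reference &#09;.
tabs_to_entity() {
    # Generate the tab with printf so no literal tab has to be typed.
    tab=$(printf '\t')
    # The backslash before & keeps sed from treating it as "the matched text".
    sed -e "s/${tab}/\\&#09;/g"
}
```

It could be used as "tabs_to_entity < book.html > book-entities.html", with the second file then handed to tidy; both file names are placeholders.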

Initial testing shows that converting tabs to &#09; does seem to work.  The output has genuine
tabs which display fine in Firefox but ought to be tested in other 
browsers.  Note that there are some invisible tabs in the HTML output 
that come from the SGML source.  For example, the copyright notice in 
the Porter's Handbook:

       <holder role="mailto:doc@FreeBSD.org">The FreeBSD Documentation
 	Project</holder>

That leading tab on the second line is in the HTML file.  Still, 
replacing those invisible tabs with &#09; instead of spaces should 
render the same.
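
To see where such invisible tabs are before converting anything, a filter along these lines could list them.  It is a rough sketch, and the name tab_lines is invented for this example.

```shell
# Hypothetical helper: print, with line numbers, every input line that
# contains at least one tab character.
tab_lines() {
    tab=$(printf '\t')
    grep -n "$tab"
}
```

Running "tab_lines < book.html" would show lines such as the second line of the holder element above, prefixed with their line numbers.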


Finally, before picking up on the idea of tab-as-an-entity, I worked up 
a patch to www/tidy-devel which uses the magic value of --tab-size 255 
to mean "don't replace tabs".  It is attached, but I think the &#09; 
approach is better.
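
For readers who do not want to decode the attachment, the arithmetic it touches can be sketched in shell.  Decoded, the patch wraps tidy's existing expression "tabsize - ((in->curcol - 1) % tabsize) - 1" in a test for a tab size of 255.  In the unpatched code the tab itself becomes one space, and the expression appears to give the number of further spaces needed to reach the next tab stop.  The function name pending_spaces and the "keep-tab" output are made up for this illustration and are not tidy's API.

```shell
# Hypothetical illustration of the patched logic.
# Usage: pending_spaces CURCOL TABSIZE   (CURCOL is 1-based)
pending_spaces() {
    curcol=$1
    tabsize=$2
    if [ "$tabsize" -eq 255 ]; then
        # Magic value from the patch: pass the tab through unchanged.
        echo keep-tab
    elif [ "$tabsize" -gt 0 ]; then
        # Extra spaces to emit after the tab has been turned into one space.
        echo $(( tabsize - ((curcol - 1) % tabsize) - 1 ))
    else
        echo 0
    fi
}
```

With a tab size of 8, "pending_spaces 1 8" prints 7 and "pending_spaces 3 8" prints 5, so in both cases the text after the tab resumes at column 9.
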
---902635197-1133617426-1327089951=:61386
Content-Type: TEXT/PLAIN; charset=US-ASCII; name=patch-src-streamio.c
Content-Transfer-Encoding: BASE64
Content-ID: <alpine.BSF.2.00.1201201305510.61386@wonkity.com>
Content-Description: 
Content-Disposition: attachment; filename=patch-src-streamio.c

LS0tIHNyYy9zdHJlYW1pby5jLm9yaWcJMjAwOC0wMy0yMiAxNTowMDoxOC4w
MDAwMDAwMDAgLTA2MDANCisrKyBzcmMvc3RyZWFtaW8uYwkyMDEyLTAxLTIw
IDEyOjI1OjU4LjAwMDAwMDAwMCAtMDcwMA0KQEAgLTM1MSwxMSArMzUxLDE4
IEBADQogICAgICAgICAgICAgYWRkZWQgPSB5ZXM7DQogICAgICAgICAgICAg
VFlfKEFkZENoYXJUb09yaWdpbmFsVGV4dCkoaW4sICh0Y2hhciljKTsNCiAj
ZW5kaWYNCi0gICAgICAgICAgICBpbi0+dGFicyA9IHRhYnNpemUgPiAwID8N
Ci0gICAgICAgICAgICAgICAgdGFic2l6ZSAtICgoaW4tPmN1cmNvbCAtIDEp
ICUgdGFic2l6ZSkgLSAxDQotICAgICAgICAgICAgICAgIDogMDsNCi0gICAg
ICAgICAgICBpbi0+Y3VyY29sKys7DQotICAgICAgICAgICAgYyA9ICcgJzsN
CisgICAgICAgICAgICBpZiAodGFic2l6ZSA9PSAyNTUpIHsNCisgICAgICAg
ICAgICAgICAgaW4tPmN1cmNvbCsrOw0KKyAgICAgICAgICAgICAgICBjID0g
J1x0JzsNCisgICAgICAgICAgICB9DQorICAgICAgICAgICAgZWxzZQ0KKyAg
ICAgICAgICAgIHsNCisgICAgICAgICAgICAgICAgaW4tPnRhYnMgPSB0YWJz
aXplID4gMCA/DQorICAgICAgICAgICAgICAgICAgICB0YWJzaXplIC0gKChp
bi0+Y3VyY29sIC0gMSkgJSB0YWJzaXplKSAtIDENCisgICAgICAgICAgICAg
ICAgICAgIDogMDsNCisgICAgICAgICAgICAgICAgaW4tPmN1cmNvbCsrOw0K
KyAgICAgICAgICAgICAgICBjID0gJyAnOw0KKyAgICAgICAgICAgIH0NCiAg
ICAgICAgICAgICBicmVhazsNCiAgICAgICAgIH0NCiANCg==

---902635197-1133617426-1327089951=:61386--


