Date: Fri, 20 Jan 2012 13:05:51 -0700 (MST)
From: Warren Block <wblock@wonkity.com>
To: Hiroki Sato <hrs@FreeBSD.org>
Cc: freebsd-doc@FreeBSD.org
Subject: Re: Tidy and HTML tab spacing
Message-ID: <alpine.BSF.2.00.1201201231090.61386@wonkity.com>
In-Reply-To: <20120119.155736.1127622096127250170.hrs@allbsd.org>
References: <alpine.BSF.2.00.1201181520140.40712@wonkity.com> <20120119.084434.926306642968660094.hrs@allbsd.org> <alpine.BSF.2.00.1201181748230.42380@wonkity.com> <20120119.155736.1127622096127250170.hrs@allbsd.org>
[-- Attachment #1 --]
On Thu, 19 Jan 2012, Hiroki Sato wrote:
> It is difficult to solve this issue completely because the result
> text can be obtained only by a complete HTML processor such as www
> browsers. I don't have a good idea, but I think it is not a bad idea
> to use a tab character (or replacing it to &#9;) in the result text
> by modifying Tidy and leave the processing to www browsers.
The suggestion of &#9; is interesting. The problem is that tidy is
changing tabs to spaces while still reading the file, when it really
should be treating the tab as a special entity while processing tags.
So preprocessing still might be a way to preserve tabs. Just convert
them all to &#9; before tidy has a chance to change them.
perl -i -pe 's/\t/&#9;/g' book.html
sed works also, but beware of the binary tab necessary for sed's
paleolithic regexes:
sed -i -e 's/	/\&#9;/g' book.html
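One way to avoid typing a literal tab at all is to let printf generate
it. This is only a minimal sketch, assuming a POSIX shell; the function
name tabs_to_entity is made up for illustration, and it filters
standard input rather than editing the file in place:

```shell
# Hypothetical helper: replace every tab on stdin with the &#9; entity.
tabs_to_entity() {
    # printf produces the literal tab that sed's basic regexes need.
    tab=$(printf '\t')
    # \& keeps sed from expanding & to "the matched text".
    sed -e "s/${tab}/\\&#9;/g"
}

# Small demonstration on a line containing one tab.
printf 'a\tb\n' | tabs_to_entity   # prints a&#9;b
```

It could then be run as "tabs_to_entity < book.html > book-tabs.html",
where book-tabs.html is just an example output name.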
Initial testing shows this does seem to work. The output has genuine
tabs which display fine in Firefox but ought to be tested in other
browsers. Note that there are some invisible tabs in the HTML output
that come from the SGML source. For example, the copyright notice in
the Porter's Handbook:
<holder role="mailto:doc@FreeBSD.org">The FreeBSD Documentation
	Project</holder>
That leading tab on the second line is in the HTML file. Still,
replacing those invisible tabs with &#9; instead of spaces should
render the same.
Finally, before picking up on the idea of tab-as-an-entity, I worked up
a patch to www/tidy-devel which uses the magic value of --tab-size 255
to mean "don't replace tabs". The patch is attached, but I think the
&#9; approach is better.
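
For context, the arithmetic that the patch bypasses can be sketched in
shell. When tidy reads a tab it emits one space and queues enough extra
spaces to reach the next tab stop, using the expression visible in the
patch. The function name spaces_for_tab and its arguments are invented
for this illustration and are not part of tidy, and the column is
assumed to be 1-based, as the expression suggests:

```shell
# Hypothetical helper: total spaces that replace a tab seen at 1-based
# column $1 with tab size $2, mirroring the expression in streamio.c.
spaces_for_tab() {
    if [ "$2" -gt 0 ]; then
        queued=$(( $2 - (($1 - 1) % $2) - 1 ))
    else
        queued=0
    fi
    # One space is emitted immediately (c = ' '); the rest are queued.
    echo $(( queued + 1 ))
}

spaces_for_tab 1 8   # prints 8
spaces_for_tab 6 8   # prints 3
```

With the patch applied and --tab-size 255, none of this runs and the
tab character is passed through unchanged.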
[-- Attachment #2 --]
--- src/streamio.c.orig 2008-03-22 15:00:18.000000000 -0600
+++ src/streamio.c 2012-01-20 12:25:58.000000000 -0700
@@ -351,11 +351,18 @@
             added = yes;
             TY_(AddCharToOriginalText)(in, (tchar)c);
 #endif
-            in->tabs = tabsize > 0 ?
-                tabsize - ((in->curcol - 1) % tabsize) - 1
-                : 0;
-            in->curcol++;
-            c = ' ';
+            if (tabsize == 255) {
+                in->curcol++;
+                c = '\t';
+            }
+            else
+            {
+                in->tabs = tabsize > 0 ?
+                    tabsize - ((in->curcol - 1) % tabsize) - 1
+                    : 0;
+                in->curcol++;
+                c = ' ';
+            }
             break;
         }
