Date:      Thu, 22 Sep 2005 21:30:17 -0400
From:      Mikhail Teterin <mi+mx@aldan.algebra.com>
To:        Charles Swiger <cswiger@mac.com>
Cc:        current@freebsd.org
Subject:   Re: using bzip2 to compress man-pages
Message-ID:  <200509222130.18284.mi%2Bmx@aldan.algebra.com>
In-Reply-To: <184C5FE7-B956-43E8-AC60-68EA6D5337BB@mac.com>
References:  <200509220446.j8M4kBPA019823@blue.virtual-estates.net> <200509221652.54123.mi%2Bmx@aldan.algebra.com> <184C5FE7-B956-43E8-AC60-68EA6D5337BB@mac.com>

--Boundary-00=_qq1MDwHuJrym2uF
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

> http://www.pkix.net/~chuck/manpage_test/

Interesting. I did not realize that bzip2 is inferior to gzip on small files.
It still wins overall, however -- the wins on large man-pages compensate for
the losses on the small ones. Your script does not show the total number of
sectors in each case (patch attached).

Using your

	http://www.pkix.net/~chuck/manpage_test/manpage.txt

and the following command:

	awk 'NF == 2 {
		p=$2; gz=$1; gzsectors=int((gz-1)/512)+1;
		tgzsectors+=gzsectors
	} NF == 1 {
		bz=$1;	bzsectors=int((bz-1)/512)+1;
		tbzsectors+=bzsectors;
		print p ":\t" gz " " gzsectors " " tgzsectors \
		    " " bz " " bzsectors " " tbzsectors
	} END {
		print tgzsectors " of .gz can be turned into " \
		    tbzsectors " of .bz2"}' \
	    /tmp/manpage.txt

I get:

	14919 of .gz can be turned into 14738 of .bz2
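
For readers who would rather follow the rounding in Python, here is a minimal
sketch of the same tally. The input layout is an assumption inferred from the
awk command above (a two-field line with the .gz size and the page name,
followed by a one-field line with the .bz2 size); the names sectors,
total_sectors and the sample sizes are made up for illustration.

```python
def sectors(size_bytes, sector_size=512):
    # Round up to whole sectors. For any non-empty file this gives the
    # same result as the awk expression int((size-1)/512)+1.
    return (size_bytes + sector_size - 1) // sector_size


def total_sectors(lines, sector_size=512):
    """Return (gz_total, bz_total) in sectors.

    Assumed layout, inferred from the awk command: a two-field line
    "<gz bytes> <page name>" followed by a one-field line "<bz bytes>".
    """
    gz_total = bz_total = 0
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            gz_total += sectors(int(fields[0]), sector_size)
        elif len(fields) == 1:
            bz_total += sectors(int(fields[0]), sector_size)
    return gz_total, bz_total


# Hypothetical sizes, not taken from manpage.txt.
sample = ["300 ls.1", "350", "5000 tcsh.1", "4000"]
print(total_sectors(sample))
```

On the small page both formats occupy one sector even though the .bz2 file is
larger; the difference only shows up on the large page, which is the effect
described above.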

That's 181 512-byte sectors, or 92672 bytes. Not very much, but this covers
only /usr/share/man. Counting /usr/share/cat (with its larger _formatted_
files) plus the ports' man-pages, I still think bzip2 is beneficial.

Assuming 1024-byte sectors, I get 8170 for .gz vs. 8067 for .bz2, a
difference of 103 sectors or 105472 bytes.
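
The two byte figures follow directly from the sector totals quoted in this
message; a short check, with bytes_saved being a name chosen here for
illustration:

```python
def bytes_saved(gz_sectors, bz_sectors, sector_size):
    # Space saved is the sector difference times the sector size.
    return (gz_sectors - bz_sectors) * sector_size


print(bytes_saved(14919, 14738, 512))   # 181 sectors -> 92672 bytes
print(bytes_saved(8170, 8067, 1024))    # 103 sectors -> 105472 bytes
```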

Reducing reliance on GNU software remains an extra bonus...

Finally, the PR contains independent patches for both man(1) and the man-page
compressing infrastructure. After a 5-month wait, I'll settle for partial
acceptance.

Yours,

	-mi

--Boundary-00=_qq1MDwHuJrym2uF
Content-Type: text/x-diff;
  charset="iso-8859-1";
  name="mpsizer.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="mpsizer.diff"

+++ mpsizer.py	Thu Sep 22 21:27:48 2005
@@ -5,4 +5,6 @@
 gz_bigger_bz = 0
 bz_bigger_gz = 0
+tgz = 0
+tbz = 0
 
 try: 
@@ -13,6 +15,8 @@
     bzsize = int(fd.readline().split()[0])
 
-    gzblocks = (gzsize / 512) + int(gzsize % 512 > 0)
-    bzblocks = (bzsize / 512) + int(bzsize % 512 > 0)
+    gzblocks = int(gzsize / 512) + int(gzsize % 512 > 0)
+    bzblocks = int(bzsize / 512) + int(bzsize % 512 > 0)
+    tgz += gzblocks
+    tbz +=  bzblocks
     
     print manpage_name, gzblocks, bzblocks
@@ -29,2 +33,3 @@
   print "same # blocks: %d, gzip > bzip: %d, bzip > gzip %d" % \
         (same_size, gz_bigger_bz, bz_bigger_gz)
+  print "%d gz-sectors vs. %d bz-sectors" % (tgz, tbz)

--Boundary-00=_qq1MDwHuJrym2uF--


