Date:      Thu, 22 Sep 2005 18:39:50 -0400
From:      Charles Swiger <cswiger@mac.com>
To:        Mikhail Teterin <mi+mx@aldan.algebra.com>
Cc:        current@freebsd.org
Subject:   Re: using bzip2 to compress man-pages
Message-ID:  <184C5FE7-B956-43E8-AC60-68EA6D5337BB@mac.com>
In-Reply-To: <200509221652.54123.mi%2Bmx@aldan.algebra.com>
References:  <200509220446.j8M4kBPA019823@blue.virtual-estates.net> <20050922182104.GC990@galgenberg.net> <CF4FBAB7-791D-41E0-B59B-9D78C6E4381F@mac.com> <200509221652.54123.mi%2Bmx@aldan.algebra.com>

On Sep 22, 2005, at 4:52 PM, Mikhail Teterin wrote:
> Charles Swiger wrote:
>> My guess is that roughly 95% of the manpages aren't going to save a
>> disk sector by switching.
>
> One does not need to save the entire sector-size. Only the (size %
> sector_size), which currently pushes the file into an additional sector.

Agreed, this is exactly right.
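
To make that arithmetic concrete, here is a minimal sketch, assuming 512-byte sectors. The function names are made up for illustration and are not from any script mentioned in this thread:

```python
def sectors(size, sector_size=512):
    """Number of whole sectors needed to store `size` bytes (rounds up)."""
    return (size + sector_size - 1) // sector_size


def saves_a_sector(old_size, new_size, sector_size=512):
    """True if shrinking a file from old_size to new_size frees a sector."""
    return sectors(new_size, sector_size) < sectors(old_size, sector_size)
```

For a 1231-byte file, 1231 % 512 is 207, so shaving off 207 bytes (down to 1024) frees a sector, while shaving off 206 bytes (down to 1025) does not. Note that a file whose size is already an exact multiple of the sector size has to lose a full sector's worth of bytes before it drops a sector.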

> The following command line assumes a sector size of 512 bytes and a
> bzip2 vs. gzip saving of only 10%.

Unfortunately, bzip2 sometimes compresses less well than gzip,
especially for very small files.  Consider the output from the
command I posted: in each pair of lines, the first gives the size in
bytes under gzip followed by the filename, and the second gives the
size in bytes under "bzip2 --best":

     1231 /usr/share/man/man1/addftinfo.1.gz
     1272
     1963 /usr/share/man/man1/apply.1.gz
     2010
      667 /usr/share/man/man1/apropos.1.gz
      709
[ ... ]

Notice that for files smaller than about 3K, the gzip output is almost
always *smaller* than the bzip2 output.

From about 3K to about 6K, the two seem to be about even, and bzip2
starts becoming a significant win for files larger than about 10K.

> Notice, it takes care to look once at every manual page even if it
> has more than one alias (eliminating pages with the same inode). Try
> this on your system:
>
> % find /usr/share/man/ -name \*.gz -ls | sort -k 1 | awk '$1 == inode { next }
>     { inode=$1; total++; if ($7 % 512 < $7*0.10) savings++ }
>     END { print savings " out of " total }'
> 1200 out of 2694
>
> 1200 files out of 2694... That's a little more than 5%...

Yes, well, you aren't computing a real result.

Assuming that bzip2 always produces smaller files than gzip is not
valid for the average manpage (median size of ~3K compressed).  I
wrote a quick bit of Python to compute and tally up the actual block
sizes, giving the following results:

same # blocks: 2288, gzip > bzip: 217, bzip > gzip 182

http://www.pkix.net/~chuck/manpage_test/
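
For anyone who wants to repeat the comparison, a rough sketch of this kind of tally might look like the following. It is not the script at the URL above. The function names, the 512-byte block size, and the use of Python's gzip and bz2 modules at their highest compression level are all assumptions made for illustration, and sizes from these modules can differ by a few bytes from what the command-line tools write (for instance, the gzip header may or may not carry a filename).

```python
import bz2
import gzip
import os

BLOCK = 512  # assumed sector size, as elsewhere in this thread


def blocks(nbytes, block=BLOCK):
    """Number of whole blocks needed to store nbytes (rounds up)."""
    return (nbytes + block - 1) // block


def compare(raw):
    """Return (gzip_blocks, bzip2_blocks) for uncompressed page contents."""
    gz_size = len(gzip.compress(raw, compresslevel=9))
    bz_size = len(bz2.compress(raw, compresslevel=9))
    return blocks(gz_size), blocks(bz_size)


def tally(paths):
    """Count pages by which compressor needs more blocks.

    `paths` is an iterable of existing .gz manpage files.  Hard-linked
    aliases are counted once, by (device, inode).
    """
    counts = {"same": 0, "gzip_more": 0, "bzip2_more": 0}
    seen = set()
    for path in paths:
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # another name for a page already counted
        seen.add(key)
        with gzip.open(path, "rb") as f:
            raw = f.read()
        gz_blocks, bz_blocks = compare(raw)
        if gz_blocks == bz_blocks:
            counts["same"] += 1
        elif gz_blocks > bz_blocks:
            counts["gzip_more"] += 1
        else:
            counts["bzip2_more"] += 1
    return counts
```

Something like tally(glob.glob("/usr/share/man/man*/*.gz")) would run it over a typical manpage tree, with glob imported separately.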

-- 
-Chuck



