From owner-freebsd-current@FreeBSD.ORG Thu Sep 22 22:39:58 2005
From: Charles Swiger
Date: Thu, 22 Sep 2005 18:39:50 -0400
To: Mikhail Teterin
Cc: current@freebsd.org
Subject: Re: using bzip2 to compress man-pages
Message-Id: <184C5FE7-B956-43E8-AC60-68EA6D5337BB@mac.com>
In-Reply-To: <200509221652.54123.mi+mx@aldan.algebra.com>
References: <200509220446.j8M4kBPA019823@blue.virtual-estates.net>
 <20050922182104.GC990@galgenberg.net>
 <200509221652.54123.mi+mx@aldan.algebra.com>

On Sep 22, 2005, at 4:52 PM, Mikhail Teterin wrote:
> Charles Swiger wrote:
>> My guess is that roughly 95% of the manpages aren't going to save a
>> disk sector by switching.
>
> One does not need to save the entire sector-size. Only the (size %
> sector_size), which currently pushes the file into an additional
> sector.

Agreed, this is exactly right.

> The following command line assumes, the sector size of 512 bytes
> and the bzip2 vs. gzip saving of only 10%.

Unfortunately, bzip2 sometimes compresses less well than gzip,
especially for very small files. Consider the output from the command
I posted; the first number is the byte-size using gzip followed by the
filename, the second line is the byte-size using "bzip2 --best":

1231 /usr/share/man/man1/addftinfo.1.gz
1272
1963 /usr/share/man/man1/apply.1.gz
2010
667 /usr/share/man/man1/apropos.1.gz
709
[ ... ]

Notice that for files smaller than about 3K, the gzip output is almost
always *smaller* than the bzip2 output. From about 3K to about 6K, the
two seem to be about even, and bzip2 starts becoming a significant win
for files larger than about 10K.

> Notice, it takes care to look once at every
> manual page even if it is has more than one alias (eliminating
> pages with the same inode). Try this on your system:
>
> % find /usr/share/man/ -name \*.gz -ls | sort -k 1 | awk '$1 ==
> inode { next }
> { inode=$1; total++; if ($7 % 512 < $7*0.10) savings++ } END {print
> savings " out of " total}'
> 1200 out of 2694
>
> 1200 files out 2694... That's a little more than 5%...

Yes, well, you aren't computing a real result. Assuming that bzip2
always produces smaller files than gzip for the average manpage
(median size of ~3K compressed) is not valid.

I wrote a quick bit of Python to compute and tally up the actual
number of blocks each file occupies, giving the following results:

same # blocks: 2288, gzip > bzip: 217, bzip > gzip 182

http://www.pkix.net/~chuck/manpage_test/

-- 
-Chuck
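
For reference, the block comparison described above can be sketched in
a few lines of Python. This is not the script at the URL above; it is
only a minimal illustration. It assumes a 512-byte sector (the figure
used in the quoted awk command) and reuses the three gzip/bzip2 byte
sizes listed earlier in this message. The names sectors(), tally() and
SAMPLE are made up for this sketch.

```python
from collections import Counter

SECTOR_SIZE = 512  # assumption: same sector size as the quoted awk command


def sectors(size, sector_size=SECTOR_SIZE):
    """Return the number of whole sectors needed to hold `size` bytes."""
    return (size + sector_size - 1) // sector_size


def tally(pairs, sector_size=SECTOR_SIZE):
    """Tally whether gzip or bzip2 needs more sectors for each file."""
    counts = Counter()
    for gz_size, bz_size in pairs:
        gz_blocks = sectors(gz_size, sector_size)
        bz_blocks = sectors(bz_size, sector_size)
        if gz_blocks == bz_blocks:
            counts["same"] += 1
        elif gz_blocks > bz_blocks:
            counts["gzip needs more"] += 1
        else:
            counts["bzip2 needs more"] += 1
    return counts


# (gzip bytes, bzip2 --best bytes) for the three man pages listed above
SAMPLE = [(1231, 1272), (1963, 2010), (667, 709)]

if __name__ == "__main__":
    print(dict(tally(SAMPLE)))
```

For these three pages both compressors end up needing the same number
of sectors (3, 4 and 2 respectively), even though the bzip2 files are
slightly larger in bytes.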