From owner-freebsd-current@FreeBSD.ORG Fri Sep 23 01:31:10 2005
From: Mikhail Teterin
Organization: Virtual Estates, Inc.
To: Charles Swiger
Date: Thu, 22 Sep 2005 21:30:17 -0400
User-Agent: KMail/1.8.2
References: <200509220446.j8M4kBPA019823@blue.virtual-estates.net>
	<200509221652.54123.mi+mx@aldan.algebra.com>
	<184C5FE7-B956-43E8-AC60-68EA6D5337BB@mac.com>
In-Reply-To: <184C5FE7-B956-43E8-AC60-68EA6D5337BB@mac.com>
MIME-Version: 1.0
Content-Type: Multipart/Mixed;
	boundary="Boundary-00=_qq1MDwHuJrym2uF"
Message-Id: <200509222130.18284.mi+mx@aldan.algebra.com>
Cc: current@freebsd.org
Subject: Re: using bzip2 to compress man-pages
List-Id: Discussions about the use of FreeBSD-current

--Boundary-00=_qq1MDwHuJrym2uF
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

> http://www.pkix.net/~chuck/manpage_test/

Interesting. I did not realize that bzip2 is inferior to gzip on small
files. It still wins overall, however -- the wins on large man-pages
compensate for the losses on the small ones.

Your script does not show the total number of sectors in each case (patch
attached). Using your http://www.pkix.net/~chuck/manpage_test/manpage.txt
and the following command:

	awk 'NF == 2 { p=$2; gz=$1; gzsectors=int((gz-1)/512)+1;
		tgzsectors+=gzsectors }
	NF == 1 { bz=$1; bzsectors=int((bz-1)/512)+1;
		tbzsectors+=bzsectors;
		print p ":\t" gz " " gzsectors " " tgzsectors \
		    " " bz " " bzsectors " " tbzsectors }
	END { print tgzsectors " of .gz can be turned into " \
		tbzsectors " of .bz2"}' /tmp/manpage.txt

I get:

	14919 of .gz can be turned into 14738 of .bz2

That's 181 512-byte sectors, or 92672 bytes.
Not very much, but this is just /usr/share/man. Considering /usr/share/cat
(with larger _formatted_ files), plus the ports' man-pages, I still think
bzip2 is beneficial. Assuming 1024-byte sectors, I get 8170 for .gz vs.
8067 for .bz2, a difference of 103 sectors, or 105472 bytes.

Reducing reliance on GNU software remains an extra bonus...

Finally, the PR contains independent patches for both man(1) and the
man-page compressing infrastructure. After a five-month wait, I'll settle
for partial acceptance.

Yours,

	-mi

--Boundary-00=_qq1MDwHuJrym2uF
Content-Type: text/x-diff; charset="iso-8859-1"; name="mpsizer.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="mpsizer.diff"

+++ mpsizer.py	Thu Sep 22 21:27:48 2005
@@ -5,4 +5,6 @@
 gz_bigger_bz = 0
 bz_bigger_gz = 0
+tgz = 0
+tbz = 0
 
 try:
@@ -13,6 +15,8 @@
         bzsize = int(fd.readline().split()[0])
 
-        gzblocks = (gzsize / 512) + int(gzsize % 512 > 0)
-        bzblocks = (bzsize / 512) + int(bzsize % 512 > 0)
+        gzblocks = int(gzsize / 512) + int(gzsize % 512 > 0)
+        bzblocks = int(bzsize / 512) + int(bzsize % 512 > 0)
+        tgz += gzblocks
+        tbz += bzblocks
 
         print manpage_name, gzblocks, bzblocks
@@ -29,2 +33,3 @@
     print "same # blocks: %d, gzip > bzip: %d, bzip > gzip %d" % \
           (same_size, gz_bigger_bz, bz_bigger_gz)
+    print "%d gz-sectors vs. %d bz-sectors" % (tgz, tbz)

--Boundary-00=_qq1MDwHuJrym2uF--
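
For readers who want to redo the sector arithmetic for other sector sizes, here is a minimal sketch in Python 3. It is not part of the attached patch. The function names (sectors, tally, saved_bytes) and the sample sizes are made up for illustration; only the four sector totals passed to saved_bytes come from the message. The rounding rule is the same one the awk command and mpsizer.py use: a file occupies the smallest whole number of sectors that holds it.

```python
def sectors(size_bytes, sector_size=512):
    """Whole sectors needed to hold size_bytes (ceiling division)."""
    return -(-size_bytes // sector_size)


def tally(pairs, sector_size=512):
    """Sum sectors over (gz_bytes, bz_bytes) pairs.

    Returns (total_gz_sectors, total_bz_sectors).
    """
    tgz = sum(sectors(gz, sector_size) for gz, _ in pairs)
    tbz = sum(sectors(bz, sector_size) for _, bz in pairs)
    return tgz, tbz


def saved_bytes(tgz, tbz, sector_size):
    """Disk space saved by switching from .gz to .bz2, in bytes."""
    return (tgz - tbz) * sector_size


# Hypothetical file sizes in bytes; NOT taken from manpage.txt.
sample = [(300, 340), (700, 760), (9000, 7600)]
for size in (512, 1024):
    tgz, tbz = tally(sample, size)
    print(size, tgz, tbz, saved_bytes(tgz, tbz, size))

# The totals quoted in the message, re-checked:
print(saved_bytes(14919, 14738, 512))   # 512-byte sectors
print(saved_bytes(8170, 8067, 1024))    # 1024-byte sectors
```

Note that under Python 3 the `/` operator returns a float, so a port of mpsizer.py would need `//` (or the int() wrapper the patch adds) to keep the block counts integral.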