Date: Thu, 4 Jan 2007 04:46:43 +1100 (EST)
From: Ian Smith <smithi@nimnet.asn.au>
To: freebsd-questions@freebsd.org
Cc: James Long, Kurt Buff
In-Reply-To: <20070103120041.C104816A569@hub.freebsd.org>
Subject: Re: Batch file question - average size of file in directory

> Message: 17
> Date: Tue, 2 Jan 2007 19:50:01 -0800
> From: James Long
>
> > Message: 28
> > Date: Tue, 2 Jan 2007 10:20:08 -0800
> > From: "Kurt Buff"
>
> > I don't even have a clue how to start this one, so am looking for a
> > little help.
> >
> > I've got a directory with a large number of gzipped files in it (over
> > 110k) along with a few thousand uncompressed files.

If it were me I'd mv those into a bunch of subdirectories; things get
really slow with more than 500 or so files per directory .. anyway ..

> > I'd like to find the average uncompressed size of the gzipped files,
> > and ignore the uncompressed files.
> >
> > How on earth would I go about doing that with the default shell (no
> > bash or other shells installed), or in perl, or something like that.
> > I'm no scripter of any great expertise, and am just stumbling over
> > this trying to find an approach.
> >
> > Many thanks for any help,
> >
> > Kurt

Hi, Kurt.  And hi, James,

> Can I make some assumptions that simplify things?  No kinky filenames,
> just [a-zA-Z0-9.].  My approach specifically doesn't like colons or
> spaces, I bet.  Also, you say gzipped, so I'm assuming it's ONLY gzip,
> no bzip2, etc.
>
> Here's a first draft that might give you some ideas.  It will output:
>
> foo.gz : 3456
> bar.gz : 1048576
> (etc.)
>
> find . -type f | while read fname; do
>   file $fname | grep -q "compressed" && echo "$fname : $(zcat $fname | wc -c)"
> done

 % file cat7/tuning.7.gz
 cat7/tuning.7.gz: gzip compressed data, from Unix

Good check, though grep "gzip compressed" excludes bzip2 etc.  But you
REALLY don't want to zcat 110 thousand files just to wc 'em, unless
it's a benchmark :) .. may I suggest a slight speedup, template:

 % gunzip -l cat7/tuning.7.gz
   compressed  uncompr.  ratio uncompressed_name
        13642     38421  64.5% cat7/tuning.7
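For a quick one-shot answer, those gunzip -l numbers can also be summed
straight in awk, with no shell loop at all.  A rough sketch only: it
assumes the gzipped files can all be picked out by a .gz suffix
(otherwise stick with the file(1) test below), that find and xargs grok
-print0 / -0 as FreeBSD's do, and the gunzip -l column layout shown in
the template above:

 #!/bin/sh
 # Average the "uncompr." column of gunzip -l across all .gz files.
 # The awk pattern keeps only per-file lines: the header's second
 # field isn't numeric, and the "(totals)" line that gunzip -l adds
 # when handed more than one file is skipped by name.
 find . -type f -name '*.gz' -print0 | xargs -0 gunzip -l |
 awk '$2 ~ /^[0-9]+$/ && $4 != "(totals)" { sum += $2; n++ }
      END { if (n) printf "%d files, average %.1f bytes\n", n, sum / n }'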
> If you really need a script that will do the math for you, then
> pipe the output of this into bc:
>
> #!/bin/sh
>
> find . -type f | {
>
> n=0
> echo scale=2
> echo -n "("
> while read fname; do
- >   if file $fname | grep -q "compressed"
+     if file $fname | grep -q "gzip compressed"
>   then
- >     echo -n "$(zcat $fname | wc -c)+"
+       echo -n "$(gunzip -l $fname | grep -v comp | awk '{print $2}')+"
>     n=$(($n+1))
>   fi
> done
> echo "0) / $n"
>
> }
>
> That should give you the average decompressed size of the gzip'ped
> files in the current directory.

HTH, Ian
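P.S.  For anyone who'd rather cut and paste, here's James's script with
both substitutions folded in and the bc pipe attached.  Still only a
sketch along the above lines: the file(1) test keeps non-gzip files
out, but gunzip -l itself will still ignore (with a moan on stderr)
any gzipped file whose name lacks a suffix it recognises, and bc will
complain about division by zero if no gzipped files are found at all.

 #!/bin/sh
 # Average uncompressed size of the gzipped files under the current
 # directory, using gunzip -l rather than zcat | wc -c for speed.
 find . -type f | {
   n=0
   echo scale=2
   echo -n "("
   while read fname; do
     if file $fname | grep -q "gzip compressed"
     then
       # grep -v comp drops the gunzip -l header line; the second
       # field of the remaining line is the uncompressed size
       echo -n "$(gunzip -l $fname | grep -v comp | awk '{print $2}')+"
       n=$(($n+1))
     fi
   done
   echo "0) / $n"
 } | bc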