From owner-freebsd-questions@FreeBSD.ORG  Wed Jan  3 04:21:03 2007
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
X-Original-To: freebsd-questions@freebsd.org
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 9014716A407
	for <freebsd-questions@freebsd.org>;
	Wed,  3 Jan 2007 04:21:03 +0000 (UTC)
	(envelope-from list@museum.rain.com)
Received: from ns.umpquanet.com (ns.umpquanet.com [63.105.30.37])
	by mx1.freebsd.org (Postfix) with ESMTP id 7500D13C428
	for <freebsd-questions@freebsd.org>;
	Wed,  3 Jan 2007 04:21:03 +0000 (UTC)
	(envelope-from list@museum.rain.com)
Received: from ns.umpquanet.com (localhost [127.0.0.1])
	by ns.umpquanet.com (8.13.8/8.13.8) with ESMTP id l033o3UK004547;
	Tue, 2 Jan 2007 19:50:03 -0800 (PST)
	(envelope-from list@museum.rain.com)
Received: (from james@localhost)
	by ns.umpquanet.com (8.13.8/8.13.8/Submit) id l033o1mg004545;
	Tue, 2 Jan 2007 19:50:01 -0800 (PST)
	(envelope-from list@museum.rain.com)
Date: Tue, 2 Jan 2007 19:50:01 -0800
From: James Long <list@museum.rain.com>
To: freebsd-questions@freebsd.org, Kurt Buff <kurt.buff@gmail.com>
Message-ID: <20070103035000.GA99263@ns.umpquanet.com>
References: <20070102200721.31D1C16A517@hub.freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20070102200721.31D1C16A517@hub.freebsd.org>
User-Agent: Mutt/1.5.13 (2006-08-11)
Cc: 
Subject: Re: Batch file question - average size of file in directory
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 03 Jan 2007 04:21:03 -0000

> Message: 28
> Date: Tue, 2 Jan 2007 10:20:08 -0800
> From: "Kurt Buff" <kurt.buff@gmail.com>
> Subject: Batch file question - average size of file in directory
> To: questions@freebsd.org
> Message-ID:
> 	<a9f4a3860701021020g1468af4ah26c8a5fe90610719@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> All,
> 
> I don't even have a clue how to start this one, so am looking for a little help.
> 
> I've got a directory with a large number of gzipped files in it (over
> 110k) along with a few thousand uncompressed files.
> 
> I'd like to find the average uncompressed size of the gzipped files,
> and ignore the uncompressed files.
> 
> How on earth would I go about doing that with the default shell (no
> bash or other shells installed), or in perl, or something like that.
> I'm no scripter of any great expertise, and am just stumbling over
> this trying to find an approach.
> 
> Many thanks for any help,
> 
> Kurt

Hi, Kurt.

Can I make some assumptions that simplify things?  No kinky filenames, 
just [a-zA-Z0-9.].  My approach will probably break on filenames that 
contain spaces or colons.  Also, you say gzipped, so I'm assuming it's 
ONLY gzip, no bzip2, etc.
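You can also check the "ONLY gzip" assumption directly.  Every gzip 
file starts with the two bytes 0x1f 0x8b (RFC 1952), so testing the 
first two bytes separates gzip from bzip2 or plain files without 
depending on the exact wording of file(1)'s output.  This is a minimal 
sketch, and the function name is_gzip is hypothetical, not from the 
original reply:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the named file begins with
# the gzip magic bytes 0x1f 0x8b, fail otherwise.  bzip2 files start
# with "BZh", so they are rejected without parsing file(1)'s output.
is_gzip() {
  # od prints the first two bytes as hex ("1f 8b"); tr strips the
  # spaces and newline so the result can be compared as one string
  [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "1f8b" ]
}
```

Usage: is_gzip foo.gz && echo "foo.gz is gzip"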

Here's a first draft that might give you some ideas.  It will output:

./foo.gz : 3456
./bar.gz : 1048576
(etc.)

find . -type f | while read -r fname; do
  # file -b prints only the type, e.g. "gzip compressed data, ..."
  file -b "$fname" | grep -q "gzip compressed" &&
    echo "$fname : $(zcat "$fname" | wc -c | tr -d ' ')"
done
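If the filenames turn out to be less tame than assumed, one way around 
the whitespace problem (not part of the original reply) is to have 
find(1) pass the names to sh as arguments with -exec ... {} +.  That way 
they are never split on spaces.  This sketch assumes a find that 
supports the {} + form.  It uses a hypothetical function name, 
list_gz_sizes, checks the gzip magic bytes instead of running file(1), 
and calls gzip -dc, which does the same job as zcat here:

```shell
#!/bin/sh
# Hypothetical helper: print "name : uncompressed-bytes" for every gzip
# file under a directory (default: the current one).  find passes the
# names to sh as arguments, so spaces in filenames survive intact.
list_gz_sizes() {
  find "${1:-.}" -type f -exec sh -c '
    for f do
      # gzip data starts with the two magic bytes 1f 8b
      if [ "$(od -An -tx1 -N2 "$f" | tr -d " \n")" = "1f8b" ]; then
        # tr strips the padding some wc implementations add
        printf "%s : %s\n" "$f" "$(gzip -dc "$f" | wc -c | tr -d " ")"
      fi
    done
  ' sh {} +
}
```

Usage: list_gz_sizes /path/to/dir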


If you need a script that does the math for you, pipe the output of 
this one into bc:

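#!/bin/sh

# Prints a bc expression of the form
#   scale=2
#   (size1+size2+...+0) / count
find . -type f | {

n=0
echo scale=2
echo -n "("
while read -r fname; do
  # file -b prints only the type, so the filename itself can't match
  if file -b "$fname" | grep -q "gzip compressed"
  then
    # uncompressed byte count of this file, followed by "+"
    echo -n "$(zcat "$fname" | wc -c)+"
    n=$(($n+1))
  fi
done
# the trailing 0 completes the sum; if no gzip files were found,
# n is 0 and bc will report a divide-by-zero error
echo "0) / $n"

}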

That should give you the average decompressed size of the gzipped 
files under the current directory (note that find also descends into 
subdirectories).
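With over 110k files, decompressing every one just to count bytes may 
be slow.  An alternative not in the original reply is gzip -l, which 
reports the uncompressed size recorded in each file's trailer without 
decompressing the data.  The catch is that gzip stores that size modulo 
2^32, so files that decompress to 4 GiB or more are reported wrongly.  
This sketch assumes the GNU gzip -l layout (a header line, then the 
uncompressed size in the second column).  The function name 
avg_gz_size_fast is hypothetical.  It lets awk do the averaging, so 
there is no bc step and no division by zero when no gzip files exist:

```shell
#!/bin/sh
# Hypothetical helper: average uncompressed size of the gzip files
# under a directory, read from gzip -l instead of decompressing.
# Assumes GNU-style gzip -l output; sizes are stored modulo 2^32,
# so files of 4 GiB or more uncompressed are under-reported.
avg_gz_size_fast() {
  find "${1:-.}" -type f -exec sh -c '
    for f do
      # skip anything that does not start with the gzip magic 1f 8b
      [ "$(od -An -tx1 -N2 "$f" | tr -d " \n")" = "1f8b" ] || continue
      # line 1 is the column header; field 2 of line 2 is the size
      gzip -l "$f" | awk "NR == 2 { print \$2 }"
    done
  ' sh {} + |
  awk '{ sum += $1; n++ }
       END {
         if (n) { printf "%.2f\n", sum / n }
         else   { print "no gzip files found" }
       }'
}
```

Usage: avg_gz_size_fast /path/to/dir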