From owner-freebsd-questions@FreeBSD.ORG  Wed Jan  3 18:42:45 2007
Message-ID: <a9f4a3860701031042u45757b7ag897d55e1969f84b8@mail.gmail.com>
Date: Wed, 3 Jan 2007 10:42:44 -0800
From: "Kurt Buff" <kurt.buff@gmail.com>
To: "James Long" <list@museum.rain.com>
In-Reply-To: <20070103035000.GA99263@ns.umpquanet.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20070102200721.31D1C16A517@hub.freebsd.org>
	<20070103035000.GA99263@ns.umpquanet.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: Batch file question - average size of file in directory

On 1/2/07, James Long <list@museum.rain.com> wrote:
<snip my problem description>
> Hi, Kurt.
>
> Can I make some assumptions that simplify things?  No kinky filenames,
> just [a-zA-Z0-9.].  My approach specifically doesn't like colons or
> spaces, I bet.  Also, you say gzipped, so I'm assuming it's ONLY gzip,
> no bzip2, etc.

Right, no other compression types - just .gz.

Here's a small snippet of the directory listing:

-rw-r-----  1 kurt  kurt   108208 Dec 21 06:15 dummy-zKLQEWrDDOZh
-rw-r-----  1 kurt  kurt    24989 Dec 28 17:29 dummy-zfzaEjlURTU1
-rw-r-----  1 kurt  kurt    30596 Jan  2 19:37 stuff-0+-OvVrXcEoq.gz
-rw-r-----  1 kurt  kurt     2055 Dec 22 20:25 stuff-0+19OXqwpEdH.gz
-rw-r-----  1 kurt  kurt    13781 Dec 30 03:53 stuff-0+1bMFK2XvlQ.gz
-rw-r-----  1 kurt  kurt    11485 Dec 20 04:40 stuff-0+5jriDIt0jc.gz


> Here's a first draft that might give you some ideas.  It will output:
>
> foo.gz : 3456
> bar.gz : 1048576
> (etc.)
>
> find . -type f | while read -r fname; do
>   file "$fname" | grep -q "compressed" && echo "$fname : $(zcat "$fname" | wc -c)"
> done
>
>
> If you really need a script that will do the math for you, then
> pipe the output of this into bc:
>
> #!/bin/sh
>
> find . -type f | {
>
> n=0
> echo scale=2
> echo -n "("
> while read -r fname; do
>   if file "$fname" | grep -q "compressed"
>   then
>     echo -n "$(zcat "$fname" | wc -c)+"
>     n=$(($n+1))
>   fi
> done
> echo "0) / $n"
>
> }
>
> That should give you the average decompressed size of the gzip'ped
> files in the current directory.
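
For illustration, here is a rough sketch of the bc input that the quoted
script builds, using the two example sizes from the sample output above
(3456 and 1048576). The helper name bc_average_expr is hypothetical, and it
takes the byte counts as arguments instead of running zcat on real files:

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the bc input that the
# quoted script would generate for the byte counts given as arguments.
bc_average_expr() {
    n=0
    echo "scale=2"            # ask bc for two decimal places
    printf '('
    for size in "$@"; do
        printf '%s+' "$size"  # each size is followed by a "+"
        n=$((n + 1))
    done
    # The trailing 0 absorbs the dangling "+" before the division.
    echo "0) / $n"
}

bc_average_expr 3456 1048576
```

Piping that output into bc should print 526016.00. With no sizes at all,
n stays 0 and bc reports a divide by zero, so an empty directory needs a
guard.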


Hmmm....

That's the same basic approach that Giorgos took, to uncompress the
file and count bytes with wc. I'm liking the 'zcat -l' construct, as
it looks more flexible, but then I have to parse the output, probably
with grep and cut.
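
A minimal sketch of that idea, assuming every file of interest ends in .gz
and that 'gzip -l' (which should behave like 'zcat -l') prints the
uncompressed size in its second column, as GNU and FreeBSD gzip do. It uses
awk rather than grep and cut, since awk can skip the header and totals rows
and do the averaging in one pass. The name avg_gz_size is made up for this
example:

```shell
#!/bin/sh
# Sketch only: average uncompressed size of the *.gz files in a directory,
# read from the listing printed by `gzip -l` instead of decompressing
# each file through wc. avg_gz_size is a hypothetical name.
avg_gz_size() {
    dir=${1:-.}
    # Data rows start with a number (the compressed size); field 2 is the
    # uncompressed size. The header row and the "(totals)" row are skipped.
    gzip -l "$dir"/*.gz 2>/dev/null |
        awk '$1 ~ /^[0-9]+$/ && $NF != "(totals)" { sum += $2; n++ }
             END {
                 if (n > 0) printf "%.2f\n", sum / n
                 else print "no gzip files"
             }'
}
```

Running 'avg_gz_size /some/dir' prints the average with two decimals.
Older gzip versions report the stored length modulo 2^32, so files of
4 GiB or more may be misreported.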

Time to put on my thinking cap - I'll get back to the list on this.

Kurt