Date: Thu, 4 Jan 2007 04:46:43 +1100 (EST)
From: Ian Smith <smithi@nimnet.asn.au>
To: freebsd-questions@freebsd.org
Cc: James Long, Kurt Buff
In-Reply-To: <20070103120041.C104816A569@hub.freebsd.org>
Subject: Re: Batch file question - average size of file in directory

> Message: 17
> Date: Tue, 2 Jan 2007 19:50:01 -0800
> From: James Long
>
> > Message: 28
> > Date: Tue, 2 Jan 2007 10:20:08 -0800
> > From: "Kurt Buff"
>
> > I don't even have a clue how to start this one, so am looking for a
> > little help.
> >
> > I've got a directory with a large number of gzipped files in it (over
> > 110k) along with a few thousand uncompressed files.

If it were me I'd mv those into a bunch of subdirectories; things get
really slow with more than 500 or so files per directory .. anyway ..

> > I'd like to find the average uncompressed size of the gzipped files,
> > and ignore the uncompressed files.
> >
> > How on earth would I go about doing that with the default shell (no
> > bash or other shells installed), or in perl, or something like that.
> > I'm no scripter of any great expertise, and am just stumbling over
> > this trying to find an approach.
> >
> > Many thanks for any help,
> >
> > Kurt

Hi, Kurt.  And hi, James,

> Can I make some assumptions that simplify things?  No kinky filenames,
> just [a-zA-Z0-9.].  My approach specifically doesn't like colons or
> spaces, I bet.  Also, you say gzipped, so I'm assuming it's ONLY gzip,
> no bzip2, etc.
>
> Here's a first draft that might give you some ideas.  It will output:
>
> foo.gz : 3456
> bar.gz : 1048576
> (etc.)
>
> find . -type f | while read fname; do
>   file $fname | grep -q "compressed" && echo "$fname : $(zcat $fname | wc -c)"
> done

 % file cat7/tuning.7.gz
 cat7/tuning.7.gz: gzip compressed data, from Unix

Good check, though grep "gzip compressed" excludes bzip2 etc.  But you
REALLY don't want to zcat 110 thousand files just to wc 'em, unless
it's a benchmark :) .. may I suggest a slight speedup, template:

 % gunzip -l cat7/tuning.7.gz
   compressed  uncompr.  ratio uncompressed_name
        13642     38421  64.5% cat7/tuning.7
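For a quick one-shot answer, those gunzip -l numbers can also be summed
straight in awk, with no shell loop at all.  A rough sketch only: it
assumes the gzipped files can all be picked out by a .gz suffix
(otherwise stick with the file(1) test below), that find and xargs grok
-print0 / -0 as FreeBSD's do, and the gunzip -l column layout shown in
the template above:

 #!/bin/sh
 # Average the "uncompr." column of gunzip -l across all .gz files.
 # The awk pattern keeps only per-file lines: the header's second
 # field isn't numeric, and the "(totals)" line that gunzip -l adds
 # when handed more than one file is skipped by name.
 find . -type f -name '*.gz' -print0 | xargs -0 gunzip -l |
 awk '$2 ~ /^[0-9]+$/ && $4 != "(totals)" { sum += $2; n++ }
      END { if (n) printf "%d files, average %.1f bytes\n", n, sum / n }'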
> If you really need a script that will do the math for you, then
> pipe the output of this into bc:
>
> #!/bin/sh
>
> find . -type f | {
>
> n=0
> echo scale=2
> echo -n "("
> while read fname; do
- >   if file $fname | grep -q "compressed"
+     if file $fname | grep -q "gzip compressed"
>   then
- >     echo -n "$(zcat $fname | wc -c)+"
+       echo -n "$(gunzip -l $fname | grep -v comp | awk '{print $2}')+"
>     n=$(($n+1))
>   fi
> done
> echo "0) / $n"
>
> }
>
> That should give you the average decompressed size of the gzip'ped
> files in the current directory.

HTH, Ian
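P.S.  For anyone who'd rather cut and paste, here's James's script with
both substitutions folded in and the bc pipe attached.  Still only a
sketch along the above lines: the file(1) test keeps non-gzip files
out, but gunzip -l itself will still ignore (with a moan on stderr)
any gzipped file whose name lacks a suffix it recognises, and bc will
complain about division by zero if no gzipped files are found at all.

 #!/bin/sh
 # Average uncompressed size of the gzipped files under the current
 # directory, using gunzip -l rather than zcat | wc -c for speed.
 find . -type f | {
   n=0
   echo scale=2
   echo -n "("
   while read fname; do
     if file $fname | grep -q "gzip compressed"
     then
       # grep -v comp drops the gunzip -l header line; the second
       # field of the remaining line is the uncompressed size
       echo -n "$(gunzip -l $fname | grep -v comp | awk '{print $2}')+"
       n=$(($n+1))
     fi
   done
   echo "0) / $n"
 } | bc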