From: "Kurt Buff"
To: "James Long"
Cc: freebsd-questions@freebsd.org
Date: Wed, 3 Jan 2007 10:42:44 -0800
Subject: Re: Batch file question - average size of file in directory

On 1/2/07, James Long wrote:
> Hi, Kurt.
>
> Can I make some assumptions that simplify things?  No kinky filenames,
> just [a-zA-Z0-9.].  My approach specifically doesn't like colons or
> spaces, I bet.  Also, you say gzipped, so I'm assuming it's ONLY gzip,
> no bzip2, etc.

Right, no other compression types - just .gz.

Here's a small snippet of the directory listing:

-rw-r-----  1 kurt  kurt  108208 Dec 21 06:15 dummy-zKLQEWrDDOZh
-rw-r-----  1 kurt  kurt   24989 Dec 28 17:29 dummy-zfzaEjlURTU1
-rw-r-----  1 kurt  kurt   30596 Jan  2 19:37 stuff-0+-OvVrXcEoq.gz
-rw-r-----  1 kurt  kurt    2055 Dec 22 20:25 stuff-0+19OXqwpEdH.gz
-rw-r-----  1 kurt  kurt   13781 Dec 30 03:53 stuff-0+1bMFK2XvlQ.gz
-rw-r-----  1 kurt  kurt   11485 Dec 20 04:40 stuff-0+5jriDIt0jc.gz

> Here's a first draft that might give you some ideas.  It will output:
>
> foo.gz : 3456
> bar.gz : 1048576
> (etc.)
>
> find . -type f | while read fname; do
>   file $fname | grep -q "compressed" && echo "$fname : $(zcat $fname | wc -c)"
> done
>
>
> If you really need a script that will do the math for you, then
> pipe the output of this into bc:
>
> #!/bin/sh
>
> find . -type f | {
>
> n=0
> echo scale=2
> echo -n "("
> while read fname; do
>   if file $fname | grep -q "compressed"
>   then
>     echo -n "$(zcat $fname | wc -c)+"
>     n=$(($n+1))
>   fi
> done
> echo "0) / $n"
>
> }
>
> That should give you the average decompressed size of the gzip'ped
> files in the current directory.

Hmmm....

That's the same basic approach that Giogos took: uncompress the file
and count bytes with wc. I'm liking the 'zcat -l' construct, as it
looks more flexible, but then I have to parse the output, probably
with grep and cut.

Time to put on my thinking cap - I'll get back to the list on this.

Kurt
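
For reference, here is a minimal sketch of the listing-based idea
mentioned above. It is one possible way to do it, not a tested solution
from the thread. It assumes every compressed file ends in .gz, as in the
directory snippet. It calls 'gzip -l' on one file at a time, so there is
no totals line to skip, and it uses awk rather than grep and cut to pull
out the uncompressed-size column. It also uses awk instead of bc for the
division. The function name avg_gz_size is made up for illustration.
One caveat: the gzip format stores the uncompressed size modulo 2^32, so
the listed size may be wrong for files of 4 GiB or more, depending on the
gzip version.

```shell
#!/bin/sh
# Sketch: average uncompressed size of the *.gz files in a directory,
# read from "gzip -l" instead of decompressing every file.
# avg_gz_size is a hypothetical helper name, not something from the thread.
# Usage: avg_gz_size /some/directory   (defaults to the current directory)

avg_gz_size() {
    dir=${1:-.}
    total=0
    n=0
    for f in "$dir"/*.gz; do
        [ -f "$f" ] || continue     # glob matched nothing, or not a regular file
        # "gzip -l FILE" prints a header line, then one data line:
        #   compressed  uncompressed  ratio  uncompressed_name
        size=$(gzip -l "$f" 2>/dev/null | awk 'NR == 2 { print $2 }')
        case $size in
            ''|*[!0-9]*) continue ;;    # skip files gzip could not list
        esac
        total=$((total + size))
        n=$((n + 1))
    done
    if [ "$n" -eq 0 ]; then
        echo "no usable .gz files in $dir" >&2
        return 1
    fi
    # Print the average with two decimal places.
    awk -v t="$total" -v n="$n" 'BEGIN { printf "%.2f\n", t / n }'
}
```

Running 'avg_gz_size .' in the directory in question would print a single
number. Files without the .gz suffix (the dummy-* entries in the listing)
are never looked at, which avoids the call to file(1) for every entry.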