From: Jonathan McKeown
Organization: Health Systems Trust
To: freebsd-questions@freebsd.org
Cc: Kurt Buff
Date: Fri, 14 Sep 2007 09:30:20 +0200
Subject: Re: Scripting question

On Thursday 13 September 2007 20:35, Roland Smith wrote:
> On Thu, Sep 13, 2007 at 10:16:40AM -0700, Kurt Buff wrote:
> > I'm trying to do some text file manipulation, and it's driving me nuts.
[snip]
> > I've looked at sort and uniq, and I've googled a fair bit but can't
> > seem to find anything that would do this.
> >
> > I don't have the perl skills, though that would be ideal.
> >
> > Any help out there?
>
> #!/usr/bin/perl
> while (<>) {
>     # Assuming no whitespace in addresses; kill everything after the first
>     # space
>     s/ .*$//;
>     # Store the name & count in a hash
>     $names{$_}++;
> }
> # Go over the hash
> while (($name,$count) = each(%names)) {
>     if ($count == 1) {
>         # print unique names.
>         print $name, "\n";
>     }
> }

Another approach in Perl would be:

    #!/usr/bin/perl
    my (%names, %dups);
    while (<>) {
        my ($key) = split;                # first whitespace-separated field
        $dups{$key} = 1 if $names{$key};  # seen before: record as repeated
        $names{$key} = 1;
    }
    delete @names{keys %dups};
    #
    # keys %names is now an unordered list of only non-repeated elements
    # keys %dups is an unordered list of only repeated elements

With no arguments, split splits $_ on whitespace, returning a list of
fields which can be assigned to a list of variables. Here we only want
to capture the first field: split is more efficient for this than using
a regex. The first occurrence of $key is in parens because it's actually
a list of one variable name; that puts the assignment in list context,
so $key gets the first field rather than the number of fields.

We build two hashes. One, %names, is keyed by the original names (this
is the classic way to reduce duplicates to single occurrences, since the
duplicated keys overwrite the originals). The other, %dups, has as its
keys the names that were already present in %names when they turned up
again: the duplicated entries.

Having done that, we use a hash slice to delete from %names all the keys
of %dups. That leaves the keys of %names holding all the entries which
appear only once (and the keys of %dups holding all the duplicated
entries, if that's useful).

Jonathan
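
P.S. To see the two-hash approach at work without a data file, here is a minimal sketch that wraps the same logic in a subroutine and runs it on a few made-up lines. The subroutine name partition_by_first_field and the sample addresses are only illustrative assumptions, not part of the script above; the sort is added purely to make the output order predictable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): takes a list of
# lines and returns two array refs, the first fields that occur exactly
# once and the first fields that occur more than once.
sub partition_by_first_field {
    my @lines = @_;
    my (%names, %dups);
    for my $line (@lines) {
        my ($key) = split ' ', $line;     # first whitespace-separated field
        next unless defined $key;         # skip blank lines
        $dups{$key} = 1 if $names{$key};  # seen before: it is a repeat
        $names{$key} = 1;
    }
    delete @names{ keys %dups };          # hash slice: drop every repeated key
    return ( [ sort keys %names ], [ sort keys %dups ] );
}

# Made-up sample input, one record per element.
my @sample = (
    'alice@example.com first',
    'bob@example.com only',
    'alice@example.com second',
    'carol@example.com only',
);

my ($unique, $repeated) = partition_by_first_field(@sample);
print "unique:   @$unique\n";    # bob@example.com carol@example.com
print "repeated: @$repeated\n";  # alice@example.com
```

To run it over a real file instead, one option would be to pass the file's lines in, for example partition_by_first_field(<>) with the file name given on the command line.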