From owner-freebsd-questions@FreeBSD.ORG  Sat May  7 11:15:03 2011
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: freebsd-questions@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3997F1065672
	for <freebsd-questions@FreeBSD.org>;
	Sat,  7 May 2011 11:15:03 +0000 (UTC)
	(envelope-from listreader@lazlarlyricon.com)
Received: from mailgw5.surf-town.net (mail12.surf-town.net [212.97.132.52])
	by mx1.freebsd.org (Postfix) with ESMTP id BC9148FC08
	for <freebsd-questions@FreeBSD.org>;
	Sat,  7 May 2011 11:15:02 +0000 (UTC)
Received: by mailgw5.surf-town.net (Postfix, from userid 65534)
	id DA6FC1FF07; Sat,  7 May 2011 13:15:01 +0200 (CEST)
Received: from localhost (localhost [127.0.0.1])
	by mailgw5.surf-town.net (Postfix) with ESMTP id B635C1FF25;
	Sat,  7 May 2011 13:15:01 +0200 (CEST)
X-Virus-Scanned: Debian amavisd-new at mailgw5.surf-town.net
X-Spam-Flag: NO
X-Spam-Score: -1.44
X-Spam-Level: 
X-Spam-Status: No, score=-1.44 tagged_above=-999 required=7
	tests=[ALL_TRUSTED=-1.44]
Received: from mailgw5.surf-town.net ([127.0.0.1])
	by localhost (mailgw5.surf-town.net [127.0.0.1]) (amavisd-new,
	port 10024)
	with LMTP id 1wtQmc-Bh9bS; Sat,  7 May 2011 13:14:56 +0200 (CEST)
Received: from lazlar.kicks-ass.net
	(c-0987e355.09-42-6e6b7010.cust.bredbandsbolaget.se [85.227.135.9])
	by mailgw5.surf-town.net (Postfix) with ESMTPA id E734B1FF07;
	Sat,  7 May 2011 13:14:54 +0200 (CEST)
Message-ID: <4DC529AD.5080906@lazlarlyricon.com>
Date: Sat, 07 May 2011 13:14:53 +0200
From: Rolf Nielsen <listreader@lazlarlyricon.com>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; sv-SE;
	rv:1.9.2.17) Gecko/20110502 Lightning/1.0b2 Thunderbird/3.1.10
MIME-Version: 1.0
To: Robert Bonomi <bonomi@mail.r-bonomi.com>
References: <201105070528.p475SvZ8093849@mail.r-bonomi.com>
In-Reply-To: <201105070528.p475SvZ8093849@mail.r-bonomi.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-questions@FreeBSD.org
Subject: Re: Comparing two lists
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 07 May 2011 11:15:03 -0000

2011-05-07 07:28, Robert Bonomi wrote:
>>  From listreader@lazlarlyricon.com  Fri May  6 20:14:09 2011
>> Date: Sat, 07 May 2011 03:13:39 +0200
>> From: Rolf Nielsen<listreader@lazlarlyricon.com>
>> To: Robert Bonomi<bonomi@mail.r-bonomi.com>
>> CC: freebsd-questions@freebsd.org
>> Subject: Re: Comparing two lists
>>
>> 2011-05-07 02:54, Robert Bonomi wrote:
>>>>    From owner-freebsd-questions@freebsd.org  Fri May  6 19:27:54 2011
>>>> Date: Sat, 07 May 2011 02:09:26 +0200
>>>> From: Rolf Nielsen<listreader@lazlarlyricon.com>
>>>> To: FreeBSD<freebsd-questions@freebsd.org>
>>>> Subject: Comparing two lists
>>>>
>>>> Hello all,
>>>>
>>>> I have two text files, quite extensive ones. They have some lines in
>>>> common and some lines are unique to one of the files. The lines that do
>>>> exist in both files are not necessarily in the same location. Now I need
>>>> to compare the files and output a list of lines that exist in both
>>>> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
>>>> combination of two or more of them?
>>>
>>>
>>> If the files have only 'minor' differences -- i.e. no long runs of lines
>>> that are in only one file -- *and* the common lines are in the same order
>>> in each file, you can use diff(1), without any other shenanigans.
>>>
>>> If the above is -not- true, and if you need _only_ the common lines, AND
>>> order is not important, then sort(1) both files, and use diff(1) on the
>>> two sorted versions.
>>>
>>>
>>> Beyond that it depends on what you mean by 'extensive' ones.  megabytes?
>>> Gigabytes? or what??
>>>
>>>
>>>
>>
>> Some 10,000 to 20,000 lines each. I do need only the common lines. Order
>> is not essential, but would make life easier. I've tried a little with
>> uniq, as suggested by Polyptron, but I guess 3am is not quite the right
>> time to do these things. Anyway, thanks.
>
> Ok, 20k lines is only a medium-size file. There's no problem in fitting
> the entire file 'in memory'.  ('big' files are ones that are larger than
> available memory. :)

By "quite extensive" I was referring to the number of lines rather than 
the byte size, and 20k lines is, by my standards, quite a lot for a 
plain text file. :P
But that's beside the point. :)

>
> Using uniq:
>     sort  {{file1}} {{file2}} |uniq -d

Yes, I found that solution on
http://www.catonmat.net/blog/set-operations-in-unix-shell
which is mainly about comm, but also lists other ways of doing things. I 
also found
grep -xF -f file1 file2
there, and I've tested that one too. Both seem to be doing what I want.
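
For the archives, here is a minimal sketch of those two commands wrapped as
shell functions. The function names are my own and purely illustrative. One
assumption worth stating: plain "sort file1 file2 | uniq -d" also reports a
line that is merely repeated inside one file, so the sketch runs each file
through sort -u first.

```shell
#!/bin/sh

# common_sorted FILE1 FILE2 (hypothetical name)
# Print the lines present in both files, in sorted order.
common_sorted() {
    # sort -u drops duplicates inside each file, so that uniq -d only
    # reports lines that really occur in both files.
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}

# common_in_order FILE1 FILE2 (hypothetical name)
# Print the lines of FILE2 that also occur in FILE1, in FILE2 order.
# -x matches whole lines only, -F treats each pattern as a fixed string,
# -f reads the patterns (one per line) from FILE1.
common_in_order() {
    grep -xF -f "$1" "$2"
}
```

Usage would be something like "common_sorted file1 file2 > common.txt". The
grep variant keeps the order of the second file, which the sort-based one
does not.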

>
> to maintain order, put the following in a file, call it 'common.awk'
>
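>       # First file (NR==FNR): remember each line, then skip to the next one.
>       NR==FNR   { array[$0]=1; next; }
>       # Second file: print the line if it was seen in the first file.
>                 { if (array[$0] == 1) print $0; }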
>
> then use the command:
>
>    awk -f common.awk {{file1}} {{file2}}
>
> This will output common lines, in the order they occur in _file2_.
>
>
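
The same idea also fits on one command line without a separate script file.
This is only a sketch: the function name and the array name "seen" are my
own, and it uses awk's "in" operator instead of comparing the stored value
to 1.

```shell
#!/bin/sh

# common_awk FILE1 FILE2 (hypothetical name)
# Print the lines of FILE2 that also occur in FILE1, in FILE2 order.
common_awk() {
    awk '
        # While reading the first file, NR (total record number) equals FNR
        # (record number within the current file): remember each line.
        NR == FNR { seen[$0]; next }
        # For the second file, print lines that were remembered above.
        $0 in seen
    ' "$1" "$2"
}
```

Like the grep variant, this prints a line once for every time it occurs in
the second file, so duplicates there are kept.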

I took the liberty of sending a copy of this to the list although you 
replied privately.