From owner-freebsd-hackers@FreeBSD.ORG  Fri Jan 23 22:22:15 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 123551065677;
	Fri, 23 Jan 2009 22:22:15 +0000 (UTC)
	(envelope-from olli@lurza.secnetix.de)
Received: from lurza.secnetix.de (unknown [IPv6:2a01:170:102f::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 8636F8FC17;
	Fri, 23 Jan 2009 22:22:14 +0000 (UTC)
	(envelope-from olli@lurza.secnetix.de)
Received: from lurza.secnetix.de (localhost [127.0.0.1])
	by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id n0NMMAkT097665;
	Fri, 23 Jan 2009 23:22:11 +0100 (CET)
	(envelope-from oliver.fromme@secnetix.de)
Received: (from olli@localhost)
	by lurza.secnetix.de (8.14.3/8.14.3/Submit) id n0NMMAcS097663;
	Fri, 23 Jan 2009 23:22:10 +0100 (CET) (envelope-from olli)
From: Oliver Fromme <olli@lurza.secnetix.de>
Message-Id: <200901232222.n0NMMAcS097663@lurza.secnetix.de>
To: dougb@freebsd.org (Doug Barton)
Date: Fri, 23 Jan 2009 23:22:10 +0100 (CET)
In-Reply-To: <497A2A83.9010606@FreeBSD.org>
X-Mailer: ELM [version 2.5 PL8]
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.1.2
	(lurza.secnetix.de [127.0.0.1]);
	Fri, 23 Jan 2009 23:22:11 +0100 (CET)
Cc: Yoshihiro Ota <ota@j.email.ne.jp>, freebsd-hackers@freebsd.org,
	xistence@0x58.com, cperciva@freebsd.org
Subject: Re: freebsd-update's install_verify routine excessive stating
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 23 Jan 2009 22:22:15 -0000


Doug Barton wrote:
 > Oliver Fromme wrote:
 > > I assume, with "this" you mean my solution to the slow
 > > shell loop problem (not quoted above), not Yoshihiro Ota's
 > > awk proposal?
 > 
 > I meant the solution using comm, sorry. (I forgot to mention that I
 > would probably use cmp here, but that's a personal preference.)

I see.  No problem.

However, I think cmp wouldn't work here, because cmp only
detects whether there is a difference between two files.

In this case we need to know whether one list is a subset
of the other:  For every hash there must be a .gz file, but
it doesn't hurt if there are more files.  So the list of
hashes only has to be a subset of the list of .gz files;
the two lists don't have to be equal.
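
A minimal sketch of such a subset test with comm, for
illustration only.  The function name missing_hashes, the
file layout and the sample names are assumptions of mine,
not taken from freebsd-update:

```sh
#!/bin/sh
# Print every hash listed in file $1 that has no "<hash>.gz" in directory $2.
# Empty output means the hash list is a subset of the .gz files.
missing_hashes() {
    # comm requires both inputs sorted under the same collation.
    LC_ALL=C sort -u "$1" > "$1.want"
    ls "$2" | sed -n 's/\.gz$//p' | LC_ALL=C sort -u > "$1.have"
    # -23 drops columns 2 and 3, keeping lines unique to the first input.
    LC_ALL=C comm -23 "$1.want" "$1.have"
    rm -f "$1.want" "$1.have"
}

# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/files"
touch "$tmp/files/aaa.gz" "$tmp/files/bbb.gz" "$tmp/files/ccc.gz"
printf '%s\n' bbb aaa ddd > "$tmp/hashes"

# ccc.gz is an extra file and is ignored; ddd has no .gz file.
missing_hashes "$tmp/hashes" "$tmp/files"
rm -rf "$tmp"
```

The directory is listed once and the comparison happens in
a single comm run, instead of one test per file.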

While I was at it, I skimmed through the cmp source and
found a bug (or rather an inefficiency):  When the -s
option is specified (i.e. silent, exit code only), it
would be sufficient to terminate as soon as the first
difference is encountered.  But it always compares the
whole files.  I'll try to make a patch to improve this.

 > > Yes, it can.  I already explained pretty much all of that
 > > (useless cat etc.) in my first post in this thread.  Did
 > > you read it? 
 > 
 > Yes, I was attempting to agree with you. :)

OK, sorry.  I misunderstood.  :)

 > > My suggestion (after a small correction by
 > > Christoph Mallon) was to replace the cat|cut|grep|cut
 > > sequence with this single awk command:
 > > 
 > > awk -F "|" '$2 ~ /^f/ {print $7}' "$@"
 > > 
 > > For those not fluent with awk, it means this:
 > >  - Treat "|" as field separator.
 > >  - Search for lines where the second field matches ^f
 > >    (i.e. it starts with an "f").
 > >  - Print the 7th field of those matching lines.
 > 
 > Like I said, I haven't seen the files, but this looks good at first
 > blush. That said, the generation of the hash list file is just a drop
 > in the bucket. The real inefficiency in this function is the test -f
 > for 64k files, one at a time.

Yes, definitely.
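
For reference, here is the awk command from above applied
to some made-up input.  The wrapper name extract_hashes and
the sample lines are only assumptions for illustration; all
that matters is that field 2 is the type and field 7 is the
hash:

```sh
#!/bin/sh
# Print field 7 of every "|"-separated line whose field 2 starts with "f".
# With no file arguments, awk reads standard input.
extract_hashes() {
    awk -F "|" '$2 ~ /^f/ {print $7}' "$@"
}

# Hypothetical sample lines; only fields 2 and 7 matter here.
printf '%s\n' \
    '/bin/ls|f|0|0|0555|0|hash1|' \
    '/etc|d|0|0|0755|0||' \
    '/bin/cat|f|0|0|0555|0|hash2|' |
extract_hashes
```

Only the two lines whose second field starts with "f" get
their 7th field printed; the directory entry is skipped.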

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

"We will perhaps eventually be writing only small modules which are identi-
fied by name as they are used to build larger ones, so that devices like
indentation, rather than delimiters, might become feasible for expressing
local structure in the source language." -- Donald E. Knuth, 1974