Date:      Fri, 23 Jan 2009 23:22:10 +0100 (CET)
From:      Oliver Fromme <olli@lurza.secnetix.de>
To:        dougb@freebsd.org (Doug Barton)
Cc:        Yoshihiro Ota <ota@j.email.ne.jp>, freebsd-hackers@freebsd.org, xistence@0x58.com, cperciva@freebsd.org
Subject:   Re: freebsd-update's install_verify routine excessive stating
Message-ID:  <200901232222.n0NMMAcS097663@lurza.secnetix.de>
In-Reply-To: <497A2A83.9010606@FreeBSD.org>

Doug Barton wrote:
 > Oliver Fromme wrote:
 > > I assume, with "this" you mean my solution to the slow
 > > shell loop problem (not quoted above), not Yoshihiro Ota's
 > > awk proposal?
 > 
 > I meant the solution using comm, sorry. (I forgot to mention that I
 > would probably use cmp here, but that's a personal preference.)

I see.  No problem.

However, I think cmp wouldn't work here, because cmp only
detects whether there is a difference between two files.

In this case we need to know if one file is a subset of
the other:  For every hash there must be a .gz file, but
it doesn't hurt if there are more files.  So the list of
hashes can be a subset of the list of .gz files; the two
lists don't have to be equal.
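
A minimal sketch of that subset test with comm, to make the
difference to cmp concrete.  The directory layout, file names
and hash values here are made up for the demonstration; this
is not the actual freebsd-update code.

```sh
#!/bin/sh
# Demo setup (hypothetical data): three required hashes,
# two matching .gz files, and one extra .gz file.
dir=$(mktemp -d)
printf '%s\n' aaa bbb ccc > "$dir/hashes"
mkdir "$dir/files"
touch "$dir/files/aaa.gz" "$dir/files/ccc.gz" "$dir/files/zzz.gz"

# comm requires both inputs to be sorted in the same collation order.
export LC_ALL=C
sort -u "$dir/hashes" > "$dir/want"
(cd "$dir/files" && ls) | sed -n 's/\.gz$//p' | sort -u > "$dir/have"

# -2 suppresses lines only in "have" (extra files are harmless),
# -3 suppresses lines common to both.  What remains are hashes
# for which no .gz file exists.
missing=$(comm -23 "$dir/want" "$dir/have")

if [ -n "$missing" ]; then
    echo "missing: $missing"
fi
rm -rf "$dir"
```

The extra file zzz.gz is ignored, and only the hash without
a .gz file is reported.  cmp on the same two lists would just
say that they differ.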

While I was at it, I skimmed through the cmp source and
found a bug (or inefficiency):  When the -s option is
specified (i.e. silent, exit code only), it would be
sufficient to terminate when the first difference is
encountered.  But it always compares the whole files.
I'll try to make a patch to improve this.

 > > Yes, it can.  I already explained pretty much all of that
 > > (useless cat etc.) in my first post in this thread.  Did
 > > you read it? 
 > 
 > Yes, I was attempting to agree with you. :)

OK, sorry.  I misunderstood.  :)

 > > My suggestion (after a small correction by
 > > Christoph Mallon) was to replace the cat|cut|grep|cut
 > > sequence with this single awk command:
 > > 
 > > awk -F "|" '$2 ~ /^f/ {print $7}' "$@"
 > > 
 > > For those not fluent with awk, it means this:
 > >  - Treat "|" as field separator.
 > >  - Search for lines where the second field matches ^f
 > >    (i.e. it starts with an "f").
 > >  - Print the 7th field of those matching lines.
 > 
 > Like I said, I haven't seen the files, but this looks good at first
 > blush. That said, the generation of the hash list file is just a drop
 > in the bucket. The real inefficiency in this function is the test -f
 > for 64k files, one at a time.

Yes, definitely.
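
For anyone who wants to try the awk command quoted above, here
is a small sketch that runs it on a few made-up lines.  The
sample paths, field values and hash strings are assumptions
for illustration only; the real index files may look different.

```sh
#!/bin/sh
# Made-up sample input: "|"-separated fields, with the type
# in field 2 and the hash in field 7.
index=$(mktemp)
cat > "$index" <<'EOF'
/bin/ls|f|0|0|0555|0|hash1|
/bin|d|0|0|0755|0||
/etc/motd|f|0|0|0644|0|hash2|
EOF

# Print field 7 of every line whose second field starts with "f".
hashes=$(awk -F "|" '$2 ~ /^f/ {print $7}' "$index")
echo "$hashes"
rm -f "$index"
```

The directory line (type "d") is skipped, and one awk process
does the work of the whole cat|cut|grep|cut pipeline.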

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more:  http://www.secnetix.de/bsd

"We will perhaps eventually be writing only small modules which are identi-
fied by name as they are used to build larger ones, so that devices like
indentation, rather than delimiters, might become feasible for expressing
local structure in the source language." -- Donald E. Knuth, 1974


