Date: Fri, 23 Jan 2009 21:06:22 +0100 (CET)
From: Oliver Fromme <olli@lurza.secnetix.de>
To: dougb@FreeBSD.org (Doug Barton)
Cc: Yoshihiro Ota <ota@j.email.ne.jp>, freebsd-hackers@FreeBSD.org, xistence@0x58.com, cperciva@FreeBSD.org
Subject: Re: freebsd-update's install_verify routine excessive stating
Message-ID: <200901232006.n0NK6M1B092584@lurza.secnetix.de>
In-Reply-To: <497A1BEE.7070709@FreeBSD.org>
Doug Barton wrote:
> Oliver Fromme wrote:
> > Yoshihiro Ota wrote:
> > > Oliver Fromme wrote:
> > > > It would be much better to generate two lists:
> > > > - The list of hashes, as already done ("filelist")
> > > > - A list of gzipped files present, stripped to the hash:
> > > >
> > > >     (cd files; echo *.gz) |
> > > >     tr ' ' '\n' |
> > > >     sed 's/\.gz$//' > filespresent
> > > >
> > > > Note we use "echo" instead of "ls", in order to avoid the
> > > > kern.argmax limit.  64000 files would certainly exceed that
> > > > limit.  Also note that the output is already sorted, because
> > > > the shell sorts wildcard expansions.
> > > >
> > > > Now that we have those two files, we can use comm(1) to
> > > > find out whether there are any hashes in filelist that are
> > > > not in filespresent:
> > > >
> > > >     if [ -n "$(comm -23 filelist filespresent)" ]; then
> > > >             echo -n "Update files missing -- "
> > > >             ...
> > > >     fi
> > > >
> > > > That solution scales much better because no shell loop is
> > > > required at all.
> > >
> > > This will probably be the fastest.
> >
> > Are you sure?  I'm not.
>
> I'd put money on this being faster for a lot of reasons.

I assume that by "this" you mean my solution to the slow shell loop
problem (not quoted above), not Yoshihiro Ota's awk proposal?

> test is a builtin in our /bin/sh, so there is no exec involved for
> 'test -f', but going out to disk for 64k files on an individual
> basis should definitely be slower than getting the file list in
> one shot.

Correct.

> There's no doubt that the current routine is not efficient.  The
> cat should be eliminated; the following is equivalent:
>
>     cut -f 2,7 -d '|' $@ |
>
> (quoting the $@ won't make a difference here).

Right, technically it doesn't make a difference here, because the
file names are fixed and don't contain spaces.  *But* then it is
better to use $*.  Every time I see $@ without double quotes, I
wonder whether the author forgot to add them; it always smells
like a bug.
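To make the two-list idea above concrete, here is a minimal, self-contained sketch.  The /tmp/fu-demo layout and the hash names (aaa, bbb, ccc) are purely illustrative, not the real freebsd-update data:

```shell
# Hypothetical working directory with a "filelist" of expected hashes
# and downloaded patches stored as files/<hash>.gz.
mkdir -p /tmp/fu-demo/files && cd /tmp/fu-demo
printf '%s\n' aaa bbb ccc | sort > filelist
: > files/aaa.gz
: > files/ccc.gz            # note: "bbb" is deliberately missing

# Build the sorted list of hashes actually present.  "echo" instead
# of "ls" avoids the kern.argmax limit, and the shell's wildcard
# expansion is already sorted, so no extra sort(1) is needed.
(cd files; echo *.gz) |
tr ' ' '\n' |
sed 's/\.gz$//' > filespresent

# comm -23 prints lines that occur only in filelist, i.e. the
# hashes with no corresponding .gz file on disk.
if [ -n "$(comm -23 filelist filespresent)" ]; then
    echo "Update files missing"
fi
```

With the setup above, comm -23 reports exactly "bbb", so the message is printed.  No per-file test -f (and thus no 64k individual stat calls) is involved.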
Using $@ without quotes is pointless, because it then behaves
exactly the same as $*.

> I haven't seen the files we're talking about, but I can't help
> thinking that cut | grep | cut could be streamlined.

Yes, it can.  I already explained pretty much all of that (the
useless cat etc.) in my first post in this thread.  Did you read
it?  My suggestion (after a small correction by Christoph Mallon)
was to replace the cat|cut|grep|cut pipeline with this single awk
command:

    awk -F "|" '$2 ~ /^f/ {print $7}' "$@"

For those not fluent in awk, it means this:
 -  Treat "|" as the field separator.
 -  Find lines whose second field matches ^f (i.e. it starts
    with an "f").
 -  Print the 7th field of each matching line.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"In my experience the term 'transparent proxy' is an oxymoron (like
jumbo shrimp).  'Transparent' proxies seem to vary from the distortions
of a funhouse mirror to barely translucent.  I really, really dislike
them when trying to figure out the corrective lenses needed with each
of them."
        -- R. Kevin Oberman, Network Engineer
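P.S.:  A quick demonstration of the awk one-liner on made-up data.
The field layout below (type in field 2, hash in field 7) matches
the cut -f 2,7 usage quoted above, but the sample lines themselves
are invented, not a real freebsd-update index:

```shell
# Toy index file: path|type|uid|gid|mode|flags|hash|link
cat > /tmp/index.sample <<'EOF'
/bin/ls|f|0|0|0555|0|1a2b3c|
/usr/share|d|0|0|0755|0||
/bin/cat|f|0|0|0555|0|4d5e6f|
EOF

# Field separator "|"; select lines whose 2nd field starts with "f"
# (regular files); print the 7th field (the hash).
awk -F "|" '$2 ~ /^f/ {print $7}' /tmp/index.sample
```

This prints 1a2b3c and 4d5e6f; the directory line ("d") is skipped.
One awk process replaces the whole cat|cut|grep|cut pipeline.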