Date: Fri, 23 Jan 2009 21:06:22 +0100 (CET)
From: Oliver Fromme <olli@lurza.secnetix.de>
To: dougb@FreeBSD.org (Doug Barton)
Cc: Yoshihiro Ota <ota@j.email.ne.jp>, freebsd-hackers@FreeBSD.org, xistence@0x58.com, cperciva@FreeBSD.org
Subject: Re: freebsd-update's install_verify routine excessive stating
Message-ID: <200901232006.n0NK6M1B092584@lurza.secnetix.de>
In-Reply-To: <497A1BEE.7070709@FreeBSD.org>
Doug Barton wrote:
> Oliver Fromme wrote:
> > Yoshihiro Ota wrote:
> > > Oliver Fromme wrote:
> > > > It would be much better to generate two lists:
> > > > - The list of hashes, as already done ("filelist")
> > > > - A list of gzipped files present, stripped to the hash:
> > > >
> > > >     (cd files; echo *.gz) |
> > > >     tr ' ' '\n' |
> > > >     sed 's/\.gz$//' > filespresent
> > > >
> > > > Note we use "echo" instead of "ls", in order to avoid the
> > > > kern.argmax limit.  64000 files would certainly exceed that
> > > > limit.  Also note that the output is already sorted, because
> > > > the shell sorts wildcard expansions.
> > > >
> > > > Now that we have those two files, we can use comm(1) to
> > > > find out whether there are any hashes in filelist that are
> > > > not in filespresent:
> > > >
> > > >     if [ -n "$(comm -23 filelist filespresent)" ]; then
> > > >             echo -n "Update files missing -- "
> > > >             ...
> > > >     fi
> > > >
> > > > That solution scales much better because no shell loop is
> > > > required at all.
> > >
> > > This will probably be the fastest.
> >
> > Are you sure?  I'm not.
>
> I'd put money on this being faster for a lot of reasons.

I assume that by "this" you mean my solution to the slow shell loop
problem (not quoted above), not Yoshihiro Ota's awk proposal?

> test is a builtin in our /bin/sh, so there is no exec involved for
> 'test -f', but going out to disk for 64k files on an individual
> basis should definitely be slower than getting the file list in
> one shot.

Correct.

> There's no doubt that the current routine is not efficient.  The
> cat should be eliminated; the following is equivalent:
>
>     cut -f 2,7 -d '|' $@ |
>
> (quoting the $@ won't make a difference here).

Right, technically it doesn't make a difference here, because the
file names are fixed and don't contain spaces.  *But* then it is
better to use $*.  Every time I see $@ without double quotes, I
wonder whether the author forgot to add them; it always smells
like a bug.
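To make the two-list idea above concrete, here is a minimal, self-contained sketch.  The /tmp/fu-demo layout and the hash names (aaa, bbb, ccc) are purely illustrative, not the real freebsd-update data:

```shell
# Hypothetical working directory with a "filelist" of expected hashes
# and downloaded patches stored as files/<hash>.gz.
mkdir -p /tmp/fu-demo/files && cd /tmp/fu-demo
printf '%s\n' aaa bbb ccc | sort > filelist
: > files/aaa.gz
: > files/ccc.gz            # note: "bbb" is deliberately missing

# Build the sorted list of hashes actually present.  "echo" instead
# of "ls" avoids the kern.argmax limit, and the shell's wildcard
# expansion is already sorted, so no extra sort(1) is needed.
(cd files; echo *.gz) |
tr ' ' '\n' |
sed 's/\.gz$//' > filespresent

# comm -23 prints lines that occur only in filelist, i.e. the
# hashes with no corresponding .gz file on disk.
if [ -n "$(comm -23 filelist filespresent)" ]; then
    echo "Update files missing"
fi
```

With the setup above, comm -23 reports exactly "bbb", so the message is printed.  No per-file test -f (and thus no 64k individual stat calls) is involved.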
Using $@ without quotes is pointless, because it then behaves
exactly the same as $*.

> I haven't seen the files we're talking about, but I can't help
> thinking that cut | grep | cut could be streamlined.

Yes, it can.  I already explained pretty much all of that (the
useless cat etc.) in my first post in this thread.  Did you read
it?  My suggestion (after a small correction by Christoph Mallon)
was to replace the cat|cut|grep|cut pipeline with this single awk
command:

    awk -F "|" '$2 ~ /^f/ {print $7}' "$@"

For those not fluent in awk, it means this:
 -  Treat "|" as the field separator.
 -  Find lines whose second field matches ^f (i.e. it starts
    with an "f").
 -  Print the 7th field of each matching line.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"In my experience the term 'transparent proxy' is an oxymoron (like
jumbo shrimp).  'Transparent' proxies seem to vary from the distortions
of a funhouse mirror to barely translucent.  I really, really dislike
them when trying to figure out the corrective lenses needed with each
of them."
        -- R. Kevin Oberman, Network Engineer
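P.S.:  A quick demonstration of the awk one-liner on made-up data.
The field layout below (type in field 2, hash in field 7) matches
the cut -f 2,7 usage quoted above, but the sample lines themselves
are invented, not a real freebsd-update index:

```shell
# Toy index file: path|type|uid|gid|mode|flags|hash|link
cat > /tmp/index.sample <<'EOF'
/bin/ls|f|0|0|0555|0|1a2b3c|
/usr/share|d|0|0|0755|0||
/bin/cat|f|0|0|0555|0|4d5e6f|
EOF

# Field separator "|"; select lines whose 2nd field starts with "f"
# (regular files); print the 7th field (the hash).
awk -F "|" '$2 ~ /^f/ {print $7}' /tmp/index.sample
```

This prints 1a2b3c and 4d5e6f; the directory line ("d") is skipped.
One awk process replaces the whole cat|cut|grep|cut pipeline.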