Date: Fri, 23 Jan 2009 11:35:10 -0800
From: Doug Barton <dougb@FreeBSD.org>
To: Oliver Fromme <olli@lurza.secnetix.de>
Cc: Yoshihiro Ota <ota@j.email.ne.jp>, freebsd-hackers@FreeBSD.ORG, xistence@0x58.com, cperciva@FreeBSD.ORG
Subject: Re: freebsd-update's install_verify routine excessive stating
Message-ID: <497A1BEE.7070709@FreeBSD.org>
In-Reply-To: <200901231109.n0NB933k069163@lurza.secnetix.de>
References: <200901231109.n0NB933k069163@lurza.secnetix.de>
Oliver Fromme wrote:
> Yoshihiro Ota wrote:
> > Oliver Fromme wrote:
> > > It would be much better to generate two lists:
> > >
> > >  - The list of hashes, as already done ("filelist")
> > >  - A list of gzipped files present, stripped to the hash:
> > >
> > >     (cd files; echo *.gz) |
> > >     tr ' ' '\n' |
> > >     sed 's/\.gz$//' > filespresent
> > >
> > > Note we use "echo" instead of "ls", in order to avoid the
> > > kern.argmax limit.  64000 files would certainly exceed that
> > > limit.  Also note that the output is already sorted because
> > > the shell sorts wildcard expansions.
> > >
> > > Now that we have those two files, we can use comm(1) to
> > > find out whether there are any hashes in filelist that are
> > > not in filespresent:
> > >
> > >     if [ -n "$(comm -23 filelist filespresent)" ]; then
> > >             echo -n "Update files missing -- "
> > >             ...
> > >     fi
> > >
> > > That solution scales much better because no shell loop is
> > > required at all.
> >
> > This will probably be the fastest.
>
> Are you sure?  I'm not.

I'd put money on this being faster, for a lot of reasons. test is a
builtin in our /bin/sh, so there is no exec involved for 'test -f',
but going out to disk for 64k files on an individual basis should
definitely be slower than getting the file list in one shot.

There's no doubt that the current routine is not efficient. The cat
should be eliminated; the following is equivalent:

    cut -f 2,7 -d '|' $@ |

(quoting the $@ won't make a difference here). I haven't seen the
files we're talking about, but I can't help thinking that
cut | grep | cut could be streamlined.

> Only a benchmark can answer that.

Agreed; when making changes like this you should always benchmark
them. I did a lot of that when working on portmaster 2.0, which is
why I have some familiarity with this issue.
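[Editor's note: Oliver's two-list technique can be exercised end to end with a scratch directory standing in for freebsd-update's real "files" cache. The hashes and file names below are invented purely for illustration; this is a sketch, not the actual freebsd-update code.]

```shell
# Sketch of the comm(1) approach, using a throwaway directory in
# place of freebsd-update's "files" cache (hashes are made up).
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/files"

# Hashes the metadata says we need, sorted for comm(1).
printf '%s\n' aaa bbb ccc | sort > "$tmp/filelist"

# Only two of the three gzipped objects are actually on disk.
touch "$tmp/files/aaa.gz" "$tmp/files/ccc.gz"

# echo avoids the kern.argmax limit that ls can hit, and the shell
# sorts the wildcard expansion, so the output is already sorted.
(cd "$tmp/files" && echo *.gz) |
    tr ' ' '\n' |
    sed 's/\.gz$//' > "$tmp/filespresent"

# comm -23 prints lines found only in the first file, i.e. required
# hashes with no corresponding .gz present.
missing=$(comm -23 "$tmp/filelist" "$tmp/filespresent")
if [ -n "$missing" ]; then
    echo "Update files missing -- $missing"
fi
rm -rf "$tmp"
```

Running this prints "Update files missing -- bbb", since bbb.gz was never created; no per-file stat loop is needed.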
> > awk -F "|" '
> >   $2 ~ /^f/ { required[$7] = $7; count++ }
> >   END {
> >     FS = "[/.]";
> >     while (("find files -name *.gz" | getline) > 0)
> >       if ($2 in required)
> >         if (--count <= 0)
> >           exit(0);
> >     exit(count)
> >   }' "$@"
>
> I think this awk solution is more difficult to read and
> understand, which means that it is also more prone to
> introduce errors.

I agree, but I have only passing familiarity with awk, so to someone
who knows awk this might look like "hello world." :)

Doug

-- 

This .signature sanitized for your protection
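[Editor's note: on Doug's point that the cut | grep | cut chain could be streamlined, a single awk invocation can do the same selection in one process. This is a sketch assuming the pipe-delimited metadata layout implied by his `cut -f 2,7 -d '|'` command, with the file type in field 2 and the hash in field 7; the sample lines are invented for illustration.]

```shell
# Hypothetical metadata lines in a pipe-delimited layout; the exact
# paths, modes, and hashes here are made up for this example.
meta=$(mktemp) || exit 1
printf '%s\n' \
    '/bin/sh|f|0|0|0555|0|aaa|' \
    '/usr/share|d|0|0|0755|0||' \
    '/bin/ls|f|0|0|0555|0|bbb|' > "$meta"

# One awk process replaces cut | grep | cut: keep regular files
# (type "f") and print only the hash field.
hashes=$(awk -F'|' '$2 == "f" { print $7 }' "$meta")
echo "$hashes"
rm -f "$meta"
```

This prints the two hashes aaa and bbb, skipping the directory entry; fewer pipeline stages also means fewer forks, which matters when the metadata runs to tens of thousands of lines.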