From: Oliver Fromme
To: dougb@FreeBSD.org (Doug Barton)
Cc: Yoshihiro Ota, freebsd-hackers@FreeBSD.org, xistence@0x58.com,
    cperciva@FreeBSD.org
Date: Fri, 23 Jan 2009 21:06:22 +0100 (CET)
Subject: Re: freebsd-update's install_verify routine excessive stating
Message-Id: <200901232006.n0NK6M1B092584@lurza.secnetix.de>
In-Reply-To: <497A1BEE.7070709@FreeBSD.org>

Doug Barton wrote:
> Oliver Fromme wrote:
> > Yoshihiro Ota wrote:
> > > Oliver Fromme wrote:
> > > > It would be much better to generate two lists:
> > > > - The list of hashes, as already done ("filelist")
> > > > - A list of gzipped files present, stripped to the hash:
> > > >
> > > >    (cd files; echo *.gz) |
> > > >    tr ' ' '\n' |
> > > >    sed 's/\.gz$//' > filespresent
> > > >
> > > > Note we use "echo" instead of "ls", in order to avoid the
> > > > kern.argmax limit. 64000 files would certainly exceed that
> > > > limit. Also note that the output is already sorted because
> > > > the shell sorts wildcard expansions.
> > > >
> > > > Now that we have those two files, we can use comm(1) to
> > > > find out whether there are any hashes in filelist that are
> > > > not in filespresent:
> > > >
> > > >    if [ -n "$(comm -23 filelist filespresent)" ]; then
> > > >            echo -n "Update files missing -- "
> > > >            ...
> > > >    fi
> > > >
> > > > That solution scales much better because no shell loop is
> > > > required at all.
> > >
> > > This will probably be the fastest.
> >
> > Are you sure? I'm not.
>
> I'd put money on this being faster for a lot of reasons.

I assume that by "this" you mean my solution to the slow shell
loop problem (not quoted above), not Yoshihiro Ota's awk
proposal?

> test is a
> builtin in our /bin/sh, so there is no exec involved for 'test -f',
> but going out to disk for 64k files on an individual basis should
> definitely be slower than getting the file list in one shot.

Correct.

> There's no doubt that the current routine is not efficient. The cat
> should be eliminated, the following is equivalent:
>
> cut -f 2,7 -d '|' $@ |
>
> (quoting the $@ won't make a difference here).

Right, technically it doesn't make a difference because the
file names are fixed and don't contain spaces.

*But* then it is better to use $*. Every time I see $@
without double quotes I wonder if the author forgot to add
them. It always smells like a bug. Using $@ without quotes
is pointless because then it behaves exactly the same as $*.

> I haven't seen the files we're talking about, but I can't help
> thinking that cut | grep | cut could be streamlined.

Yes, it can.
I already explained pretty much all of that (useless cat etc.)
in my first post in this thread. Did you read it?

My suggestion (after a small correction by Christoph Mallon)
was to replace the cat|cut|grep|cut sequence with this single
awk command:

   awk -F "|" '$2 ~ /^f/ {print $7}' "$@"

For those not fluent in awk, it means this:
 - Treat "|" as the field separator.
 - Search for lines where the second field matches ^f
   (i.e. it starts with an "f").
 - Print the 7th field of those matching lines.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Registergericht Muenchen, HRA 74606, Management:
secnetix Verwaltungsgesellsch. mbH, Commercial register: Registergericht
München, HRB 125758, Managing directors: Maik Bachmann, Olaf Erb,
Ralf Gebhart

FreeBSD services, products and more:  http://www.secnetix.de/bsd

In my experience the term "transparent proxy" is an oxymoron (like jumbo
shrimp).  "Transparent" proxies seem to vary from the distortions of a
funhouse mirror to barely translucent.  I really, really dislike them
when trying to figure out the corrective lenses needed with each of them.
        -- R. Kevin Oberman, Network Engineer
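
For illustration only, here is a minimal sketch that combines the two
ideas discussed in this message: the single awk command that extracts
the hashes, and the comm(1) comparison against the files present. It
is not the actual freebsd-update code. The index file name
"INDEX-sample", its contents, the three hashes and the scratch
directory are all made up for the example; the only thing taken from
the thread is that field 2 holds the entry type and field 7 the hash.
The extra "sort -u" is an assumption, added because comm(1) needs
sorted input.

```sh
#!/bin/sh
# Sketch on hypothetical data; real index files and hashes will differ.
LC_ALL=C; export LC_ALL            # same collation for sort, glob and comm

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir files

# Made-up index: field 2 is the entry type, field 7 is the hash.
cat > INDEX-sample <<'EOF'
/bin/cat|f|0|0|0555|0|aaa111|
/bin|d|0|0|0755|0||
/bin/ls|f|0|0|0555|0|bbb222|
/bin/sh|f|0|0|0555|0|ccc333|
EOF

# Only two of the three hashed files are actually present.
: > files/aaa111.gz
: > files/ccc333.gz

# List 1: hashes of all "f" entries, sorted so comm(1) can use them.
awk -F "|" '$2 ~ /^f/ {print $7}' INDEX-sample | sort -u > filelist

# List 2: gzipped files present, stripped to the hash.  The shell
# already sorts the wildcard expansion.
(cd files; echo *.gz) | tr ' ' '\n' | sed 's/\.gz$//' > filespresent

# Hashes that are in filelist but not in filespresent.
missing=$(comm -23 filelist filespresent)
if [ -n "$missing" ]; then
        echo "Update files missing: $missing"
fi
```

With the sample data above, the only hash reported as missing is
bbb222, and no per-file loop or 'test -f' call is needed.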