From: Oliver Fromme
To: dougb@freebsd.org (Doug Barton)
Cc: Yoshihiro Ota, freebsd-hackers@freebsd.org, xistence@0x58.com, cperciva@freebsd.org
Date: Fri, 23 Jan 2009 23:22:10 +0100 (CET)
Subject: Re: freebsd-update's install_verify routine excessive stating
Message-Id: <200901232222.n0NMMAcS097663@lurza.secnetix.de>
In-Reply-To: <497A2A83.9010606@FreeBSD.org>

Doug Barton wrote:
> Oliver Fromme wrote:
> > I assume, with "this" you mean my solution to the slow
> > shell loop problem (not quoted above), not Yoshihiro Ota's
> > awk proposal?
>
> I meant the solution using comm, sorry. (I forgot to mention that I
> would probably use cmp here, but that's a personal preference.)

I see. No problem.

However, I think cmp wouldn't work here, because cmp only
detects whether two files differ. In this case we need to
know whether one file is a subset of the other: For every
hash there must be a .gz file, but it doesn't hurt if there
are more files. So the list of hashes can be a subset of
the list of .gz files; they don't have to be equal.

While I was at it, I skimmed through the cmp source and
found a bug (or inefficiency): When the -s option is
specified (i.e. silent, exit code only), it would be
sufficient to terminate when the first difference is
encountered. But it always compares the whole files.
I'll try to make a patch to improve this.

> > Yes, it can. I already explained pretty much all of that
> > (useless cat etc.) in my first post in this thread. Did
> > you read it?
>
> Yes, I was attempting to agree with you. :)

OK, sorry. I misunderstood. :)

> > My suggestion (after a small correction by
> > Christoph Mallon) was to replace the cat|cut|grep|cut
> > sequence with this single awk command:
> >
> > awk -F "|" '$2 ~ /^f/ {print $7}' "$@"
> >
> > For those not fluent with awk, it means this:
> > - Treat "|" as field separator.
> > - Search for lines where the second field matches ^f
> >   (i.e. it starts with an "f").
> > - Print the 7th field of those matching lines.
>
> Like I said, I haven't seen the files, but this looks good at first
> blush. That said, the generation of the hash list file is just a drop
> in the bucket. The real inefficiency in this function is the test -f
> for 64k files, one at a time.

Yes, definitely.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht
München, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"We will perhaps eventually be writing only small modules which are
identified by name as they are used to build larger ones, so that devices
like indentation, rather than delimiters, might become feasible for
expressing local structure in the source language."
        -- Donald E. Knuth, 1974
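
For readers following the thread, the subset check discussed in the
message above might be sketched roughly as follows. This is only an
illustration, not the freebsd-update code: the helper names list_hashes
and is_subset, the sample index lines, and the files/ directory layout
are all made up for the example. It combines the quoted awk command with
comm -23, which prints the lines found only in the first of two sorted
files, so empty output means that every wanted hash has a .gz file.

```shell
#!/bin/sh

# Hypothetical helper: the awk command quoted in the message. Prints the
# 7th "|"-separated field of every line whose 2nd field starts with "f".
list_hashes() {
    awk -F "|" '$2 ~ /^f/ {print $7}' "$@"
}

# Hypothetical helper: succeeds if every line of file $1 also occurs in
# file $2. comm needs sorted input, so both lists are sorted first, with
# the same collation (LC_ALL=C) that comm itself is run with.
is_subset() {
    tmpdir=$(mktemp -d) || return 2
    LC_ALL=C sort -u "$1" > "$tmpdir/want"
    LC_ALL=C sort -u "$2" > "$tmpdir/have"
    # -23 suppresses columns 2 and 3, leaving lines unique to "want".
    missing=$(LC_ALL=C comm -23 "$tmpdir/want" "$tmpdir/have")
    rm -rf "$tmpdir"
    [ -z "$missing" ]
}

# Made-up sample data: two file entries, one directory entry, and one
# extra .gz file that no index line refers to.
work=$(mktemp -d) || exit 1
printf '%s\n' \
    '/bin/cat|f|0|0|0555|0|hash-a|' \
    '/bin|d|0|0|0755|0||' \
    '/bin/ls|f|0|0|0555|0|hash-b|' > "$work/index"
mkdir "$work/files"
touch "$work/files/hash-a.gz" "$work/files/hash-b.gz" "$work/files/extra.gz"

list_hashes "$work/index" > "$work/wanted"
ls "$work/files" | sed 's/\.gz$//' > "$work/present"

# One pass over two lists replaces one "test -f" per file.
if is_subset "$work/wanted" "$work/present"; then
    result=complete
else
    result=missing
fi
echo "$result"
rm -rf "$work"
```

The extra .gz file does not make the check fail, which is the difference
from a plain cmp of the two lists: cmp would report the lists as
different, whereas the subset test only cares about hashes that have no
matching file.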