Date: Thu, 22 Jan 2009 20:38:19 -0500
From: Yoshihiro Ota
To: Oliver Fromme
Cc: freebsd-hackers@FreeBSD.ORG, xistence@0x58.com, cperciva@FreeBSD.ORG
Subject: Re: freebsd-update's install_verify routine excessive stating
Message-Id: <20090122203819.585fb35f.ota@j.email.ne.jp>
In-Reply-To: <200901221217.n0MCHfY3086653@lurza.secnetix.de>
References: <200901221217.n0MCHfY3086653@lurza.secnetix.de>

Hi.

It's interesting.

On Thu, 22 Jan 2009 13:17:41 +0100 (CET)
Oliver Fromme wrote:

> Hi,
>
>
> So I would suggest to replace the whole pipe with this:
>
>    awk -F "|" '$2 ~ /^f/ {print $2}' "$@" |
>    sort -u > filelist
>
> It would be much better to generate two lists:
>  - The list of hashes, as already done ("filelist")
>  - A list of gzipped files present, stripped to the hash:
>
>    (cd files; echo *.gz) |
>    tr ' ' '\n' |
>    sed 's/\.gz$//' > filespresent
>
> Note we use "echo" instead of "ls", in order to avoid the
> kern.argmax limit.  64000 files would certainly exceed that
> limit.  Also note that the output is already sorted because
> the shell sorts wildcard expansions.
>
> Now that we have those two files, we can use comm(1) to
> find out whether there are any hashes in filelist that are
> not in filespresent:
>
>    if [ -n "$(comm -23 filelist filespresent)" ]; then
>            echo -n "Update files missing -- "
>            ...
>    fi
>
> That solution scales much better because no shell loop is
> required at all.

This will probably be the fastest:

    awk -F "|" '
      $2 ~ /^f/ && !($7 in required) {required[$7]=$7; count++}
      END{FS="[/.]";
        while("find files -name \"*.gz\"" | getline>0)
          if($2 in required) if(--count<=0) exit(0);
        exit(count)}' "$@"

It verifies the entries with a hash table instead of sorting them.
Therefore, it is O(n+m) instead of O(n*log(n))+O(m*log(m)) in theory,
where n is the number of required hashes and m is the number of files
present. I do not know how well awk's associative arrays perform, though.

Regards,
Hiro
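
For illustration, here is a minimal sketch of the same hash-table idea written as a shell function that prints the missing hashes instead of returning a count. It is only a sketch under assumptions: the function name missing_hashes is made up, the index layout (type in field 2, hash in field 7) is taken from the awk one-liner in this message, and the directory name is assumed to contain no double quotes or backslashes.

```sh
#!/bin/sh
# Hypothetical helper, not part of freebsd-update.
# Usage: missing_hashes DIR INDEX...
# Prints every hash required by an "f" entry in the index files that has
# no matching DIR/<hash>.gz. Each missing hash is printed once.
missing_hashes() {
	dir=$1
	shift
	awk -F '|' -v dir="$dir" '
		BEGIN {
			# Load the hashes that are present into an associative array.
			cmd = "find \"" dir "\" -name \"*.gz\""
			while ((cmd | getline path) > 0) {
				n = split(path, part, "/")
				name = part[n]
				sub(/\.gz$/, "", name)
				present[name] = 1
			}
			close(cmd)
		}
		# Index lines: one array lookup per entry, no sorting needed.
		$2 ~ /^f/ && !($7 in present) && !($7 in reported) {
			reported[$7] = 1
			print $7
		}
	' "$@"
}
```

A caller could then test the output the same way the comm(1) variant does, for example with [ -n "$(missing_hashes files "$@")" ]. Printing the hashes rather than exiting with the count avoids depending on the exit status, which only carries the low 8 bits of the value.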