From owner-freebsd-hackers@FreeBSD.ORG  Fri Jan 23 19:34:42 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 260D21065676
	for <freebsd-hackers@FreeBSD.ORG>; Fri, 23 Jan 2009 19:34:42 +0000 (UTC)
	(envelope-from dougb@FreeBSD.org)
Received: from mail2.fluidhosting.com (mx22.fluidhosting.com [204.14.89.5])
	by mx1.freebsd.org (Postfix) with ESMTP id C2D698FC1E
	for <freebsd-hackers@FreeBSD.ORG>; Fri, 23 Jan 2009 19:34:41 +0000 (UTC)
	(envelope-from dougb@FreeBSD.org)
Received: (qmail 29069 invoked by uid 399); 23 Jan 2009 19:34:41 -0000
Received: from localhost (HELO ?192.168.0.19?) (dougb@dougbarton.us@127.0.0.1)
	by localhost with ESMTPAM; 23 Jan 2009 19:34:41 -0000
X-Originating-IP: 127.0.0.1
X-Sender: dougb@dougbarton.us
Message-ID: <497A1BEE.7070709@FreeBSD.org>
Date: Fri, 23 Jan 2009 11:35:10 -0800
From: Doug Barton <dougb@FreeBSD.org>
Organization: http://www.FreeBSD.org/
User-Agent: Thunderbird 2.0.0.19 (Windows/20081209)
MIME-Version: 1.0
To: Oliver Fromme <olli@lurza.secnetix.de>
References: <200901231109.n0NB933k069163@lurza.secnetix.de>
In-Reply-To: <200901231109.n0NB933k069163@lurza.secnetix.de>
X-Enigmail-Version: 0.95.7
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Yoshihiro Ota <ota@j.email.ne.jp>, freebsd-hackers@FreeBSD.ORG,
	xistence@0x58.com, cperciva@FreeBSD.ORG
Subject: Re: freebsd-update's install_verify routine excessive stating
X-List-Received-Date: Fri, 23 Jan 2009 19:34:42 -0000

Oliver Fromme wrote:
> Yoshihiro Ota wrote:
>  > Oliver Fromme wrote:
>  > > It would be much better to generate two lists:
>  > >  - The list of hashes, as already done ("filelist")
>  > >  - A list of gzipped files present, stripped to the hash:
>  > > 
>  > >    (cd files; echo *.gz) |
>  > >    tr ' ' '\n' |
>  > >    sed 's/\.gz$//' > filespresent
>  > > 
>  > > Note we use "echo" instead of "ls", in order to avoid the
>  > > kern.argmax limit.  64000 files would certainly exceed that
>  > > limit.  Also note that the output is already sorted because
>  > > the shell sorts wildcard expansions.
>  > > 
>  > > Now that we have those two files, we can use comm(1) to
>  > > find out whether there are any hashes in filelist that are
>  > > not in filespresent:
>  > > 
>  > >    if [ -n "$(comm -23 filelist filespresent)" ]; then
>  > >            echo -n "Update files missing -- "
>  > >            ...
>  > >    fi
>  > > 
>  > > That solution scales much better because no shell loop is
>  > > required at all.
>  > 
>  > This will probably be the fastest.
> 
> Are you sure?  I'm not.

I'd put money on this being faster. The test command is a builtin in
our /bin/sh, so there is no exec involved for 'test -f', but going out
to disk for 64k files on an individual basis should definitely be
slower than getting the file list in one shot.
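
For concreteness, a minimal sketch of the two approaches side by side.
The function names are hypothetical (not taken from freebsd-update),
and it assumes the layout Oliver describes: hashes stored as
files/<hash>.gz and a sorted "filelist" of required hashes.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not freebsd-update code.

# Per-file approach: one stat(2) per hash via the test builtin.
check_per_file() {
	while read -r hash; do
		[ -f "files/${hash}.gz" ] || return 1
	done < filelist
	return 0
}

# One-shot approach: expand the directory once, then compare with comm(1).
check_one_shot() {
	(cd files && echo *.gz) |
	    tr ' ' '\n' |
	    sed 's/\.gz$//' > filespresent
	[ -z "$(comm -23 filelist filespresent)" ]
}

# Tiny demo in a scratch directory with made-up hash names.
cd "$(mktemp -d)" || exit 1
mkdir files
for h in aaa bbb ccc; do : > "files/$h.gz"; done
printf '%s\n' aaa bbb ccc > filelist

check_per_file && check_one_shot && echo "all present"
rm files/bbb.gz
check_one_shot || echo "Update files missing"
```

Timing each function against a directory with tens of thousands of
files is what would actually settle which one is faster.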

There's no doubt that the current routine is not efficient. The cat
should be eliminated; the following is equivalent:

cut -f 2,7 -d '|' $@ |

(quoting the $@ won't make a difference here).
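
A quick way to check that equivalence, as a sketch: the two index
files below are made-up samples, not real freebsd-update data, and
stand in for the arguments in $@.

```shell
#!/bin/sh
# Scratch directory with two invented index files.
cd "$(mktemp -d)" || exit 1
printf 'a|f|x|x|x|x|hash1|\n' > idx1
printf 'b|d|x|x|x|x||\n' > idx2
set -- idx1 idx2

# With the extra cat process...
with_cat=$(cat "$@" | cut -f 2,7 -d '|')
# ...and with cut reading the files itself.
without_cat=$(cut -f 2,7 -d '|' "$@")

[ "$with_cat" = "$without_cat" ] && echo "identical output"
```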

I haven't seen the files we're talking about, but I can't help
thinking that cut | grep | cut could be streamlined.
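
One possible way to streamline it, as a sketch only. It assumes the
remaining stages keep records whose second field starts with "f" and
then print the seventh field (which is what the awk script quoted
further down also tests for); the exact grep and second cut shown
here are guesses, and the sample records are invented.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Made-up records: field 2 is the type, field 7 the hash.
printf '%s\n' \
    '/bin/sh|f|0|0|0555|0|hash2|' \
    '/bin|d|0|0|0755|0||' \
    '/bin/ls|f|0|0|0555|0|hash1|' > index

# Assumed shape of the existing three-stage pipeline.
cut -f 2,7 -d '|' index | grep '^f' | cut -f 2 -d '|' | sort -u > list_old

# Single awk process doing the select-and-print in one pass.
awk -F '|' '$2 ~ /^f/ { print $7 }' index | sort -u > list_new

cmp -s list_old list_new && echo "same list"
```

The two can differ on malformed lines (for example, lines with no '|'
at all, which cut passes through unchanged), so this would need
checking against real index files.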

> Only a benchmark can answer that. 

Agreed. When making changes like this you should always benchmark
them. I did a lot of that when working on portmaster 2.0, which is why
I have some familiarity with this issue.

>  > awk -F "|" '
>  >   $2 ~ /^f/{required[$7]=$7; count++}
>  >   END{FS="[/.]";
>  >    while("find files -name *.gz" | getline>0)
>  >     if($2 in required)
>  >      if(--count<=0)
>  >       exit(0);
>  >   exit(count)}' "$@"
> 
> I think this awk solution is more difficult to read and
> understand, which means that it is also more prone to
> introduce errors. 

I agree, but I have only passing familiarity with awk, so to someone
who knows awk this might look like "hello world." :)

Doug

-- 

    This .signature sanitized for your protection