Date: Thu, 22 Jan 2009 13:17:41 +0100 (CET) From: Oliver Fromme <olli@lurza.secnetix.de> To: freebsd-hackers@FreeBSD.ORG, xistence@0x58.com, cperciva@FreeBSD.ORG Subject: Re: freebsd-update's install_verify routine excessive stating Message-ID: <200901221217.n0MCHfY3086653@lurza.secnetix.de> In-Reply-To: <F671A531-2093-4CCC-B2AB-9339B19FECC4@0x58.com>
next in thread | previous in thread | raw e-mail | index | archive | help
Hi, I've Cc'ed Colin Percival, the author of freebsd-update. Bert JW Regeer wrote: > Recently I decided it would be time to upgrade an older laptop that > was still running 6.2-RELEASE to a more recent release 7.1-RELEASE > [...] > Everything went well, including the kernel update. It was not until > after I rebooted and ran: > > ./freebsd-update.sh -f freebsd-update.conf install > > That I started noticing something rather weird. It had been running > for a good 30 minutes before I started wondering what was going on. > top gave me nothing of value, other than that all of the time was > spent running sh, and that it was spending all of its time in system, > not user where I would have expected it. Thinking went wrong I stopped > the process and started it again. > > After a ktrace and kdump I found the culprit in install_verify in the > freebsd-update utility, it would read one line, and then run stat on > one of the many files that was listed in /usr/upgrade/files/. > > install_verify () { > # Generate a list of hashes > cat $@ | That should be "$@" with double quotes. $@ doesn't make sense without the quotes. Apart from that, it's a typical case of "useless use of cat". > cut -f 2,7 -d '|' | > grep -E '^f' | > cut -f 2 -d '|' | > sort -u > filelist It's unclear why there are two "cut" commands. The 7th field isn't used at all. Also, the -E option to grep is superfluous here. Finally, using awk might be more efficient than cut + grep. So I would suggest to replace the whole pipe with this: awk -F "|" '$2 ~ /^f/ {print $2}' "$@" | sort -u > filelist > # Make sure all the hashes exist > while read HASH; do > if ! [ -f files/${HASH}.gz ]; then > echo -n "Update files missing -- " > echo "this should never happen." > echo "Re-run '$0 fetch'." > return 1 > fi > done < filelist > > # Clean up > rm filelist > } > > running find /usr/upgrade/files/ | wc -l showed over 64000 files. So > what was happening here is that freebsd-update.sh is doing due > diligence in checking that all the files exist, however it uses the > built in test utility to do so. While in normal situations this may > not be that big of a deal since the changeset is likely to be small, > running stat on 64000 individual files in a loop is rather slow. > > In my case I have just removed the offending code and hoped all went > well, however this is off course not an acceptable solution. I have > not yet come up with an acceptable way to fix the offending code, > hence my post to hackers. I am also not sure as to how I would file a > pr bug report for this situation, any guidance would be greatly > appreciated. You are right, that loop doesn't scale very well. I'm not surprised it is horribly slow with 64000 files on an old laptop that probably has a disk that isn't too fast. It would be much better to generate two lists: - The list of hashes, as already done ("filelist") - A list of gzipped files present, stripped to the hash: (cd files; echo *.gz) | tr ' ' '\n' | sed 's/\.gz$//' > filespresent Note we use "echo" instead of "ls", in order to avoid the kern.argmax limit. 64000 files would certainly exceed that limit. Also note that the output is already sorted because the shell sorts wildcard expansions. Now that we have those two files, we can use comm(1) to find out whether there are any hashes in filelist that are not in filespresent: if [ -n "$(comm -23 filelist filespresent)" ]; then echo -n "Update files missing -- " ... fi That solution scales much better because no shell loop is required at all. Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün- chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd "Life is short (You need Python)" -- Bruce Eckel, ANSI C++ Comitee member, author of "Thinking in C++" and "Thinking in Java"
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200901221217.n0MCHfY3086653>