From: Erik Cederstrand
Date: Fri, 08 Feb 2008 09:41:09 +0100
To: Brooks Davis
Cc: freebsd-performance@freebsd.org, kris@FreeBSD.org
Subject: Re: Performance Tracker project update

Brooks Davis wrote:
> On Wed, Jan 30, 2008 at 07:20:23PM +0100, Erik Cederstrand wrote:
>>
>> I'd like a situation where I can very quickly set up a slave with a
>> specific version of FreeBSD to run additional tests or provide shell access
>> to a developer. This currently involves adding an entry to a queue,
>> rebooting and waiting 2 minutes.
>> Quick and easy, but the archiving strategy
>> is obviously very inefficient.
>>
>> I'm thinking of a couple of options:
>> 1. Having one full install per month and archiving the rest as diffs
>> against that by recursively bsdiff'ing every file in the tree (I
>> could bsdiff a whole tarball, but bsdiff is very memory-intensive).
>> Quick test: 25 mins.
>> 2. Make a hash of all files and only store the binaries where the hash
>> is different from the monthly tarball. Faster than 1., but less
>> effective. Quick test: 5 mins.
>> 3. Use some kind of VCS. My experience with Subversion and binary files
>> is that it's very slow.
>> 4. Throw hardware at the problem.
>>
>> I'd say it should not take more than 10 mins to recreate an archived
>> version. Any thoughts?
>
> It seems like you should be able to combine 1 and 2 with checksums to
> decide if you need to run diffs. I'd think that would be quite fast.

I finally got around to testing this. Using mtree to compare MD5 hashes,
bsdiff to compact the changed files and hardlinks for the unchanged
files, I get a reduction in size from 256MB to 10MB. Pretty good, and
the whole operation only takes a few minutes.

I have run into one peculiarity, though. I install python2.5 into the
directory containing the build, and even though the Python version has
not changed, the MD5 sums of every .pyc and .pyo file still mismatch.
Any thoughts on this?

Erik
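For illustration, the hash/hardlink/bsdiff scheme described above could look roughly like the Python sketch below. It is a minimal sketch under stated assumptions, not the tracker's actual code: the function names archive_tree and restore_tree and the in-memory manifest are hypothetical, MD5 is computed with hashlib instead of reading mtree(8) output, the default delta step assumes the bsdiff(1) and bspatch(1) command-line tools are on PATH, and only regular files are handled (symlinks, permissions and file flags are ignored).

```python
import hashlib
import os
import shutil
import subprocess


def md5_of(path, chunk_size=1 << 16):
    """Hex MD5 digest of a file (mtree can record the same via md5digest)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def bsdiff_delta(old, new, patch):
    """Default delta step: shell out to bsdiff(1); assumed to be installed."""
    subprocess.run(["bsdiff", old, new, patch], check=True)


def bspatch_apply(old, new, patch):
    """Inverse of bsdiff_delta via bspatch(1); assumed to be installed."""
    subprocess.run(["bspatch", old, new, patch], check=True)


def archive_tree(base_dir, new_dir, out_dir, make_delta=bsdiff_delta):
    """Hypothetical helper: store new_dir in out_dir relative to base_dir.

    Unchanged files (equal MD5) are hardlinked to the monthly base copy,
    changed files are stored as a delta, and files missing from the base
    are copied whole. Returns a manifest: relative path -> kind.
    """
    manifest = {}
    for root, _dirs, files in os.walk(new_dir):
        rel_root = os.path.relpath(root, new_dir)
        os.makedirs(os.path.join(out_dir, rel_root), exist_ok=True)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            new_path = os.path.join(new_dir, rel)
            old_path = os.path.join(base_dir, rel)
            dst = os.path.join(out_dir, rel)
            if os.path.islink(new_path):
                continue  # this sketch does not archive symlinks
            if os.path.isfile(old_path) and md5_of(old_path) == md5_of(new_path):
                os.link(old_path, dst)  # unchanged: no extra disk space
                manifest[rel] = "link"
            elif os.path.isfile(old_path):
                make_delta(old_path, new_path, dst)  # changed: store a patch
                manifest[rel] = "delta"
            else:
                shutil.copy2(new_path, dst)  # new file: keep it whole
                manifest[rel] = "copy"
    return manifest


def restore_tree(base_dir, out_dir, manifest, dest_dir, apply_delta=bspatch_apply):
    """Hypothetical helper: rebuild an archived tree into dest_dir."""
    for rel, kind in manifest.items():
        src = os.path.join(out_dir, rel)
        dst = os.path.join(dest_dir, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        if kind == "delta":
            apply_delta(os.path.join(base_dir, rel), dst, src)
        else:
            # Copy rather than relink so the restored tree can be modified
            # without touching the archive.
            shutil.copy2(src, dst)
```

Making the delta and patch steps parameters keeps bsdiff swappable for another differ and lets the logic be exercised without bsdiff installed. Note that os.link only works when out_dir and base_dir live on the same filesystem.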