From: Frank Leonhardt
Date: Tue, 20 Aug 2013 10:19:23 +0100
To: freebsd-questions@freebsd.org
Subject: Re: copying milllions of small files and millions of dirs
Message-ID: <5213349B.10908@fjl.co.uk>

On 20/08/2013 08:32, krad wrote:
> When I migrated a large mailspool in maildir format from the old NFS server
> to the new one in a previous job, I first generated a list of the
> top-level maildirs. I then generated the rsync commands plus a few other bits
> and pieces for each maildir to make a single transaction-like function. I then
> pumped all these auto-generated scripts into xjobs and ran them in parallel.
> This vastly speeded up the process, as sequentially running the tree was far
> too slow. This was for about 15 million maildirs in a hashed structure btw,
> so a fair amount of files.
>
> e.g.
>
> find /maildir -type d -maxdepth 4 | while read d
> do
>     r=$(($RANDOM*$RANDOM))
>     echo rsync -a $d/ /newpath/$d/ > /tmp/scripts/$r
>     echo some other stuff >> /tmp/scripts/$r
> done
>
> ls /tmp/scripts/ | while read f
> do
>     echo /tmp/scripts/$f
> done | xjobs -j 20
>

This isn't what I'd have expected, as running operations in parallel on
mechanical drives would normally result in superfluous head movements and
thus exacerbate the I/O bottleneck. The system must be optimising the
requests from 20 parallel jobs better than I thought it would, to climb out
of that hole far enough to show a net benefit. Do you remember how any other
approaches performed?
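
For anyone who wants to try the comparison, here is a minimal sketch of the job-list idea from the quoted example. It is not krad's actual script: the function name gen_jobs and its three arguments are made up for illustration, it assumes a find that supports -mindepth and -maxdepth (FreeBSD and GNU find both do), and it assumes directory names contain no double quotes, dollar signs, backticks or newlines. It only prints the rsync commands, one per directory at a single depth, so the same list can be fed to a parallel runner at different job counts.

```shell
#!/bin/sh
# gen_jobs SRC DEST DEPTH  (hypothetical helper, not from the original post)
# Print one rsync command line for each directory exactly DEPTH levels
# below SRC. Nothing is copied here; the output is just a job list.
# Note: files that live above DEPTH are not covered by these jobs and
# would need a separate, non-recursive pass.
gen_jobs() {
    src=$1
    dest=$2
    depth=$3
    find "$src" -mindepth "$depth" -maxdepth "$depth" -type d |
    while IFS= read -r d
    do
        # Path of this directory relative to SRC, e.g. "a/b"
        rel=${d#"$src"/}
        # Naive quoting: assumes no '"', '$' or backticks in the paths
        printf 'rsync -a "%s/" "%s/%s/"\n' "$d" "$dest" "$rel"
    done
}
```

Something like `gen_jobs /maildir /newpath 4 > /tmp/jobs` followed by `xjobs -j 20 < /tmp/jobs` would then mirror the quoted approach, assuming the runner honours the double quotes. Timing the same list with a lower job count against a fresh destination (and a cold cache) would show whether the parallelism is really what helps.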