Date: Tue, 20 Aug 2013 10:19:23 +0100
From: Frank Leonhardt <frank2@fjl.co.uk>
To: freebsd-questions@freebsd.org
Subject: Re: copying millions of small files and millions of dirs
Message-ID: <5213349B.10908@fjl.co.uk>
In-Reply-To: <CALfReyeWxHjmqXhWiK4jbCvh3MktqKqnTBQjYgC0wDTgBcK5jg@mail.gmail.com>
References: <7E7AEB5A-7102-424E-8B1E-A33E0A2C8B2C@gmail.com> <20130816064612.GH1190@petole.demisel.net> <1376934082.25499.11612497.1C73C726@webmail.messagingengine.com> <B629E9F8-C01A-4D09-8054-C63F69846F5C@gmail.com> <CALfReyeWxHjmqXhWiK4jbCvh3MktqKqnTBQjYgC0wDTgBcK5jg@mail.gmail.com>
On 20/08/2013 08:32, krad wrote:
> When I migrated a large mailspool in maildir format from the old NFS server
> to the new one in a previous job, I first generated a list of the top-level
> maildirs. I then generated the rsync commands, plus a few other bits and
> pieces for each maildir, to make a single transaction-like function. I then
> pumped all these auto-generated scripts into xjobs and ran them in parallel.
> This vastly sped up the process, as walking the tree sequentially was far
> too slow. This was for about 15 million maildirs in a hashed structure, btw,
> so a fair number of files.
>
> e.g.
>
> find /maildir -maxdepth 4 -type d | while read d
> do
>     r=$(($RANDOM*$RANDOM))
>     echo rsync -a $d/ /newpath/$d/ > /tmp/scripts/$r
>     echo some other stuff >> /tmp/scripts/$r
> done
>
> ls /tmp/scripts/ | while read f
> do
>     echo /tmp/scripts/$f
> done | xjobs -j 20

This isn't what I'd have expected: running operations in parallel on mechanical drives would normally result in superfluous head movements and so exacerbate the I/O bottleneck. The system must be optimising the requests from the 20 parallel jobs better than I thought it would, to climb out of that hole far enough to get a net benefit. Do you remember how any other approaches performed?
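For the archives, a minimal one-pipeline sketch of the same idea (untested; it assumes xjobs executes each line read from stdin as a shell command, and /maildir and /newpath are just the placeholder paths from the example above):

    # Emit one rsync command per top-level maildir; xjobs runs 20 at a time.
    find /maildir -maxdepth 4 -type d | while read -r d
    do
        echo "rsync -a $d/ /newpath/$d/"
    done | xjobs -j 20

The intermediate script files in /tmp/scripts only earn their keep if each maildir needs several commands run as a unit; for a plain copy, piping the commands straight into xjobs avoids the bookkeeping.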