Date: Thu, 15 Aug 2013 13:22:59 -0700
From: Charles Swiger <cswiger@mac.com>
To: frank2@fjl.co.uk, aurfalien <aurfalien@gmail.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: copying milllions of small files and millions of dirs
Message-ID: <B09E50DD-81F8-4EE6-8295-0DD56A5A97A9@mac.com>
In-Reply-To: <520D33D6.8050607@fjl.co.uk>
References: <7E7AEB5A-7102-424E-8B1E-A33E0A2C8B2C@gmail.com> <520D33D6.8050607@fjl.co.uk>

[ ...combining replies for brevity... ]

On Aug 15, 2013, at 1:02 PM, Frank Leonhardt <frank2@fjl.co.uk> wrote:
> I'm reading all this with interest. The first thing I'd have tried would be tar (and probably netcat) but I'm a probably bit of a dinosaur. (If someone wants to buy me some really big drives I promise I'll update). If it's really NFS or nothing I guess you couldn't open a socket anyway.

Either tar via netcat or SSH, or dump / restore via a similar pipeline, are quite traditional. tar is more flexible for partial filesystem copies, whereas dump / restore is more oriented towards complete filesystem copies. If the destination starts off empty, they're probably faster than rsync, but rsync does delta updates, which is a huge win if you're going to be copying changes onto a slightly older version.

Anyway, you're entirely right that the capabilities of the source matter a great deal. If it could do zfs send / receive, or similar snapshot mirroring, that would likely do better than userland tools.

> I'd be interested to know whether tar is still worth using in this world of volume managers and SMP.

Yes.

On Aug 15, 2013, at 12:14 PM, aurfalien <aurfalien@gmail.com> wrote:
[ ... ]
>>>>> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
>>>>
>>>> Yeah, probably not-- you're almost certainly I/O bound, not network bound.
>>>
>>> Actually it was network bound via 1 rsync process which is why I broke up 154 dirs into 7 batches of 22 each.
>>
>> Oh. Um, unless you can make more network bandwidth available, you've saturated the bottleneck.
>> Doing a single copy task is likely to complete faster than splitting up the job into subtasks in such a case.
>
> Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going were before it was in the 10Ms with 1.

1 gigabyte of data per second is pretty decent for a 10Gb link; 10 MB/s obviously wasn't close to saturating a 10Gb link.
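
For reference, here is a minimal sketch of the tar pipeline mentioned above. It is an illustration under stated assumptions, not a tuned recipe: the function name copy_tree is hypothetical, and the local pipe just stands in for the network hop. For the SSH case, the commented variant assumes a reachable host (called "desthost" here purely as a placeholder) and an existing destination directory.

```sh
#!/bin/sh
# copy_tree SRC DST  (hypothetical helper name, for illustration only)
# Streams the contents of SRC through a tar pipe and unpacks them into DST.
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # The writer archives SRC to stdout; the reader unpacks from stdin,
    # with -p asking tar to preserve file permissions.
    tar -cf - -C "$src" . | tar -xpf - -C "$dst"
}

# Network variant over SSH (assumes a placeholder host "desthost" and
# that /dest already exists there):
#   tar -cf - -C /src . | ssh desthost 'tar -xpf - -C /dest'
```

Unlike rsync, this does no delta comparison, so it mainly makes sense when the destination starts off empty.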

Regards,
-- 
-Chuck