Date: Thu, 15 Aug 2013 13:40:25 -0700
From: aurfalien <aurfalien@gmail.com>
To: Charles Swiger <cswiger@mac.com>
Cc: frank2@fjl.co.uk, freebsd-questions@freebsd.org
Subject: Re: copying millions of small files and millions of dirs
Message-ID: <8AB33749-728B-48FD-B17F-72FE54BD564A@gmail.com>
In-Reply-To: <B09E50DD-81F8-4EE6-8295-0DD56A5A97A9@mac.com>
References: <7E7AEB5A-7102-424E-8B1E-A33E0A2C8B2C@gmail.com> <520D33D6.8050607@fjl.co.uk> <B09E50DD-81F8-4EE6-8295-0DD56A5A97A9@mac.com>
On Aug 15, 2013, at 1:22 PM, Charles Swiger wrote:

> [ ...combining replies for brevity... ]
>
> On Aug 15, 2013, at 1:02 PM, Frank Leonhardt <frank2@fjl.co.uk> wrote:
>> I'm reading all this with interest. The first thing I'd have tried would
>> be tar (and probably netcat), but I'm probably a bit of a dinosaur. (If
>> someone wants to buy me some really big drives, I promise I'll update.)
>> If it's really NFS or nothing, I guess you couldn't open a socket anyway.
>
> Either tar via netcat or SSH, or dump / restore via a similar pipeline,
> are quite traditional. tar is more flexible for partial filesystem copies,
> whereas dump / restore is more oriented towards complete filesystem
> copies. If the destination starts off empty, they're probably faster than
> rsync, but rsync does delta updates, which is a huge win if you're going
> to be copying changes onto a slightly older version.

Yep, so it looks like it is what it is, as the data set is changing while I
do the base sync. So I'll have to do several more passes to pick up
newcomers etc...

> Anyway, you're entirely right that the capabilities of the source matter
> a great deal. If it could do zfs send / receive, or similar snapshot
> mirroring, that would likely do better than userland tools.
>
>> I'd be interested to know whether tar is still worth using in this world
>> of volume managers and SMP.
>
> Yes.
>
> On Aug 15, 2013, at 12:14 PM, aurfalien <aurfalien@gmail.com> wrote:
> [ ... ]
>>>>>> Doing 10Gb with jumbos, but in this case it doesn't make a hoot of a
>>>>>> difference.
>>>>>
>>>>> Yeah, probably not -- you're almost certainly I/O bound, not network
>>>>> bound.
>>>>
>>>> Actually it was network bound via 1 rsync process, which is why I broke
>>>> up the 154 dirs into 7 batches of 22 each.
>>>
>>> Oh. Um, unless you can make more network bandwidth available, you've
>>> saturated the bottleneck. Doing a single copy task is likely to complete
>>> faster than splitting up the job into subtasks in such a case.
>>
>> Well, using iftop, I am now at least able to get ~1Gb with 7 scripts
>> going, where before it was in the 10 MB/s range with 1.
>
> 1 gigabyte of data per second is pretty decent for a 10Gb link; 10 MB/s
> obviously wasn't close to saturating a 10Gb link.

Cool. Looks like I am doing my best, which is what I wanted to know. I chose
to do 7 rsync scripts because 7 divides evenly into the 154 parent dirs :)

You should see how our backup system deals with this: Atempo Time Navigator,
or Tina as it's called.

It takes an hour just to lay down the dirs on tape before it even starts
backing up -- craziness. And that's just for 1 parent dir with an average of
500,000 dirs. Actually I'm probably wrong, as the initial creation is
125,000 dirs, of which a few are symlinks.

Then it grows from there. Looking at the Tina stats, we see a million
objects or more.

- aurf
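A minimal sketch of the tar-over-netcat and tar-over-SSH pipelines mentioned
above; the hostnames (srchost, dsthost), the paths, and the port number are
placeholders, not details from the thread:

    # On the destination, listen on a TCP port and unpack whatever arrives,
    # preserving permissions:
    dsthost$ nc -l 3333 | tar -xpf - -C /dest

    # On the source, stream the tree into that listener:
    srchost$ tar -cf - -C /src . | nc dsthost 3333

    # If only SSH is available, the same idea collapses into one pipeline:
    srchost$ tar -cf - -C /src . | ssh dsthost 'tar -xpf - -C /dest'

The netcat variant avoids SSH's encryption overhead, which is part of why it
tends to win on a trusted LAN.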
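A rough sketch of the "7 batches of 22" rsync split described above, assuming
the 154 parent dirs sit directly under /src/parent and the destination is
reachable as dsthost (all names are illustrative):

    # Build a list of parent dirs and cut it into 7 lists of 22 entries each:
    ls /src/parent > /tmp/dirs.txt
    split -l 22 /tmp/dirs.txt /tmp/batch.

    # Run one worker per list; each worker rsyncs its dirs one at a time,
    # so 7 transfers run in parallel overall:
    for list in /tmp/batch.*; do
        while read d; do
            rsync -a "/src/parent/$d" "dsthost:/dest/parent/"
        done < "$list" &
    done
    wait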
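And for completeness, the zfs send / receive approach Charles alludes to,
assuming the source could be snapshotted; the dataset names tank/data and
backup/data are placeholders:

    # Snapshot the source dataset and stream the whole thing once:
    zfs snapshot tank/data@base
    zfs send tank/data@base | ssh dsthost zfs receive backup/data

    # Later, send only the blocks changed since the base snapshot:
    zfs snapshot tank/data@update1
    zfs send -i tank/data@base tank/data@update1 | ssh dsthost zfs receive backup/data

The incremental send is what makes this attractive for a data set that keeps
changing during the base copy.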