Date: Thu, 15 Aug 2013 12:14:08 -0700
From: aurfalien <aurfalien@gmail.com>
To: Charles Swiger <cswiger@mac.com>
Cc: FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: Re: copying millions of small files and millions of dirs
Message-ID: <611B3931-958B-4A46-A6BD-1CA541F32699@gmail.com>
In-Reply-To: <1F06D736-1019-4223-8546-5DBB0F5D878B@mac.com>
References: <7E7AEB5A-7102-424E-8B1E-A33E0A2C8B2C@gmail.com> <CC3CFFD3-6742-447B-AA5D-2A4F6C483883@mac.com> <6483A298-6216-4306-913C-B3E0F4A3BC8D@gmail.com> <1F06D736-1019-4223-8546-5DBB0F5D878B@mac.com>
On Aug 15, 2013, at 11:52 AM, Charles Swiger wrote:

> On Aug 15, 2013, at 11:37 AM, aurfalien <aurfalien@gmail.com> wrote:
>> On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:
>>> On Aug 15, 2013, at 11:13 AM, aurfalien <aurfalien@gmail.com> wrote:
>>>> Is there a faster way to copy files over NFS?
>>>
>>> Probably.
>>
>> Ok, thanks for the specifics.
>
> You're most welcome.
>
>>>> Currently breaking up a simple rsync over 7 or so scripts, each of which copies 22 dirs having ~500,000 dirs or files apiece.
>>>
>>> There's a maximum useful concurrency which depends on how many disk spindles and what flavor of RAID is in use; exceeding it will result in thrashing the disks and heavily reducing throughput due to competing I/O requests. Try measuring aggregate performance when running fewer rsyncs at once and see whether it improves.
>>
>> It's 35 disks broken into 7 striped RAID-Z groups with an SLC-based ZIL and no atime; the server itself has 128GB of ECC RAM. I didn't have time to tune or really learn ZFS, but at this point it's only backing up the data for emergency purposes.
>
> OK. If you've got 7 independent groups and can use separate network pipes for each parallel copy, then using 7 simultaneous scripts is likely reasonable.
>
>>> Of course, putting half a million files into a single directory level is also a bad idea, even with dirhash support. You'd do better to break them up into subdirs containing fewer than ~10K files apiece.
>>
>> I can't; that's our job structure, obviously developed by script kiddies and not systems ppl, but I digress.
>
> Identifying something which is "broken as designed" is still helpful, since it indicates what needs to change.
>
>>>> Obviously reading all the metadata is a PITA.
>>>
>>> Yes.
>>>
>>>> Doing 10Gb/jumbos, but in this case it doesn't make much of a hoot of a diff.
>>>
>>> Yeah, probably not -- you're almost certainly I/O bound, not network bound.
>>
>> Actually it was network bound via 1 rsync process, which is why I broke up 154 dirs into 7 batches of 22 each.
>
> Oh. Um, unless you can make more network bandwidth available, you've saturated the bottleneck.
> Doing a single copy task is likely to complete faster than splitting up the job into subtasks in such a case.

Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going, where before it was in the 10Ms with 1.

Also, physically looking at my ZFS server, the drive lights are now blinking faster, like every second. Whereas before it was sort of seldom, like every 3 seconds or so.

I was thinking to perhaps zip dirs up and then xfer the file over, but it would prolly take as long to zip/unzip.

This bloody project structure we have is nuts.

- aurf
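[Editor's note: the batching approach described in the thread -- 154 top-level dirs split round-robin into 7 rsync jobs -- can be sketched as below. The source tree, destination host, and directory names are hypothetical stand-ins, and the rsync invocations are echoed rather than executed so the sketch is a dry run; drop the `echo` to perform real copies.]

```shell
#!/bin/sh
# Sketch: split top-level project dirs into N batches, one rsync each.
# SRC/DEST are demo placeholders, not the poster's real paths.
SRC=/tmp/demo_src
DEST="backup-host:/tank/projects"   # hypothetical target
BATCHES=7

# Demo tree standing in for the 154 top-level job dirs from the thread.
mkdir -p "$SRC"
i=1
while [ "$i" -le 154 ]; do
    mkdir -p "$SRC/job$i"
    i=$((i + 1))
done

# Round-robin the top-level directory names into $BATCHES list files.
rm -f /tmp/batch.*
i=0
for d in "$SRC"/*/; do
    basename "$d" >> "/tmp/batch.$((i % BATCHES))"
    i=$((i + 1))
done

# One rsync per list file: -a preserves metadata, --files-from reads
# names relative to $SRC. Each job runs in the background; wait joins
# them all. "echo" keeps this a dry run.
for list in /tmp/batch.*; do
    echo rsync -a --files-from="$list" "$SRC/" "$DEST" &
done
wait
```

With 154 directories and 7 batches this yields 22 names per list file, matching the split described in the thread; adjusting BATCHES to match the number of independent RAID-Z groups is the tuning knob Swiger's concurrency comment points at.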