From: iamatt
To: aurfalien
Date: Thu, 15 Aug 2013 16:21:52 -0500
Subject: Re: copying milllions of small files and millions of dirs
Cc: FreeBSD Questions

I would use NDMP. That is how we archive our NAS crap (Isilon stuff), but we have the backend accelerators. Not sure if there is NDMP for FreeBSD. Like another poster said, you are most likely I/O bound anyway.

On Thu, Aug 15, 2013 at 2:14 PM, aurfalien wrote:

>
> On Aug 15, 2013, at 11:52 AM, Charles Swiger wrote:
>
> > On Aug 15, 2013, at 11:37 AM, aurfalien wrote:
> >> On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:
> >>> On Aug 15, 2013, at 11:13 AM, aurfalien wrote:
> >>>> Is there a faster way to copy files over NFS?
> >>>
> >>> Probably.
> >>
> >> Ok, thanks for the specifics.
> >
> > You're most welcome.
> >
> >>>> Currently breaking up a simple rsync over 7 or so scripts, each of which copies 22 dirs having ~500,000 dirs or files each.
> >>>
> >>> There's a maximum useful concurrency which depends on how many disk spindles and what flavor of RAID is in use; exceeding it will result in thrashing the disks and heavily reducing throughput due to competing I/O requests. Try measuring aggregate performance when running fewer rsyncs at once and see whether it improves.
> >>
> >> It's 35 disks broken into 7 striped RaidZ groups with an SLC based ZIL and no atime; the server itself has 128GB ECC RAM. I didn't have time to tune or really learn ZFS, but at this point it's only backing up the data for emergency purposes.
> >
> > OK. If you've got 7 independent groups and can use separate network pipes for each parallel copy, then using 7 simultaneous scripts is likely reasonable.
> >
> >>> Of course, putting half a million files into a single directory level is also a bad idea, even with dirhash support.
> >>> You'd do better to break them up into subdirs containing fewer than ~10K files apiece.
> >>
> >> I can't, that's our job structure, obviously developed by script kiddies and not systems ppl, but I digress.
> >
> > Identifying something which is "broken as designed" is still helpful, since it indicates what needs to change.
> >
> >>>> Obviously reading all the metadata is a PITA.
> >>>
> >>> Yes.
> >>>
> >>>> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
> >>>
> >>> Yeah, probably not-- you're almost certainly I/O bound, not network bound.
> >>
> >> Actually it was network bound via 1 rsync process, which is why I broke up 154 dirs into 7 batches of 22 each.
> >
> > Oh. Um, unless you can make more network bandwidth available, you've saturated the bottleneck.
> > Doing a single copy task is likely to complete faster than splitting up the job into subtasks in such a case.
>
> Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going, where before it was in the 10Ms with 1.
>
> Also, physically looking at my ZFS server, it now shows the drive lights are blinking faster, like every second. Whereas before it was sort of seldom, like every 3 seconds or so.
>
> I was thinking to perhaps zip dirs up and then xfer the file over, but it would prolly take as long to zip/unzip.
>
> This bloody project structure we have is nuts.
>
> - aurf
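
For anyone wanting to try the "154 dirs into 7 batches of 22 each" approach discussed in the thread, here is a rough Python sketch of one way it might be scripted. It is not the script aurfalien actually used. The function names, the source and destination paths, and the choice of plain `rsync -a` are all assumptions made for illustration. The batch count is a parameter so that aggregate throughput can be compared at lower concurrency, as Charles suggested.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def make_batches(dirs, n_batches):
    """Split dirs into n_batches contiguous chunks of near-equal size."""
    size, extra = divmod(len(dirs), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches take one more item each.
        end = start + size + (1 if i < extra else 0)
        batches.append(dirs[start:end])
        start = end
    return batches


def rsync_command(batch, src_root, dest_root):
    """Build one rsync invocation that copies every directory in a batch.

    src_root and dest_root are hypothetical paths, not from the thread.
    """
    sources = [f"{src_root}/{d}" for d in batch]
    return ["rsync", "-a", *sources, f"{dest_root}/"]


def run_batches(dirs, src_root, dest_root, n_batches=7, dry_run=True):
    """Run one rsync per batch concurrently; return the commands built."""
    commands = [rsync_command(b, src_root, dest_root)
                for b in make_batches(dirs, n_batches) if b]
    if not dry_run and commands:
        with ThreadPoolExecutor(max_workers=len(commands)) as pool:
            # Each worker thread only waits on its own rsync process.
            list(pool.map(lambda c: subprocess.run(c, check=True), commands))
    return commands
```

With `dry_run=True` (the default here) nothing is copied and the function only returns the command lists, which makes it easy to inspect the split before starting. Timing a full run with `n_batches` set to, say, 3, 5, and 7 is one way to see where the disks or the network stop scaling.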