From: aurfalien <aurfalien@gmail.com>
Subject: Re: copying millions of small files and millions of dirs
Date: Thu, 15 Aug 2013 12:14:08 -0700
To: Charles Swiger
Cc: FreeBSD Questions <freebsd-questions@freebsd.org>

On Aug 15, 2013, at 11:52 AM, Charles Swiger wrote:

> On Aug 15, 2013, at 11:37 AM, aurfalien wrote:
>> On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:
>>> On Aug 15, 2013, at 11:13 AM, aurfalien wrote:
>>>> Is there a faster way to copy files over NFS?
>>>
>>> Probably.
>>
>> Ok, thanks for the specifics.
>
> You're most welcome.
>
>>>> Currently breaking up a simple rsync over 7 or so scripts, which copies 22 dirs having ~500,000 dirs or files each.
>>>
>>> There's a maximum useful concurrency which depends on how many disk spindles and what flavor of RAID is in use; exceeding it will result in thrashing the disks and heavily reduced throughput due to competing I/O requests. Try measuring aggregate performance when running fewer rsyncs at once and see whether it improves.
>>
>> It's 35 disks broken into 7 striped RaidZ groups with an SLC-based ZIL and no atime; the server itself has 128GB ECC RAM. I didn't have time to tune or really learn ZFS, but at this point it's only backing up the data for emergency purposes.
>
> OK. If you've got 7 independent groups and can use separate network pipes for each parallel copy, then using 7 simultaneous scripts is likely reasonable.
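A rough sketch of that pool layout, for reference: 7 raidz vdevs of 5 disks each (ZFS stripes automatically across all top-level vdevs), a separate SLC SSD as the log device, and atime switched off. The pool and device names below are invented, not from the thread:

  # hypothetical: 35 disks as 7 x 5-disk raidz vdevs, plus an SLC SSD as SLOG
  zpool create tank \
    raidz da0  da1  da2  da3  da4  \
    raidz da5  da6  da7  da8  da9  \
    raidz da10 da11 da12 da13 da14 \
    raidz da15 da16 da17 da18 da19 \
    raidz da20 da21 da22 da23 da24 \
    raidz da25 da26 da27 da28 da29 \
    raidz da30 da31 da32 da33 da34 \
    log ada0
  zfs set atime=off tank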
>>> Of course, putting half a million files into a single directory level is also a bad idea, even with dirhash support. You'd do better to break them up into subdirs containing fewer than ~10K files apiece.
>>
>> I can't; that's our job structure, obviously developed by script kiddies and not systems ppl, but I digress.
>
> Identifying something which is "broken as designed" is still helpful, since it indicates what needs to change.
>
>>>> Obviously reading all the metadata is a PITA.
>>>
>>> Yes.
>>>
>>>> Doing 10Gb/jumbos, but in this case it doesn't make much of a hoot of a diff.
>>>
>>> Yeah, probably not -- you're almost certainly I/O bound, not network bound.
>>
>> Actually it was network bound via 1 rsync process, which is why I broke up 154 dirs into 7 batches of 22 each.
>
> Oh. Um, unless you can make more network bandwidth available, you've saturated the bottleneck.
> Doing a single copy task is likely to complete faster than splitting up the job into subtasks in such a case.

Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going, where before it was in the 10Mb range with 1.

Also, physically looking at my ZFS server, the drive lights are now blinking faster, like every second, whereas before it was sort of seldom, like every 3 seconds or so.

I was thinking to perhaps zip dirs up and then transfer the file over, but it would probably take as long to zip/unzip.

This bloody project structure we have is nuts.

- aurf
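For reference, the 7-way split described above (154 dirs into 7 batches of 22) could be driven by something along these lines; the paths, hostname, and batch-list filenames are invented for illustration:

  #!/bin/sh
  # hypothetical sketch: batch1..batch7 each list 22 of the 154 job dirs
  for list in batch1 batch2 batch3 batch4 batch5 batch6 batch7; do
      # -r is needed explicitly: --files-from disables -a's implied recursion
      rsync -a -r --files-from="$list" /export/jobs/ backup:/tank/jobs/ &
  done
  wait  # all 7 rsyncs run in parallel; block until they finish

As for zipping dirs up first: the usual variant is to stream a tar over the pipe instead of writing an archive to disk, so the pack, transfer, and unpack stages overlap rather than running one after another, and rsync's per-file metadata round-trips are skipped entirely (again, paths invented):

  # hypothetical: stream one dir as a tar over ssh, no intermediate archive
  tar cf - jobdir | ssh backup 'tar xf - -C /tank/jobs'

The tradeoff is that a tar stream is not restartable or incremental the way rsync is, so it suits a first full copy better than the ongoing backups.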