Date:      Thu, 15 Aug 2013 12:40:48 -0700
From:      aurfalien <aurfalien@gmail.com>
To:        Adam Vande More <amvandemore@gmail.com>
Cc:        FreeBSD Questions <freebsd-questions@freebsd.org>
Subject:   Re: copying milllions of small files and millions of dirs
Message-ID:  <9A3518B5-5CF7-4EB7-AC8A-1F2614A6A88B@gmail.com>
In-Reply-To: <CA%2BtpaK232RXLqG1jK5XNT%2BU4rRudaAcz2%2B=ccigVNfNkbGn7gA@mail.gmail.com>
References:  <7E7AEB5A-7102-424E-8B1E-A33E0A2C8B2C@gmail.com> <CA%2BtpaK232RXLqG1jK5XNT%2BU4rRudaAcz2%2B=ccigVNfNkbGn7gA@mail.gmail.com>


On Aug 15, 2013, at 12:36 PM, Adam Vande More wrote:

> On Thu, Aug 15, 2013 at 1:13 PM, aurfalien <aurfalien@gmail.com> wrote:
> Hi all,
> 
> Is there a faster way to copy files over NFS?
> 
> Remove NFS from the setup.  

Yeah, from your mouth to God's ears.

My BlueArc is an NFS-only NAS box.

So there's no way to get to the data other than NFS.

- aurf

Date:      Thu, 15 Aug 2013 11:52:51 -0700
From:      Charles Swiger <cswiger@mac.com>
To:        aurfalien <aurfalien@gmail.com>
Cc:        FreeBSD Questions <freebsd-questions@freebsd.org>
Subject:   Re: copying milllions of small files and millions of dirs
Message-ID:  <1F06D736-1019-4223-8546-5DBB0F5D878B@mac.com>
In-Reply-To: <6483A298-6216-4306-913C-B3E0F4A3BC8D@gmail.com>
References:  <7E7AEB5A-7102-424E-8B1E-A33E0A2C8B2C@gmail.com> <CC3CFFD3-6742-447B-AA5D-2A4F6C483883@mac.com> <6483A298-6216-4306-913C-B3E0F4A3BC8D@gmail.com>

On Aug 15, 2013, at 11:37 AM, aurfalien <aurfalien@gmail.com> wrote:
> On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:
>> On Aug 15, 2013, at 11:13 AM, aurfalien <aurfalien@gmail.com> wrote:
>>> Is there a faster way to copy files over NFS?
>>
>> Probably.
>
> Ok, thanks for the specifics.

You're most welcome.

>>> Currently I'm splitting a simple rsync across 7 or so scripts, each copying 22 dirs that have ~500,000 dirs or files apiece.
>>
>> There's a maximum useful concurrency, which depends on how many disk spindles and what flavor of RAID is in use; exceeding it will result in thrashing the disks and heavily reducing throughput due to competing I/O requests.  Try measuring aggregate performance when running fewer rsyncs at once and see whether it improves.
>
> It's 35 disks broken into 7 striped RAID-Z groups with an SLC-based ZIL and no atime; the server itself has 128GB of ECC RAM.  I didn't have time to tune or really learn ZFS, but at this point it's only backing up the data for emergency purposes.

OK.  If you've got 7 independent groups and can use separate network pipes for each parallel copy, then using 7 simultaneous scripts is likely reasonable.
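Below is a minimal Python sketch of the setup under discussion, offered as an illustration only. It deals the top-level dirs round-robin into batches (154 dirs into 7 batches of 22 here). It then runs one `rsync -a` per batch with a cap on how many run at once, and times the whole job so that different caps can be compared, as suggested above. The directory names, destination paths, and the `runner` hook are hypothetical assumptions; the actual scripts from this thread are not shown.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(items, n_batches):
    """Deal items round-robin into n_batches lists whose lengths differ by at most one.

    This balances the number of dirs per batch, not their byte or file counts.
    """
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    return [items[i::n_batches] for i in range(n_batches)]


def rsync_argv(batch, dest):
    """Build one rsync command line for a batch of source dirs."""
    # -a: archive mode (recursive; preserves permissions, times, symlinks, ...)
    return ["rsync", "-a", *batch, dest]


def timed_parallel_copy(batches, dest, max_concurrent, runner=subprocess.run):
    """Run one rsync per batch, at most max_concurrent at a time; return elapsed seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # check=True makes a non-zero rsync exit raise CalledProcessError
        futures = [pool.submit(runner, rsync_argv(b, dest), check=True) for b in batches]
        for future in futures:
            future.result()  # re-raise any failure from a worker thread
    return time.monotonic() - start


# Hypothetical usage (paths are made up): same 7 batches, different caps,
# each trial into a fresh destination so later trials don't benefit from
# rsync skipping files an earlier trial already copied.
#
# import os
# dirs = sorted(os.listdir("/mnt/bluearc/jobs"))
# batches = split_into_batches([f"/mnt/bluearc/jobs/{d}" for d in dirs], 7)
# for cap in (1, 2, 4, 7):
#     secs = timed_parallel_copy(batches, f"/backup/trial-{cap}/", cap)
#     print(f"{cap} concurrent rsyncs: {secs:.0f}s")
```

Round-robin dealing keeps batch sizes even by count. If some dirs are much larger than others, a size-aware split would balance the work better.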

>> Of course, putting half a million files into a single directory level is also a bad idea, even with dirhash support.  You'd do better to break them up into subdirs containing fewer than ~10K files apiece.
>
> I can't; that's our job structure, obviously developed by script kiddies and not systems people, but I digress.

Identifying something which is "broken as designed" is still helpful, since it indicates what needs to change.
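For a layout that *can* be changed, one common way to stay under the ~10K-files-per-directory guideline is to hash each filename into a fixed set of subdirectories. The Python sketch below is only an illustration of that idea. MD5 is used purely as a stable way to spread names evenly, and the 25% headroom is an arbitrary assumption meant to keep uneven hashing from pushing any bucket over the limit.

```python
import hashlib
import math
import os

MAX_PER_DIR = 10_000  # the ~10K-files-per-directory guideline from the thread


def bucket_count(total_files, max_per_dir=MAX_PER_DIR, headroom=1.25):
    """Number of subdirectories needed so the average stays below max_per_dir.

    headroom > 1 leaves slack because hashing spreads names only roughly evenly.
    """
    return max(1, math.ceil(total_files * headroom / max_per_dir))


def bucketed_path(root, filename, n_buckets):
    """Map filename to root/<bucket>/filename using a stable hash of the name."""
    # MD5 here is a cheap, stable spreader, not a security measure.
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % n_buckets
    width = len(str(n_buckets - 1))  # zero-pad so bucket dirs sort cleanly
    return os.path.join(root, f"{bucket:0{width}d}", filename)
```

With 500,000 files this yields 63 buckets of roughly 8,000 files each. Because the mapping depends only on the name, any tool can recompute where a file lives without a lookup table.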

>>> Obviously reading all the meta data is a PITA.
>>
>> Yes.
>>
>>> Doing 10Gb with jumbo frames, but in this case it doesn't make a hoot of a difference.
>>
>> Yeah, probably not-- you're almost certainly I/O bound, not network bound.
>
> Actually, it was network bound with 1 rsync process, which is why I broke up 154 dirs into 7 batches of 22 each.

Oh.  Um, unless you can make more network bandwidth available, you've saturated the bottleneck.
Doing a single copy task is likely to complete faster than splitting up the job into subtasks in such a case.
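A back-of-the-envelope version of that argument follows. It assumes k parallel copies fairly share one saturated link of bandwidth B and together move S bytes. ε_k ≥ 0 stands for the extra cost of contention, such as more concurrent metadata requests, disk seeks, and competing TCP streams:

```latex
T_1 = \frac{S}{B}
\qquad
T_k = \frac{S/k}{B/k} + \varepsilon_k = \frac{S}{B} + \varepsilon_k \;\ge\; T_1
```

This holds only when one stream already saturates the link. If a single rsync is instead held back by per-file round trips rather than bandwidth, parallel streams can still use capacity it leaves idle.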

Regards,
-- 
-Chuck
