Date: Thu, 15 Aug 2013 12:40:48 -0700
From: aurfalien <aurfalien@gmail.com>
To: Adam Vande More <amvandemore@gmail.com>
Cc: FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: Re: copying milllions of small files and millions of dirs
Message-ID: <9A3518B5-5CF7-4EB7-AC8A-1F2614A6A88B@gmail.com>
In-Reply-To: <CA%2BtpaK232RXLqG1jK5XNT%2BU4rRudaAcz2%2B=ccigVNfNkbGn7gA@mail.gmail.com>
References: <7E7AEB5A-7102-424E-8B1E-A33E0A2C8B2C@gmail.com> <CA%2BtpaK232RXLqG1jK5XNT%2BU4rRudaAcz2%2B=ccigVNfNkbGn7gA@mail.gmail.com>

On Aug 15, 2013, at 12:36 PM, Adam Vande More wrote:

> On Thu, Aug 15, 2013 at 1:13 PM, aurfalien <aurfalien@gmail.com> wrote:
>> Hi all,
>>
>> Is there a faster way to copy files over NFS?
>
> Remove NFS from the setup.

Yeah, from your mouth to God's ears.

My BlueArc is an NFS-only NAS box.

So there is no way to get to the data other than NFS.

- aurf


Date: Thu, 15 Aug 2013 11:52:51 -0700
From: Charles Swiger <cswiger@mac.com>
To: aurfalien <aurfalien@gmail.com>
Cc: FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: Re: copying milllions of small files and millions of dirs
Message-id: <1F06D736-1019-4223-8546-5DBB0F5D878B@mac.com>
In-reply-to: <6483A298-6216-4306-913C-B3E0F4A3BC8D@gmail.com>
References: <7E7AEB5A-7102-424E-8B1E-A33E0A2C8B2C@gmail.com> <CC3CFFD3-6742-447B-AA5D-2A4F6C483883@mac.com> <6483A298-6216-4306-913C-B3E0F4A3BC8D@gmail.com>

On Aug 15, 2013, at 11:37 AM, aurfalien <aurfalien@gmail.com> wrote:
> On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:
>> On Aug 15, 2013, at 11:13 AM, aurfalien <aurfalien@gmail.com> wrote:
>>> Is there a faster way to copy files over NFS?
>>
>> Probably.
>
> Ok, thanks for the specifics.

You're most welcome.

>>> Currently breaking up a simple rsync over 7 or so scripts, each of
>>> which copies 22 dirs having ~500,000 dirs or files each.
>>
>> There's a maximum useful concurrency which depends on how many disk
>> spindles and what flavor of RAID is in use; exceeding it will result
>> in thrashing the disks and heavily reducing throughput due to
>> competing I/O requests. Try measuring aggregate performance when
>> running fewer rsyncs at once and see whether it improves.
>
> It's 35 disks broken into 7 striped RaidZ groups with an SLC-based ZIL
> and no atime; the server itself has 128GB ECC RAM. I didn't have time
> to tune or really learn ZFS, but at this point it's only backing up
> the data for emergency purposes.

OK. If you've got 7 independent groups and can use separate network
pipes for each parallel copy, then using 7 simultaneous scripts is
likely reasonable.

>> Of course, putting half a million files into a single directory
>> level is also a bad idea, even with dirhash support. You'd do better
>> to break them up into subdirs containing fewer than ~10K files
>> apiece.
>
> I can't, that's our job structure, obviously developed by script
> kiddies and not systems ppl, but I digress.

Identifying something which is "broken as designed" is still helpful,
since it indicates what needs to change.

>>> Obviously reading all the metadata is a PITA.
>>
>> Yes.
>>
>>> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a
>>> diff.
>>
>> Yeah, probably not-- you're almost certainly I/O bound, not network
>> bound.
>
> Actually it was network bound via 1 rsync process, which is why I
> broke up 154 dirs into 7 batches of 22 each.

Oh. Um, unless you can make more network bandwidth available, you've
saturated the bottleneck.

Doing a single copy task is likely to complete faster than splitting up
the job into subtasks in such a case.

Regards,
-- 
-Chuck
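
The advice in the thread (split the 154 dirs into batches, then measure aggregate throughput at different levels of concurrency) could be tried with something like the sketch below. This is only an illustration, not what either poster actually ran: the function names (split_batches, rsync_command, copy_batch, timed_copy) and any paths passed in are hypothetical, and it assumes rsync is installed and that plain archive mode (-a) is what is wanted.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def split_batches(items, n_batches):
    """Split items into n_batches contiguous batches of near-equal size."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    size, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches take one additional item each.
        end = start + size + (1 if i < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches


def rsync_command(src_dir, dest_root):
    """Build an archive-mode rsync command for one directory (paths are hypothetical)."""
    return ["rsync", "-a", src_dir, dest_root]


def copy_batch(batch, dest_root):
    """Copy each directory in a batch one after another; return the failure count."""
    failures = 0
    for src_dir in batch:
        result = subprocess.run(rsync_command(src_dir, dest_root))
        if result.returncode != 0:
            failures += 1
    return failures


def timed_copy(dirs, dest_root, concurrency):
    """Run `concurrency` batches in parallel; return (elapsed_seconds, failures)."""
    batches = split_batches(dirs, concurrency)
    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        failures = sum(pool.map(lambda b: copy_batch(b, dest_root), batches))
    return time.monotonic() - started, failures
```

Calling timed_copy with concurrency set to 1, 3 and 7 on comparable, not-yet-copied subsets of the tree would give a rough idea of where aggregate throughput stops improving. Re-running against data that has already been copied or cached would skew the comparison, so the numbers are only meaningful for fresh subsets of similar size.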