Subject: Re: copying millions of small files and millions of dirs
From: aurfalien <aurfalien@gmail.com>
Date: Thu, 15 Aug 2013 13:40:25 -0700
To: Charles Swiger
Cc: frank2@fjl.co.uk, freebsd-questions@freebsd.org

On Aug 15, 2013, at 1:22 PM, Charles Swiger wrote:

> [ ...combining replies for brevity... ]
>
> On Aug 15, 2013, at 1:02 PM, Frank Leonhardt wrote:
>> I'm reading all this with interest. The first thing I'd have tried would be tar (and probably netcat), but I'm probably a bit of a dinosaur. (If someone wants to buy me some really big drives I promise I'll update.) If it's really NFS or nothing I guess you couldn't open a socket anyway.
>
> Either tar via netcat or SSH, or dump / restore via a similar pipeline, are quite traditional. tar is more flexible for partial filesystem copies, whereas dump / restore is more oriented towards complete filesystem copies. If the destination starts off empty, they're probably faster than rsync, but rsync does delta updates, which is a huge win if you're going to be copying changes onto a slightly older version.

Yep, so looks like it is what it is, as the data set is changing while I do the base sync. So I'll have to do several more passes to pick up newcomers etc...
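In case it helps anyone following along, the tar-over-netcat / tar-over-ssh pipelines mentioned above would look roughly like the below, with a follow-up rsync for the delta pass. The host name, paths and port are just placeholders, not our actual setup:

  # tar over ssh, preserving permissions; /srcdir, /dstdir and desthost are made up
  tar -cf - -C /srcdir . | ssh desthost 'tar -xpf - -C /dstdir'

  # tar over netcat (skips the ssh encryption overhead); start the listener on the destination first
  nc -l 12345 | tar -xpf - -C /dstdir         # on the destination
  tar -cf - -C /srcdir . | nc desthost 12345  # on the source

  # later rsync pass to pick up files that changed during the base copy
  rsync -aH /srcdir/ desthost:/dstdir/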
> Anyway, you're entirely right that the capabilities of the source matter a great deal.
> If it could do zfs send / receive, or similar snapshot mirroring, that would likely do better than userland tools.
>
>> I'd be interested to know whether tar is still worth using in this world of volume managers and SMP.
>
> Yes.
>
> On Aug 15, 2013, at 12:14 PM, aurfalien wrote:
> [ ... ]
>>>>>> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
>>>>>
>>>>> Yeah, probably not-- you're almost certainly I/O bound, not network bound.
>>>>
>>>> Actually it was network bound via 1 rsync process, which is why I broke up 154 dirs into 7 batches of 22 each.
>>>
>>> Oh. Um, unless you can make more network bandwidth available, you've saturated the bottleneck.
>>> Doing a single copy task is likely to complete faster than splitting up the job into subtasks in such a case.
>>
>> Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going, where before it was in the 10s of MB/s with 1.
>
> 1 gigabyte of data per second is pretty decent for a 10Gb link; 10 MB/s obviously wasn't close to saturating a 10Gb link.

Cool. Looks like I am doing my best, which is what I wanted to know. I chose to do 7 rsync scripts as 7 divides evenly into 154 parent dirs :)

You should see how our backup system deals with this; Atempo Time Navigator, or Tina as it's called.

It takes an hour just to lay down the dirs on tape before even starting to back up, craziness. And that's just for 1 parent dir having an avg of 500,000 dirs. Actually I'm prolly wrong, as the initial creation is 125,000 dirs, of which a few are symlinks.

Then it grows from there. Looking at the Tina stats, we see a million objects or more.

- aurf
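P.S. For the archives, the zfs send / receive approach Charles mentions would look something like the below if the source box could do ZFS snapshots. Dataset names and the host are placeholders, not our actual layout:

  # one-time full copy of a snapshot to the destination
  zfs snapshot tank/data@base
  zfs send tank/data@base | ssh desthost zfs receive -F backup/data

  # later passes only send the blocks changed since @base
  zfs snapshot tank/data@pass2
  zfs send -i tank/data@base tank/data@pass2 | ssh desthost zfs receive backup/data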