From owner-freebsd-fs@FreeBSD.ORG Fri Apr 5 10:17:29 2013
From: Joar Jegleim <joar.jegleim@gmail.com>
To: freebsd-fs@freebsd.org
Date: Fri, 5 Apr 2013 12:17:27 +0200
Subject: Regarding regular zfs

Hi FreeBSD!

I've already sent this one to questions@freebsd.org, but realised this
list would be a better option.

I've got a setup where a storage server delivers about 2 million JPEGs
as the backend for a website (it's ~1 TB of data). The storage server
runs ZFS, and every 15 minutes it does a zfs send to a 'slave'; our
proxy will fail over to the slave if the main storage server goes
down.

I've got a script that initially sends a whole ZFS volume with zfs
send, and for every send after that only sends the incremental diff.
After the initial zfs send, the diffs usually take less than a minute
to send over.

I've had increasing problems on the 'slave': it seems to grind to a
halt for anything between 5 and 20 seconds after every zfs receive.
Everything on the server halts / hangs completely. I've had a couple
of goes at trying to solve / figure out what's happening, without
luck, and this third time I've invested even more time in the problem.

To sum it up:

- The server was initially on 8.2-RELEASE.
- I've set some sysctl variables, such as:

# 16 GB arc_max (the server has 30 GB of RAM, but we had a couple of
# 'freeze' situations; I suspect the ZFS ARC ate too much memory)
vfs.zfs.arc_max=17179869184

# 8.2 defaults to 30 here; setting it to 5, which is the default from
# 8.3 onwards
vfs.zfs.txg.timeout="5"

# Set the TXG write limit to a lower threshold. This helps "level out"
# the throughput rate (see "zpool iostat"). A value of 256 MB works
# well for systems with 4 GB of RAM, while 1 GB works well for us w/
# 8 GB on disks which have 64 MB cache.
# NOTE: in
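
For reference, the replication job boils down to something like the
sketch below (a minimal sketch only; the dataset, host and snapshot
names are placeholders, not our real ones):

    #!/bin/sh
    # Minimal sketch of the 15-minute replication job.
    # "tank/images" and "slave.example.com" are placeholder names.
    DATASET="tank/images"
    SLAVE="slave.example.com"
    NOW=$(date +%Y%m%d%H%M)

    # Newest existing snapshot becomes the incremental source.
    PREV=$(zfs list -H -t snapshot -o name -s creation -r "$DATASET" \
        | tail -1)

    zfs snapshot "${DATASET}@${NOW}"

    if [ -n "$PREV" ]; then
        # Every later run: send only the diff since the last snapshot.
        zfs send -i "$PREV" "${DATASET}@${NOW}" \
            | ssh "$SLAVE" zfs receive -F "$DATASET"
    else
        # First run: send the whole dataset.
        zfs send "${DATASET}@${NOW}" \
            | ssh "$SLAVE" zfs receive -F "$DATASET"
    fi

It's right after the zfs receive step on the slave that everything
hangs for 5-20 seconds.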