From: Damien Fleuriot <ml@my.gd>
Subject: Re: Regarding regular zfs
Date: Fri, 5 Apr 2013 13:07:39 +0200
To: Joar Jegleim
Cc: "freebsd-fs@freebsd.org"

On 5 Apr 2013, at 12:17, Joar Jegleim wrote:

> Hi FreeBSD!
>
> I've already sent this one to questions@freebsd.org, but realised this
> list would be a better option.
>
> So I've got this setup where we have a storage server delivering about
> 2 million JPEGs as the backend for a website (it's ~1 TB of data).
> The storage server is running ZFS, and every 15 minutes it does a zfs
> send to a 'slave'; our proxy will fail over to the slave if the main
> storage server goes down.
> I've got a script that initially zfs sends the whole ZFS volume, and
> every send after that only sends the diff. So after the initial zfs
> send, the diffs usually take less than a minute to send over.
>
> I've had increasing problems on the 'slave': it seems to grind to a
> halt for anything between 5 and 20 seconds after every zfs receive.
> Everything on the server halts / hangs completely.
>
> I've had a couple of goes at trying to solve / figure out what's
> happening, without luck, and this third time I've invested even more
> time in the problem.
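(For reference, a minimal sketch of the kind of 15-minute incremental
send/receive cycle described above; the dataset name, slave hostname and
snapshot naming scheme below are made-up placeholders, not taken from the
poster's actual script.)

    #!/bin/sh
    # Hypothetical names, for illustration only.
    DATASET="tank/images"
    SLAVE="slave.example.com"
    NEW="repl-$(date +%Y%m%d%H%M)"
    # Most recent existing snapshot of the dataset, if any.
    LAST=$(zfs list -H -t snapshot -o name -s creation -d 1 "$DATASET" \
        | tail -1 | cut -d@ -f2)

    zfs snapshot "${DATASET}@${NEW}"
    if [ -n "$LAST" ]; then
        # Incremental send: only the changes since the previous snapshot.
        zfs send -i "@${LAST}" "${DATASET}@${NEW}" \
            | ssh "$SLAVE" zfs receive -F "$DATASET"
    else
        # First run: full send of the dataset.
        zfs send "${DATASET}@${NEW}" \
            | ssh "$SLAVE" zfs receive -F "$DATASET"
    fi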
> To sum it up:
>
> - The server was initially on 8.2-RELEASE.
> - I've set some sysctl variables, such as:
>
> # 16 GB arc_max (the server has 30 GB of RAM, but we had a couple of
> # 'freeze' situations; I suspect the ZFS ARC ate too much memory)
> vfs.zfs.arc_max=17179869184
>
> # 8.2 defaults to 30 here; setting it to 5, which is the default from
> # 8.3 onwards
> vfs.zfs.txg.timeout="5"
>
> # Set the TXG write limit to a lower threshold. This helps "level out"
> # the throughput rate (see "zpool iostat"). A value of 256 MB works well
> # for systems with 4 GB of RAM, while 1 GB works well for us with 8 GB
> # on disks which have a 64 MB cache.
> # NOTE: in 8.2 this tunable lived under vfs.zfs.txg:
> #vfs.zfs.txg.write_limit_override=1073741824 # for 8.2
> vfs.zfs.write_limit_override=1073741824 # for 8.3 and above
>
> - I've implemented mbuffer for the zfs send / receive operations. With
> mbuffer the sync goes a lot faster, but I still get the same symptoms:
> when the zfs receive is done, the hang / unresponsiveness returns for
> 5-20 seconds.
> - I've upgraded to 8.3-RELEASE (plus zpool upgrade and zfs upgrade to
> v28); same symptoms.
> - I've upgraded to 9.1-RELEASE; still the same symptoms.
>
> I suspected the period where the server is unresponsive after a zfs
> receive would correlate with the amount of data being sent, but even
> when only a couple of MB of data are sent, the hang / unresponsiveness
> is still substantial.
>
> I suspect it may have something to do with the ZFS volume being sent
> being mounted on the slave. I'm also doing the backups from the slave,
> which means that a lot of the time the backup server is rsyncing the
> very ZFS volume being updated.
> I've noticed that the unresponsiveness / hang situations occur while
> the backup server is rsyncing from the ZFS volume being updated; when
> the backup server is done and nothing is touching files in that volume,
> I hardly notice any of the symptoms (maybe just a minor lag of much
> less than a second, hardly noticeable).
>
> So my question to the list would be:
> In my setup, have I taken the use case for zfs send / receive too far?
> As in, it's not meant for this kind of syncing, this often, so there's
> actually nothing 'wrong'?
>
> --
> ----------------------
> Joar Jegleim

Quick and dirty reply: what's your pool usage %?

Above 75-80%, performance takes a dive. Let's just make sure you're not
there yet.
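(For reference, the mbuffer-assisted pipeline mentioned above usually
looks something like the sketch below; the port, buffer sizes and names
are guesses for illustration, not the poster's actual settings.)

    # On the slave (start the receiver first); hypothetical values.
    mbuffer -I 9090 -s 128k -m 1G | zfs receive -F tank/images

    # On the master, send the diff through mbuffer over the network.
    zfs send -i @repl-prev tank/images@repl-now \
        | mbuffer -O slave.example.com:9090 -s 128k -m 1G

mbuffer only smooths out the network transfer; it doesn't change how the
receiving pool commits the data, which may be why the post-receive stall
was unchanged.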
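(A quick way to check the pool usage figure asked about above; the CAP
column, or the "capacity" property, is the percentage in question. The
pool name "tank" is a placeholder.)

    # Full overview of all pools, including the CAP column:
    zpool list

    # Or just the capacity percentage of one pool:
    zpool list -H -o name,capacity tank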