From: Damien Fleuriot <ml@my.gd>
Subject: Re: Regarding regular zfs
Date: Fri, 5 Apr 2013 13:07:39 +0200
To: Joar Jegleim
Cc: "freebsd-fs@freebsd.org"

On 5 Apr 2013, at 12:17, Joar Jegleim wrote:

> Hi FreeBSD!
>
> I've already sent this one to questions@freebsd.org, but realised this
> list would be a better option.
>
> So I've got this setup where we have a storage server delivering about
> 2 million JPEGs as the backend for a website (it's ~1 TB of data).
> The storage server is running ZFS, and every 15 minutes it does a zfs
> send to a 'slave'; our proxy will fail over to the slave if the main
> storage server goes down.
> I've got a script that initially zfs sends the whole ZFS volume, and
> every send after that only sends the diff. So after the initial zfs
> send, the diffs usually take less than a minute to send over.
>
> I've had increasing problems on the 'slave': it seems to grind to a
> halt for anything between 5 and 20 seconds after every zfs receive.
> Everything on the server halts / hangs completely.
>
> I've had a couple of goes at trying to solve / figure out what's
> happening, without luck, and this third time I've invested even more
> time in the problem.
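(For reference, a minimal sketch of the kind of 15-minute incremental
send/receive cycle described above; the dataset name, slave hostname and
snapshot naming scheme below are made-up placeholders, not taken from the
poster's actual script.)

    #!/bin/sh
    # Hypothetical names, for illustration only.
    DATASET="tank/images"
    SLAVE="slave.example.com"
    NEW="repl-$(date +%Y%m%d%H%M)"
    # Most recent existing snapshot of the dataset, if any.
    LAST=$(zfs list -H -t snapshot -o name -s creation -d 1 "$DATASET" \
        | tail -1 | cut -d@ -f2)

    zfs snapshot "${DATASET}@${NEW}"
    if [ -n "$LAST" ]; then
        # Incremental send: only the changes since the previous snapshot.
        zfs send -i "@${LAST}" "${DATASET}@${NEW}" \
            | ssh "$SLAVE" zfs receive -F "$DATASET"
    else
        # First run: full send of the dataset.
        zfs send "${DATASET}@${NEW}" \
            | ssh "$SLAVE" zfs receive -F "$DATASET"
    fi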
> To sum it up:
>
> - The server was initially on 8.2-RELEASE.
> - I've set some sysctl variables, such as:
>
> # 16 GB arc_max (the server has 30 GB of RAM, but we had a couple of
> # 'freeze' situations; I suspect the ZFS ARC ate too much memory)
> vfs.zfs.arc_max=17179869184
>
> # 8.2 defaults to 30 here; setting it to 5, which is the default from
> # 8.3 onwards
> vfs.zfs.txg.timeout="5"
>
> # Set the TXG write limit to a lower threshold. This helps "level out"
> # the throughput rate (see "zpool iostat"). A value of 256 MB works well
> # for systems with 4 GB of RAM, while 1 GB works well for us with 8 GB
> # on disks which have a 64 MB cache.
> # NOTE: in 8.2 this tunable lived under vfs.zfs.txg:
> #vfs.zfs.txg.write_limit_override=1073741824 # for 8.2
> vfs.zfs.write_limit_override=1073741824 # for 8.3 and above
>
> - I've implemented mbuffer for the zfs send / receive operations. With
> mbuffer the sync goes a lot faster, but I still get the same symptoms:
> when the zfs receive is done, the hang / unresponsiveness returns for
> 5-20 seconds.
> - I've upgraded to 8.3-RELEASE (plus zpool upgrade and zfs upgrade to
> v28); same symptoms.
> - I've upgraded to 9.1-RELEASE; still the same symptoms.
>
> I suspected the period where the server is unresponsive after a zfs
> receive would correlate with the amount of data being sent, but even
> when only a couple of MB of data are sent, the hang / unresponsiveness
> is still substantial.
>
> I suspect it may have something to do with the ZFS volume being sent
> being mounted on the slave. I'm also doing the backups from the slave,
> which means that a lot of the time the backup server is rsyncing the
> very ZFS volume being updated.
> I've noticed that the unresponsiveness / hang situations occur while
> the backup server is rsyncing from the ZFS volume being updated; when
> the backup server is done and nothing is touching files in that volume,
> I hardly notice any of the symptoms (maybe just a minor lag of much
> less than a second, hardly noticeable).
>
> So my question to the list would be:
> In my setup, have I taken the use case for zfs send / receive too far?
> As in, it's not meant for this kind of syncing, this often, so there's
> actually nothing 'wrong'?
>
> --
> ----------------------
> Joar Jegleim

Quick and dirty reply: what's your pool usage %?

Above 75-80%, performance takes a dive. Let's just make sure you're not
there yet.
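(For reference, the mbuffer-assisted pipeline mentioned above usually
looks something like the sketch below; the port, buffer sizes and names
are guesses for illustration, not the poster's actual settings.)

    # On the slave (start the receiver first); hypothetical values.
    mbuffer -I 9090 -s 128k -m 1G | zfs receive -F tank/images

    # On the master, send the diff through mbuffer over the network.
    zfs send -i @repl-prev tank/images@repl-now \
        | mbuffer -O slave.example.com:9090 -s 128k -m 1G

mbuffer only smooths out the network transfer; it doesn't change how the
receiving pool commits the data, which may be why the post-receive stall
was unchanged.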
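(A quick way to check the pool usage figure asked about above; the CAP
column, or the "capacity" property, is the percentage in question. The
pool name "tank" is a placeholder.)

    # Full overview of all pools, including the CAP column:
    zpool list

    # Or just the capacity percentage of one pool:
    zpool list -H -o name,capacity tank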