From owner-freebsd-fs@FreeBSD.ORG Fri Apr 5 12:58:06 2013
Date: Fri, 5 Apr 2013 14:58:04 +0200
From: Joar Jegleim
To: Damien Fleuriot
Cc: "freebsd-fs@freebsd.org"
Subject: Re: Regarding regular zfs

zpool usage is 9% :)

--
----------------------
Joar Jegleim
Homepage: http://cosmicb.no
Linkedin: http://no.linkedin.com/in/joarjegleim
fb: http://www.facebook.com/joar.jegleim
AKA: CosmicB @Freenode
----------------------


On 5 April 2013 13:07, Damien Fleuriot wrote:
>
> On 5 Apr 2013, at 12:17, Joar Jegleim wrote:
>
> > Hi FreeBSD !
> >
> > I've already sent this one to questions@freebsd.org, but realised this
> > list would be a better option.
> >
> > So I've got this setup where we have a storage server delivering about
> > 2 million jpegs as a backend for a website (it's ~1TB of data).
> > The storage server is running ZFS, and every 15 minutes it does a zfs
> > send to a 'slave'; our proxy will fail over to the slave if the main
> > storage server goes down.
> > I've got this script that initially zfs sends a whole ZFS volume, and
> > every send after that only sends the diff. So after the initial zfs
> > send, the diffs usually take less than a minute to send over.
> >
> > I've had increasing problems on the 'slave': it seems to grind to a
> > halt for anything between 5 and 20 seconds after every zfs receive.
> > Everything on the server halts / hangs completely.
> >
> > I've had a couple of goes at trying to solve / figure out what's
> > happening, without luck, and this 3rd time I've invested even more
> > time in the problem.
> >
> > To sum it up:
> > -Server was initially on 8.2-RELEASE
> > -I've set some sysctl variables such as:
> >
> > # 16GB arc_max (server got 30GB of RAM, but had a couple of 'freeze'
> > # situations; suspect zfs.arc ate too much memory)
> > vfs.zfs.arc_max=17179869184
> >
> > # 8.2 defaults to 30 here, setting it to 5 which is the default from
> > # 8.3 onwards
> > vfs.zfs.txg.timeout="5"
> >
> > # Set TXG write limit to a lower threshold. This helps "level out"
> > # the throughput rate (see "zpool iostat"). A value of 256MB works well
> > # for systems with 4 GB of RAM, while 1 GB works well for us w/ 8 GB on
> > # disks which have 64 MB cache.
> > # NOTE: in 'vfs.zfs.txg.write_limit_override'.
> > #vfs.zfs.txg.write_limit_override=1073741824 # for 8.2
> > vfs.zfs.write_limit_override=1073741824 # for 8.3 and above
> >
> > -I've implemented mbuffer for the zfs send / receive operations. With
> >  mbuffer the sync went a lot faster, but I still got the same
> >  symptoms: when the zfs receive is done, the hang / unresponsiveness
> >  returns for 5-20 seconds.
> > -I've upgraded to 8.3-RELEASE (+ zpool upgrade and zfs upgrade to
> >  V28), same symptoms.
> > -I've upgraded to 9.1-RELEASE, still the same symptoms.
> >
> > I suspected the period where the server is unresponsive after a zfs
> > receive would correlate with the amount of data being sent, but even
> > if there are only a couple of MBs of data the hang / unresponsiveness
> > is still substantial.
> >
> > I suspect it may have something to do with the fact that the ZFS
> > volume being sent is mounted on the slave, and that I'm also doing the
> > backups from the slave, which means a lot of the time the backup
> > server is rsyncing the ZFS volume being updated.
> > I've noticed that the unresponsiveness / hang situations occur while
> > the backup server is rsyncing from the ZFS volume being updated; when
> > the backup server is 'done' and nothing is working with files in that
> > volume, I hardly notice any of the symptoms (maybe just a minor lag of
> > much less than a second, hardly noticeable).
> >
> > So my question(s) to the list would be:
> > In my setup, have I taken the use case for zfs send / receive too far?
> > As in, it's not meant for this kind of syncing, this often, so there's
> > actually nothing 'wrong'.
> >
> > --
> > ----------------------
> > Joar Jegleim
>
> Quick and dirty reply, what's your pool usage % ?
>
> >75-80% and performance takes a dive.
>
> Let's just make sure you're not there yet.
>
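
A minimal sketch of the kind of incremental zfs send / receive pipeline
with mbuffer described in the quoted message. The pool and snapshot names
(tank/www), the host (slavehost), the 128M buffer size and the
/var/db/last_sent_snapshot state file are illustrative assumptions, not
values taken from the thread:

    #!/bin/sh
    # Replicate the latest changes of tank/www to the slave on each run.
    # PREV is the snapshot that already exists on both sides; NOW is new.
    PREV=$(cat /var/db/last_sent_snapshot)
    NOW="tank/www@$(date +%Y%m%d-%H%M)"

    zfs snapshot "$NOW"

    # Send only the delta between PREV and NOW, buffered on both ends so
    # the sender and receiver don't stall each other. 'zfs receive -F'
    # rolls the slave's dataset back to the last common snapshot first.
    zfs send -i "$PREV" "$NOW" \
        | mbuffer -q -m 128M \
        | ssh slavehost "mbuffer -q -m 128M | zfs receive -F tank/www" \
    && echo "$NOW" > /var/db/last_sent_snapshot

Run from cron every 15 minutes, this matches the schedule described above;
the actual script in the thread is not shown, so this is only a sketch of
the technique.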
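For the pool usage figure Damien asks about, the CAP column of zpool list
is the usual place to read it (the pool name tank is a placeholder):

    # CAP is the percentage of the pool's capacity currently in use.
    zpool list tank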