From owner-freebsd-fs@FreeBSD.ORG  Mon Apr  8 08:29:53 2013
Date: Mon, 8 Apr 2013 10:29:52 +0200
From: Joar Jegleim <joar.jegleim@gmail.com>
To: Peter Jeremy
Cc: "freebsd-fs@freebsd.org"
Subject: Re: Regarding regular zfs

[...]"Are you deleting old snapshots after the newer snapshots have been sent?"[...]

Yeah, the script deletes old snapshots. The slave will usually hold 2 snapshots (the first being the initial snapshot received via zfs send from the master, the second being the latest snapshot received from the master). I've put a stripped-down sketch of the rotation further down in this mail.

[...]"Can you clarify which machine you mean by server in the last line above. I presume you mean the slave machine running "zfs recv". If you monitor the "server" with "vmstat -v 1", "gstat -a" and "zfs-mon -a" (the latter is part of ports/sysutils/zfs-stats) during the "freeze", what do you see? Are the disks saturated or idle? Are the "cache" or "free" values close to zero?"[...]

With the last line ("Everything on the server halts / hangs completely.") I'm talking about the 'slave' (the receiving end). I'll check how the cache is doing, but as I wrote in my previous reply, the 'slave' server is completely unresponsive: nothing works at all for 5-15 seconds. When the server becomes responsive again (I can ssh in and so on) I can't find anything in dmesg or any log hinting at anything that went 'wrong'.

[...]"There was a bug in interface between ZFS ARC and FreeBSD VM that resulted in ARC starvation. This was fixed between 8.2 and 8.3/9.0."[...]

Ah, OK.

[...]"Do you have atime enabled or disabled? What happens when you don't run rsync at the same time? Are you able to break into DDB?"[...]

atime is disabled.
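
Coming back to the snapshot rotation: stripped down, each run of the script does something like the sketch below. The dataset name, slave hostname and snapshot naming are made up and all error handling is left out; this is only meant to illustrate the rotation, not the actual script.

    #!/bin/sh
    # Simplified illustration of one replication run (not the real script).
    DATASET="tank/www"            # made-up dataset name
    SLAVE="slave.example.com"     # made-up slave hostname

    # Newest snapshot on the master, i.e. the one the slave already has.
    PREV=$(zfs list -H -t snapshot -o name -s creation -d 1 "$DATASET" \
           | tail -n 1 | cut -d@ -f2)
    NEW=$(date +%Y%m%d-%H%M%S)

    # Take a new snapshot and send only the delta since the previous one.
    zfs snapshot "$DATASET@$NEW"
    zfs send -i "$DATASET@$PREV" "$DATASET@$NEW" \
        | ssh "$SLAVE" zfs recv -F "$DATASET"

    # Rotate: drop the previous snapshot on both sides (in practice the
    # initial snapshot is kept around as well).
    zfs destroy "$DATASET@$PREV"
    ssh "$SLAVE" zfs destroy "$DATASET@$PREV"
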
When I don't run rsync the server seems OK. I've tried to detect any hang (as in, I ssh into the server and issue various commands such as top, ls and so on) while not rsync'ing, and there might have been a really minor 'glitch', but it was hardly noticeable at all, and nothing compared to those 5-15 seconds when the backup server is doing the rsync (from the live volume, not a snapshot).

I could try DDB; I'm going to have to get back to you on that. I haven't debugged a FreeBSD kernel before and the system is in production, so I would have to be cautious. I might be able to try that during this week.

[...]"Apart from the rsync whilst receiving, everything sounds OK. It's possible that the rsync whilst receiving is triggering a bug."[...]

I sort of think so too, at least since the whole OS is unresponsive / hangs for anything from 5 to 15 seconds.

--
----------------------
Joar Jegleim
Homepage: http://cosmicb.no
Linkedin: http://no.linkedin.com/in/joarjegleim
fb: http://www.facebook.com/joar.jegleim
AKA: CosmicB @Freenode
----------------------


On 5 April 2013 23:12, Peter Jeremy wrote:

> On 2013-Apr-05 12:17:27 +0200, Joar Jegleim wrote:
> >I've got this script that initially zfs send's a whole zfs volume, and
> >for every send after that only sends the diff . So after the initial zfs
> >send, the diff's usually take less than a minute to send over.
>
> Are you deleting old snapshots after the newer snapshots have been sent?
>
> >I've had increasing problems on the 'slave', it seem to grind to a
> >halt for anything between 5-20 seconds after every zfs receive . Everything
> >on the server halts / hangs completely.
>
> Can you clarify which machine you mean by server in the last line above.
> I presume you mean the slave machine running "zfs recv".
>
> If you monitor the "server" with "vmstat -v 1", "gstat -a" and "zfs-mon -a"
> (the latter is part of ports/sysutils/zfs-stats) during the "freeze",
> what do you see? Are the disks saturated or idle? Are the "cache" or
> "free" values close to zero?
>
> ># 16GB arc_max ( server got 30GB of ram, but had a couple 'freeze'
> >situations, suspect zfs.arc ate too much memory)
>
> There was a bug in interface between ZFS ARC and FreeBSD VM that resulted
> in ARC starvation. This was fixed between 8.2 and 8.3/9.0.
>
> >I suspect it may have something to do with the zfs volume being sent
> >is mount'ed on the slave, and I'm also doing the backups from the
> >slave, which means a lot of the time the backup server is rsyncing the
> >zfs volume being updated.
>
> Do you have atime enabled or disabled? What happens when you don't run
> rsync at the same time?
>
> Are you able to break into DDB?
>
> >In my setup have I taken the use case for zfs send / receive too far
> >(?) as in, it's not meant for this kind of syncing and this often, so
> >there's actually nothing 'wrong'.
>
> Apart from the rsync whilst receiving, everything sounds OK. It's
> possible that the rsync whilst receiving is triggering a bug.
>
> --
> Peter Jeremy
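
P.S. If the rsync-while-receiving combination really is the trigger, one thing I could try is pointing rsync at the newest snapshot through the .zfs/snapshot directory instead of at the live filesystem. Roughly like this (the dataset name, mountpoint and backup target are made up):

    # Make the .zfs directory show up in listings (the path works even
    # while snapdir is left at the default "hidden").
    zfs set snapdir=visible tank/www

    # Pick the newest snapshot and rsync from its read-only view instead
    # of from the live, constantly-updated filesystem.
    LATEST=$(zfs list -H -t snapshot -o name -s creation -d 1 tank/www \
             | tail -n 1 | cut -d@ -f2)
    rsync -a "/tank/www/.zfs/snapshot/$LATEST/" /backup/www/

That way rsync wouldn't be walking the same files zfs recv is rewriting, though I'd have to make sure the script doesn't destroy that snapshot while rsync is still reading from it.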