Date: Tue, 24 Aug 2010 00:37:19 +0100
From: Tim Bishop
To: Dan Nelson
Cc: freebsd-stable@freebsd.org
Subject: Re: 8.1R ZFS almost locking up system

On Sat, Aug 21, 2010 at 05:24:29PM -0500, Dan Nelson wrote:
> In the last episode (Aug 21), Tim Bishop said:
> > I've had a problem on a FreeBSD 8.1R system for a few weeks. It seems
> > that ZFS gets in to an almost unresponsive state. Last time it did it
> > (two weeks ago) I couldn't even log in, although the system was up, this
> > time I could manage a reboot but couldn't stop any applications (they
> > were likely hanging on I/O).
>
> Could your pool be very close to full?  Zfs will throttle itself when it's
> almost out of disk space.  I know it's "saved" me from filling up my
> filesystems a couple times :)

It's not close to full, so I don't think that's the issue.

> > A few items from top, including zfskern:
> >
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> >     5 root        4  -8    -     0K    60K zio->i  0  54:38  3.47% zfskern
> > 91775 70          1  44    0 53040K 31144K tx->tx  1   2:11  0.00% postgres
> > 39661 tdb         1  44    0 55776K 32968K tx->tx  0   0:39  0.00% mutt
> > 14828 root        1  47    0 14636K  1572K tx->tx  1   0:03  0.00% zfs
> > 11188 root        1  51    0 14636K  1572K tx->tx  0   0:03  0.00% zfs
> >
> > At some point during this process my zfs snapshots have been failing to
> > complete:
> >
> > root      5  0.8  0.0     0    60  ??  DL   7Aug10  54:43.83 [zfskern]
> > root   8265  0.0  0.0 14636  1528  ??  D   10:00AM   0:03.12 zfs snapshot -r pool0@2010-08-21_10:00:01--1d
> > root  11188  0.0  0.1 14636  1572  ??  D   11:00AM   0:02.93 zfs snapshot -r pool0@2010-08-21_11:00:01--1d
> > root  14828  0.0  0.1 14636  1572  ??  D   12:00PM   0:03.04 zfs snapshot -r pool0@2010-08-21_12:00:00--1d
> > root  17862  0.0  0.1 14636  1572  ??  D    1:00PM   0:01.96 zfs snapshot -r pool0@2010-08-21_13:00:01--1d
> > root  20986  0.0  0.1 14636  1572  ??  D    2:00PM   0:02.07 zfs snapshot -r pool0@2010-08-21_14:00:01--1d
>
> procstat -k on some of these processes might help to pinpoint what part of
> the zfs code they're all waiting in.

I'll do that. Thanks for the pointer :-)

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984