From owner-freebsd-fs@FreeBSD.ORG Mon Jul 29 10:57:29 2013
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id C7F48B8D;
 Mon, 29 Jul 2013 10:57:29 +0000 (UTC)
 (envelope-from bofh@terranova.net)
Received: from tog.net (tog.net [IPv6:2605:5a00::5])
 by mx1.freebsd.org (Postfix) with ESMTP id 9A7202CFB;
 Mon, 29 Jul 2013 10:57:29 +0000 (UTC)
Received: from [IPv6:2605:5a00:ffff::face] (unknown [IPv6:2605:5a00:ffff::face])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by tog.net (Postfix) with ESMTPSA id 3c3dC60WwHz29B;
 Mon, 29 Jul 2013 06:57:22 -0400 (EDT)
Message-ID: <51F64A81.5010404@terranova.net>
Date: Mon, 29 Jul 2013 06:57:05 -0400
From: Travis Mikalson
Organization: TerraNovaNet Internet Services
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
To: d@delphij.net
Subject: Re: Report: ZFS deadlock in 9-STABLE
References: <51D45401.5050801@terranova.net> <51D47A5F.3030501@delphij.net>
In-Reply-To: <51D47A5F.3030501@delphij.net>
X-Enigmail-Version: 0.96.0
OpenPGP: url=http://www.terranova.net/pgp/bofh
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, kib@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems
X-List-Received-Date: Mon, 29 Jul 2013 10:57:29 -0000

Xin Li wrote:
> Hi,
>
> Sorry for the top posting but I am quite convinced that this is a
> known issue that we have seen with our customer. Please try applying
> this patch [1] and please report back if that fixes your problem.

It has been 21 days since I booted a kernel with your patch applied and
so far so good.
This is the longest this system has gone without livelocking before.
I'll report back one last time if this system makes it another few weeks
without a livelock, since that will be an extremely strong indication
that I was having the problem that you seem to have resolved with this
patch.

> Note that if you would like to provide more help, we would appreciate
> that you test Konstantin's patch as well, at:
>
> http://lists.freebsd.org/pipermail/freebsd-hackers/2013-May/042876.html
>
> [1] See attachment; the commit is
> https://github.com/trueos/trueos/commit/f678ae7c7f72fba577b00e3d0c237c4f297575c6
>
> Cheers,
>
> On 07/03/13 09:40, Travis Mikalson wrote:
>> Hello,
>
>> To cut to the chase, I have a procstat -kk -a captured during a
>> livelock for you here:
>> http://tog.net/freebsd/zfsdeadlock-storage1-20130703
>
>> The other relevant configurations I could think of to show you are
>> available within that http://tog.net/freebsd/ directory.
>
>> If you want any additional information that I haven't given here
>> please let me know!
>
>> This is a FreeBSD 9-STABLE AMD64 system currently at: r250777: Sat
>> May 18 17:41:39 EDT 2013
>
>> I didn't see too many relevant ZFS-related fixes after that date so
>> am waiting for another round of interesting commits to update
>> again.
>
>> Unfortunately, this system has been livelocking on average about
>> once every 7-14 days. Its lot in life is a ZFS storage server
>> serving NFS and istgt traffic.
>
>> It has 32GB of RAM and is an 8-core 2.6GHz Opteron 6212. The zpool
>> looks like this, it has eight 1TB SAS drives and two SSDs being
>> used for log and cache.
>
>>   pool: storage1
>>  state: ONLINE
>> status: The pool is formatted using a legacy on-disk format. The pool
>>         can still be used, but some features are unavailable.
>> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>>         pool will no longer be accessible on software that does not
>>         support feature flags.
>>   scan: scrub repaired 0 in 6h4m with 0 errors on Sun Jan 6 06:39:38 2013
>> config:
>
>>         NAME        STATE     READ WRITE CKSUM
>>         storage1    ONLINE       0     0     0
>>           raidz1-0  ONLINE       0     0     0
>>             da0     ONLINE       0     0     0
>>             da2     ONLINE       0     0     0
>>             da4     ONLINE       0     0     0
>>             da6     ONLINE       0     0     0
>>           raidz1-1  ONLINE       0     0     0
>>             da1     ONLINE       0     0     0
>>             da3     ONLINE       0     0     0
>>             da5     ONLINE       0     0     0
>>             da7     ONLINE       0     0     0
>>         logs
>>           mirror-2  ONLINE       0     0     0
>>             da8p2   ONLINE       0     0     0
>>             da9p2   ONLINE       0     0     0
>>         cache
>>           da8p3     ONLINE       0     0     0
>>           da9p3     ONLINE       0     0     0
>
>> errors: No known data errors