From owner-freebsd-fs@FreeBSD.ORG Thu Jul 4 13:50:05 2013
Message-ID: <51D57D84.8090004@terranova.net>
Date: Thu, 04 Jul 2013 09:49:56 -0400
From: Travis Mikalson
Organization: TerraNovaNet Internet Services
To: freebsd-fs@freebsd.org
Cc: d@delphij.net, kib@freebsd.org
Subject: Re: Report: ZFS deadlock in 9-STABLE
In-Reply-To: <51D575C9.4040402@terranova.net>
References: <51D45401.5050801@terranova.net> <51D47A5F.3030501@delphij.net> <51D575C9.4040402@terranova.net>

Travis Mikalson wrote:
> Xin Li wrote:
>> Hi,
>>
>> Sorry for the top posting, but I am quite convinced that this is a
>> known issue that we have seen with our customer. Please try applying
>> this patch [1] and report back whether it fixes your problem.
>>
>> Note that if you would like to provide more help, we would appreciate
>> it if you test Konstantin's patch as well, at:
>
> I will apply both patches and see what happens. It will take a couple
> of weeks with no deadlocks before we get an idea of whether it was
> effective. (Or, God forbid, I come back with another
> different-looking deadlock.)

Actually, it looks like Konstantin's patch is already incorporated
into yours.

Konstantin's diff:

- while (ithd->it_need) {
+ while (atomic_load_acq_int(&ithd->it_need)) {

Your diff:

- while (ithd->it_need) {
+ while (atomic_load_acq_int(&ithd->it_need) != 0) {

> Thanks!
>
>> http://lists.freebsd.org/pipermail/freebsd-hackers/2013-May/042876.html
>>
>> [1] See attachment; the commit is
>> https://github.com/trueos/trueos/commit/f678ae7c7f72fba577b00e3d0c237c4f297575c6
>>
>> Cheers,
>>
>> On 07/03/13 09:40, Travis Mikalson wrote:
>>> Hello,
>>>
>>> To cut to the chase, I have a procstat -kk -a captured during a
>>> livelock for you here:
>>> http://tog.net/freebsd/zfsdeadlock-storage1-20130703
>>>
>>> The other relevant configurations I could think of to show you are
>>> available within that http://tog.net/freebsd/ directory.
>>>
>>> If you want any additional information that I haven't given here,
>>> please let me know!
>>>
>>> This is a FreeBSD 9-STABLE amd64 system currently at r250777 (Sat
>>> May 18 17:41:39 EDT 2013).
>>>
>>> I didn't see too many relevant ZFS-related fixes after that date,
>>> so I am waiting for another round of interesting commits before
>>> updating again.
>>>
>>> Unfortunately, this system has been livelocking on average about
>>> once every 7-14 days. Its lot in life is a ZFS storage server
>>> serving NFS and istgt traffic.
>>>
>>> It has 32GB of RAM and an 8-core 2.6GHz Opteron 6212. The zpool
>>> looks like this; it has eight 1TB SAS drives and two SSDs being
>>> used for log and cache.
>>>
>>>   pool: storage1
>>>  state: ONLINE
>>> status: The pool is formatted using a legacy on-disk format.
>>>         The pool can still be used, but some features are
>>>         unavailable.
>>> action: Upgrade the pool using 'zpool upgrade'. Once this is done,
>>>         the pool will no longer be accessible on software that does
>>>         not support feature flags.
>>>   scan: scrub repaired 0 in 6h4m with 0 errors on Sun Jan 6
>>>         06:39:38 2013
>>> config:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         storage1    ONLINE       0     0     0
>>>           raidz1-0  ONLINE       0     0     0
>>>             da0     ONLINE       0     0     0
>>>             da2     ONLINE       0     0     0
>>>             da4     ONLINE       0     0     0
>>>             da6     ONLINE       0     0     0
>>>           raidz1-1  ONLINE       0     0     0
>>>             da1     ONLINE       0     0     0
>>>             da3     ONLINE       0     0     0
>>>             da5     ONLINE       0     0     0
>>>             da7     ONLINE       0     0     0
>>>         logs
>>>           mirror-2  ONLINE       0     0     0
>>>             da8p2   ONLINE       0     0     0
>>>             da9p2   ONLINE       0     0     0
>>>         cache
>>>           da8p3     ONLINE       0     0     0
>>>           da9p3     ONLINE       0     0     0
>>>
>>> errors: No known data errors