Date: Thu, 04 Jul 2013 09:49:56 -0400
From: Travis Mikalson <bofh@terranova.net>
To: freebsd-fs@freebsd.org
Cc: d@delphij.net, kib@freebsd.org
Subject: Re: Report: ZFS deadlock in 9-STABLE
Message-ID: <51D57D84.8090004@terranova.net>
In-Reply-To: <51D575C9.4040402@terranova.net>
References: <51D45401.5050801@terranova.net> <51D47A5F.3030501@delphij.net> <51D575C9.4040402@terranova.net>

Travis Mikalson wrote:
> Xin Li wrote:
>> Hi,
>>
>> Sorry for the top posting but I am quite convinced that this is a
>> known issue that we have seen with our customer. Please try applying
>> this patch [1] and please report back if that fixes your problem.
>>
>> Note that if you would like to provide more help, we would appreciate
>> that you test Konstantin's patch as well, at:
>
> I will apply both patches and see what happens. It will be a couple of
> weeks with no deadlocks before we get an idea if it was effective. (Or,
> god forbid, I come back with another different-looking deadlock.)

Actually it looks like Konstantin's patch is already incorporated into
yours.

Konstantin's diff:
-               while (ithd->it_need) {
+               while (atomic_load_acq_int(&ithd->it_need)) {

Your diff:
-               while (ithd->it_need) {
+               while (atomic_load_acq_int(&ithd->it_need) != 0) {

> Thanks!
>
>> http://lists.freebsd.org/pipermail/freebsd-hackers/2013-May/042876.html
>>
>> [1] See attachment; the commit is
>> https://github.com/trueos/trueos/commit/f678ae7c7f72fba577b00e3d0c237c4f297575c6
>>
>> Cheers,
>>
>> On 07/03/13 09:40, Travis Mikalson wrote:
>>> Hello,
>>>
>>> To cut to the chase, I have a procstat -kk -a captured during a
>>> livelock for you here:
>>> http://tog.net/freebsd/zfsdeadlock-storage1-20130703
>>>
>>> The other relevant configurations I could think of to show you are
>>> available within that http://tog.net/freebsd/ directory.
>>>
>>> If you want any additional information that I haven't given here
>>> please let me know!
>>>
>>> This is a FreeBSD 9-STABLE AMD64 system currently at: r250777: Sat
>>> May 18 17:41:39 EDT 2013
>>>
>>> I didn't see too many relevant ZFS-related fixes after that date so
>>> am waiting for another round of interesting commits to update
>>> again.
>>>
>>> Unfortunately, this system has been livelocking on average about
>>> once every 7-14 days. Its lot in life is a ZFS storage server
>>> serving NFS and istgt traffic.
>>>
>>> It has 32GB of RAM and is an 8-core 2.6GHz Opteron 6212. The zpool
>>> looks like this, it has eight 1TB SAS drives and two SSDs being
>>> used for log and cache.
>>>
>>>   pool: storage1
>>>  state: ONLINE
>>> status: The pool is formatted using a legacy on-disk format. The pool
>>>         can still be used, but some features are unavailable.
>>> action: Upgrade the pool using 'zpool upgrade'. Once this is done,
>>>         the pool will no longer be accessible on software that does
>>>         not support feature flags.
>>>   scan: scrub repaired 0 in 6h4m with 0 errors on Sun Jan 6 06:39:38 2013
>>> config:
>>>
>>>         NAME          STATE     READ WRITE CKSUM
>>>         storage1      ONLINE       0     0     0
>>>           raidz1-0    ONLINE       0     0     0
>>>             da0       ONLINE       0     0     0
>>>             da2       ONLINE       0     0     0
>>>             da4       ONLINE       0     0     0
>>>             da6       ONLINE       0     0     0
>>>           raidz1-1    ONLINE       0     0     0
>>>             da1       ONLINE       0     0     0
>>>             da3       ONLINE       0     0     0
>>>             da5       ONLINE       0     0     0
>>>             da7       ONLINE       0     0     0
>>>         logs
>>>           mirror-2    ONLINE       0     0     0
>>>             da8p2     ONLINE       0     0     0
>>>             da9p2     ONLINE       0     0     0
>>>         cache
>>>           da8p3       ONLINE       0     0     0
>>>           da9p3       ONLINE       0     0     0
>>>
>>> errors: No known data errors
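
A note on the two diffs compared in this message: in C, a loop condition
written as `x` and one written as `x != 0` test the same thing, so both hunks
change the loop identically, replacing a plain read of it_need with an
acquire load. The sketch below is a userland illustration only, not the
kernel code. It assumes C11 <stdatomic.h> as a stand-in for the kernel's
atomic_load_acq_int(), and struct fake_ithd, load_acq_int(),
drain_implicit() and drain_explicit() are hypothetical names invented for
the example. The loop body that clears the flag is also an assumption, added
only so the loops terminate.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the structure holding the it_need flag. */
struct fake_ithd {
    atomic_int it_need;
};

/* Assumed userland equivalent of atomic_load_acq_int(): an acquire load. */
static int load_acq_int(atomic_int *p)
{
    return atomic_load_explicit(p, memory_order_acquire);
}

/* Loop condition shaped like Konstantin's diff (implicit truth test). */
static int drain_implicit(struct fake_ithd *ithd)
{
    int passes = 0;

    while (load_acq_int(&ithd->it_need)) {
        /* Assumption: clear the flag with a release store, then "work". */
        atomic_store_explicit(&ithd->it_need, 0, memory_order_release);
        passes++;
    }
    return passes;
}

/* Loop condition shaped like Xin Li's diff (explicit != 0 comparison). */
static int drain_explicit(struct fake_ithd *ithd)
{
    int passes = 0;

    while (load_acq_int(&ithd->it_need) != 0) {
        /* Same assumed loop body as above. */
        atomic_store_explicit(&ithd->it_need, 0, memory_order_release);
        passes++;
    }
    return passes;
}
```

Calling both functions on flags initialised to the same value yields the
same number of passes, which is the sense in which one patch already
contains the other.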