Date:      Thu, 04 Jul 2013 09:49:56 -0400
From:      Travis Mikalson <bofh@terranova.net>
To:        freebsd-fs@freebsd.org
Cc:        d@delphij.net, kib@freebsd.org
Subject:   Re: Report: ZFS deadlock in 9-STABLE
Message-ID:  <51D57D84.8090004@terranova.net>
In-Reply-To: <51D575C9.4040402@terranova.net>
References:  <51D45401.5050801@terranova.net> <51D47A5F.3030501@delphij.net> <51D575C9.4040402@terranova.net>

Travis Mikalson wrote:
> Xin Li wrote:
>> Hi,
>>
>> Sorry for the top posting but I am quite convinced that this is a
>> known issue that we have seen with our customer.  Please try applying
>> this patch [1] and please report back if that fixes your problem.
>>
>> Note that if you would like to provide more help, we would appreciate
>> it if you tested Konstantin's patch as well, at:
> 
> I will apply both patches and see what happens. It will take a couple of
> weeks with no deadlocks before we get an idea of whether it was effective.
> (Or, God forbid, I come back with another, different-looking deadlock.)

Actually it looks like Konstantin's patch is already incorporated into
yours. Konstantin's diff:

-               while (ithd->it_need) {
+               while (atomic_load_acq_int(&ithd->it_need)) {

Your diff:

-               while (ithd->it_need) {
+               while (atomic_load_acq_int(&ithd->it_need) != 0) {
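
For anyone following along who wonders why the acquire load matters, here
is a rough userland analogy of the pattern, written with C11 <stdatomic.h>
and pthreads instead of the kernel's atomic(9) API. The struct, the
function names and the payload value are made up purely for illustration;
this is a sketch of acquire/release flag handoff, not the actual ithread
code or a claim about the exact race the patch fixes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/*
 * Hypothetical userland analogy, NOT the FreeBSD kernel code:
 * "need" stands in for ithd->it_need; "payload" stands in for
 * whatever state the producer publishes before raising the flag.
 */
struct handoff {
    atomic_int need;   /* analogous to it_need */
    int payload;       /* plain data written before the flag is set */
};

static void *producer(void *arg)
{
    struct handoff *h = arg;

    h->payload = 42;   /* publish the data first */
    /* Release store: the payload write is ordered before need == 1. */
    atomic_store_explicit(&h->need, 1, memory_order_release);
    return NULL;
}

/* Spin until the flag is seen, using acquire loads (cf. atomic_load_acq_int). */
static int consume(struct handoff *h)
{
    /*
     * The atomic load is re-executed on every iteration. A plain int
     * read in "while (h->need)" is a data race, and the compiler may
     * legally hoist it out of the loop and spin forever.
     */
    while (atomic_load_explicit(&h->need, memory_order_acquire) == 0)
        ;
    /* Acquire pairs with the producer's release, so payload is 42 here. */
    return h->payload;
}

int run_handoff(void)
{
    struct handoff h;
    pthread_t t;

    atomic_init(&h.need, 0);
    h.payload = 0;

    int rc = pthread_create(&t, NULL, producer, &h);
    assert(rc == 0);   /* sketch only: no real error handling */
    (void)rc;

    int v = consume(&h);
    pthread_join(t, NULL);
    return v;
}
```

run_handoff() should always return 42: the acquire load that observes
need == 1 guarantees the earlier payload write is visible. The kernel diff
above gets the same re-read-every-iteration and ordering behaviour from
atomic_load_acq_int().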

> Thanks!
> 
>> http://lists.freebsd.org/pipermail/freebsd-hackers/2013-May/042876.html
>>
>> [1] See attachment; the commit is
>> https://github.com/trueos/trueos/commit/f678ae7c7f72fba577b00e3d0c237c4f297575c6
>>
>> Cheers,
>>
>> On 07/03/13 09:40, Travis Mikalson wrote:
>>> Hello,
>>> To cut to the chase, I have a procstat -kk -a captured during a
>>> livelock for you here: 
>>> http://tog.net/freebsd/zfsdeadlock-storage1-20130703
>>> The other relevant configurations I could think of to show you are 
>>> available within that http://tog.net/freebsd/ directory.
>>> If you want any additional information that I haven't given here
>>> please let me know!
>>> This is a FreeBSD 9-STABLE AMD64 system currently at: r250777: Sat
>>> May 18 17:41:39 EDT 2013
>>> I didn't see many relevant ZFS-related fixes after that date, so I
>>> am waiting for another round of interesting commits before updating
>>> again.
>>> Unfortunately, this system has been livelocking on average about
>>> once every 7-14 days. Its lot in life is a ZFS storage server
>>> serving NFS and istgt traffic.
>>> It has 32GB of RAM and an 8-core 2.6GHz Opteron 6212. The zpool
>>> looks like this; it has eight 1TB SAS drives and two SSDs used for
>>> log and cache.
>>>
>>>   pool: storage1
>>>  state: ONLINE
>>> status: The pool is formatted using a legacy on-disk format.  The pool
>>>         can still be used, but some features are unavailable.
>>> action: Upgrade the pool using 'zpool upgrade'.  Once this is done,
>>>         the pool will no longer be accessible on software that does
>>>         not support feature flags.
>>>   scan: scrub repaired 0 in 6h4m with 0 errors on Sun Jan  6 06:39:38 2013
>>> config:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         storage1    ONLINE       0     0     0
>>>           raidz1-0  ONLINE       0     0     0
>>>             da0     ONLINE       0     0     0
>>>             da2     ONLINE       0     0     0
>>>             da4     ONLINE       0     0     0
>>>             da6     ONLINE       0     0     0
>>>           raidz1-1  ONLINE       0     0     0
>>>             da1     ONLINE       0     0     0
>>>             da3     ONLINE       0     0     0
>>>             da5     ONLINE       0     0     0
>>>             da7     ONLINE       0     0     0
>>>         logs
>>>           mirror-2  ONLINE       0     0     0
>>>             da8p2   ONLINE       0     0     0
>>>             da9p2   ONLINE       0     0     0
>>>         cache
>>>           da8p3     ONLINE       0     0     0
>>>           da9p3     ONLINE       0     0     0
>>>
>>> errors: No known data errors


