Date:      Wed, 29 Apr 2009 21:56:17 -0400
From:      Ben Kelly <ben@wanderview.com>
To:        Lawrence Stewart <lstewart@freebsd.org>
Cc:        current@freebsd.org
Subject:   Re: [patch] zfs livelock and thread priorities
Message-ID:  <38E0E938-68DA-4D2E-8191-3CEC836A82E9@wanderview.com>
In-Reply-To: <49F8E71B.2020102@freebsd.org>
References:  <AC3C4C3F-40C6-4AF9-BAF3-2C4D1E444839@wanderview.com>	<ed91d4a80904142135n429dea52o672abf51116fa707@mail.gmail.com>	<ed91d4a80904241816r28531a04r2dc70fa8960d430e@mail.gmail.com>	<bc2d970904241947r50576efbgc93164a9e4dd297d@mail.gmail.com>	<ed91d4a80904242059n3642a40aud55df6d1b6a1695@mail.gmail.com>	<FC83DB1E-6C08-4BD4-8BC9-437D714FEE9E@wanderview.com>	<ed91d4a80904271839l49420c8rbcfd52dd6e72eb83@mail.gmail.com>	<ed91d4a80904281111q3b9a3c45vc9fcf129dde8c10d@mail.gmail.com>	<F86D3461-3ABD-4A56-B9A6-36857364DF4B@wanderview.com>	<4D8E4457-89AA-4F19-9960-E090D3B8E319@wanderview.com>	<20090429064303.GA2189@a91-153-125-115.elisa-laajakaista.fi>	<A83EA714-1EB5-41C1-91E2-FD031FD0DE0E@wanderview.com> <DA5E47A7-8D3C-4D79-A36E-4460ADC9E3F3@wanderview.com> <49F8E71B.2020102@freebsd.org>

On Apr 29, 2009, at 7:47 PM, Lawrence Stewart wrote:
> Ben Kelly wrote:
>> On Apr 29, 2009, at 7:58 AM, Ben Kelly wrote:
>>> On Apr 29, 2009, at 2:43 AM, Jaakko Heinonen wrote:
>>>> On 2009-04-28, Ben Kelly wrote:
>>>>>> http://www.wanderview.com/svn/public/misc/zfs/zfs_zinactive_deadlock.diff
>>>>>
>>>>> The patch is updated in the same location above.
>>>>
>>>> There's a fatal typo in the patch:
>>>>
>>>> -    ZFS_OBJ_HOLD_ENTER(zfsvfs, z_id);
>>>> +    locked == ZFS_OBJ_HOLD_TRYENTER(zfsvfs, z_id);
>>>>            ^^^^
>>>
>>> Yikes!  Thanks for catching this!
>>>
>>> The patch has been updated at the same URL.  If anyone has patched  
>>> their system please grab the new version.  Sorry for the confusion.
>> Argh!  The patch was still broken even after this fix.
>> Apparently when I tested my taskqueue solution I forgot to do a  
>> make installkernel.  For some reason the taskqueue approach  
>> deadlocks my server at home under normal conditions.  Therefore I  
>> have reverted the patch to use the simple return.  I still don't  
>> think this is the right solution, but I don't have time to  
>> completely figure out what is going on right now.
>> Again, sorry for the mess!
>
> As far as I can tell, one of the developers is working on a patch to  
> address the same issue you're discussing in this thread. He ran into  
> it on his SSD ZFS installation and the symptoms sound likely to be  
> the same as what you're discussing. I believe he's testing a patch  
> which is inspired by the one the opensolaris guys used to fix the  
> bug, which you can look at here:
>
> http://people.freebsd.org/~pjd/patches/vn_rele_hang.patch
>
> The open solaris one has major incompatibilities with FreeBSD so  
> can't be applied directly.
>
> As soon as it's ready I think he'll be making it available for wider  
> testing so stay tuned.
>
> Cheers,
> Lawrence
>
> PS Apologies if the issue you're working on is not the same as the  
> one addressed by the opensolaris patch above.


Thank you!  This does appear to be the same issue and I look forward  
to seeing the final fix.

For now I've gone ahead and updated my patch with a naive adaptation  
of the OpenSolaris diff.  It seems more correct than what I had, and I  
was worried people would waste time testing my broken approach.  So far  
I've only been able to test it on my non-SMP i386 server, however.

Thanks again.

- Ben


