Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 27 Aug 2009 02:34:10 +0200
From:      Attilio Rao <attilio@freebsd.org>
To:        Andrew Brampton <brampton+freebsd@gmail.com>
Cc:        freebsd-current@freebsd.org
Subject:   Re: [patch] rw_try_wlock does not set recursive bit when recursing
Message-ID:  <3bbf2fe10908261734k541cc5f1kc13ffd1e581d3f13@mail.gmail.com>
In-Reply-To: <d41814900908261638w297be37ax53462163fd7e13ed@mail.gmail.com>
References:  <d41814900908261537r1726c420p11d3547afde98bbc@mail.gmail.com> <3bbf2fe10908261557i286f5ec5q58eb4fec1abbfc7b@mail.gmail.com> <d41814900908261638w297be37ax53462163fd7e13ed@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
2009/8/27 Andrew Brampton <brampton+freebsd@gmail.com>:
> 2009/8/26 Attilio Rao <attilio@freebsd.org>:
>> 2009/8/27 Andrew Brampton <brampton+freebsd@gmail.com>:
>>> Hi,
>>> The following sequence of commands fails on line 4 with an assertion
>>> that the lock is not currently held:
>>>
>>> 1: rw_wlock(&rw);
>>> 2: if ( rw_try_wlock(&rw) )
>>> 3:    rw_wunlock(&rw);
>>> 4: rw_wunlock(&rw);
>>>
>>> This is because after line 3 is executed the rw lock is no longer
>>> held. I tracked this bug down to _rw_try_wlock which correctly
>>> increments rw_recurse, but does not set the RW_LOCK_RECURSED bit.
>>> Without this bit the third line unlocks the lock and leaves it in a
>>> unlocked state (when it should still be locked). Adding a line to set
>>> this bit makes _rw_try_wlock match the code in _rw_wlock_hard.
>>
>> Sorry, but I really don't understand how that can be a bug.
>> On STABLE_7, RW_LOCK_RECURSED is not used for checking if the lock is
>> recursed or not.
>> it just got set for improving debugging and eventually we decided to
>> drop it for 8.0.
>>
>> However, for deciding if a lock is recursed or not in both STABLE_7
>> and HEAD we used just checking against the recursion count which is
>> correctly handled by the function.
>>
>> What you describe can't be the problem.
>>
>> Attilio
>>
>
> Ok, so I have had a better look at the code in CURRENT, and compared
> it to the code in STABLE_7. I apologise but I think I mixed up my
> sources somewhere and the problem does not appear in CURRENT. The
> problem does however occur in STABLE_7, and I can explain that below.
>
> 1: rw_wlock(&rw);
> 2: if ( rw_try_wlock(&rw) )
> 3:    rw_wunlock(&rw);
> 4: rw_wunlock(&rw);
>
> Line 1, _rw_wlock gets called which calls __rw_wlock. This in turn
> calls _rw_write_lock which changes rw->rw_lock to tid
> Line 2, _rw_try_wlock gets called, which check if the lock is already
> held (which it is), if so it then rw->rw_recurse++
> Line 3, _rw_wunlock gets called which calls __rw_wunlock, which then
> calls _rw_write_unlock. Now _rw_write_unlock trys to unlock the lock
> by checking if rw->rw_lock is tid, if so it sets the lock to
> RW_UNLOCKED. This is where the problem occurs, there is no check on
> rx->rw_recurse.
>
> Now, if we used this code instead:
> 1: rw_wlock(&rw);
> 2: if ( rw_try_wlock(&rw) )
> 3:    rw_wunlock(&rw);
> 4: rw_wunlock(&rw);
>
> The order goes:
> Line 1, _rw_wlock gets called which calls __rw_wlock. This in turn
> calls _rw_write_lock which changes rw->rw_lock to tid
> Line 2, _rw_wlock gets called which calls __rw_wlock. __rw_wlock
> checks if rw->rw_lock is RW_UNLOCKED, otherwise it ends up calling
> _rw_wlock_hard. Inside _rw_wlock_hard it checks if the lock is already
> held, if so it increments rw_recurse, and changes rw->rw_lock to
> (rw->rw_lock | RW_LOCK_RECURSED) by calling atomic_set_ptr.
> Line 3, _rw_wunlock gets called which calls __rw_wunlock, which then
> calls _rw_write_unlock. Now _rw_write_unlock trys to unlock the lock
> by checking if rw->rw_lock is tid. This is where things differ from
> the previous code, rw->rw_lock is NOT tid, it is now (tid |
> RW_LOCK_RECURSED), and thus the code drops into _rw_wunlock_hard which
> decrements rx->rw_recurse and removes the RW_LOCK_RECURSED flag if
> rw_recurse equals zero.
>
> Hopefully you see how not setting the RW_LOCK_RECURSED flag causes a
> problem in rw_wunlock. So this can be fixed by either back porting the
> CURRENT code to STABLE_7, or using the one line fix in my patch. As I
> already stated, I am using my patch and the sample code I have given
> no longer panics.

I see the problem now.
I think the simple patch of your is good and I will commit it.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3bbf2fe10908261734k541cc5f1kc13ffd1e581d3f13>