Date:      Thu, 5 May 2011 10:23:47 -0700
From:      Garrett Cooper <yanegomi@gmail.com>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        Kirk McKusick <mckusick@mckusick.com>, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: Nasty non-recursive lockmgr panic on softdep only enabled UFS partition when filesystem full
Message-ID:  <9E4C162F-B4EA-4378-A010-3E8D0D23EA93@gmail.com>
In-Reply-To: <20110504090718.GN48734@deviant.kiev.zoral.com.ua>
References:  <BANLkTik4=O_1PWB2GzGzY=m51dG-Kbhe+Q@mail.gmail.com> <201105040559.p445xEJ5024585@chez.mckusick.com> <BANLkTikAQ6Jz4Jbjxh51iA-cjCYmdx1mSg@mail.gmail.com> <BANLkTik8F_SvEzW-vPW9=dZUEJuYOy9WcQ@mail.gmail.com> <20110504090718.GN48734@deviant.kiev.zoral.com.ua>

On May 4, 2011, at 2:07 AM, Kostik Belousov wrote:

> On Tue, May 03, 2011 at 11:58:49PM -0700, Garrett Cooper wrote:
>> On Tue, May 3, 2011 at 11:42 PM, Garrett Cooper <yanegomi@gmail.com> wrote:
>>> On Tue, May 3, 2011 at 10:59 PM, Kirk McKusick <mckusick@mckusick.com> wrote:
>>>>> Date: Tue, 3 May 2011 22:40:26 -0700
>>>>> Subject: Nasty non-recursive lockmgr panic on softdep only enabled UFS
>>>>>  partition when filesystem full
>>>>> From: Garrett Cooper <yanegomi@gmail.com>
>>>>> To: Jeff Roberson <jeff@freebsd.org>,
>>>>>         Marshall Kirk McKusick <mckusick@mckusick.com>
>>>>> Cc: FreeBSD Current <freebsd-current@freebsd.org>
>>>>>
>>>>> Hi Jeff and Dr. McKusick,
>>>>>     Ran into this panic when /usr ran out of space doing a make
>>>>> universe on amd64/r221219 (it took ~15 minutes for the panic to occur
>>>>> after the filesystem ran out of space -- wasn't quite sure what it was
>>>>> doing at the time):
>>>>>
>>>>> ...
>>>>>
>>>>>     Let me know what other commands you would like for me to run in kgdb.
>>>>> Thanks,
>>>>> -Garrett
>>>>
>>>> You did not indicate whether you are running an 8.X system or a 9-current
>>>> system. It would be helpful to know that.
>>>
>>> I've actually been running CURRENT for a few years now, but you're right --
>>> I didn't mention that part.
>>>
>>>> Jeff thinks that there may be a potential race in the locking code for
>>>> softdep_request_cleanup. If so, this patch for 9-current should fix it:
>>>>
>>>> Index: ffs_softdep.c
>>>> ===================================================================
>>>> --- ffs_softdep.c       (revision 221385)
>>>> +++ ffs_softdep.c       (working copy)
>>>> @@ -11380,7 +11380,8 @@
>>>>                                continue;
>>>>                        }
>>>>                        MNT_IUNLOCK(mp);
>>>> -                       if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, curthread)) {
>>>> +                       if (vget(lvp, LK_EXCLUSIVE | LK_NOWAIT | LK_INTERLOCK,
>>>> +                           curthread)) {
>>>>                                MNT_ILOCK(mp);
>>>>                                continue;
>>>>                        }
>>>>
>>>> If you are running an 8.X system, hopefully you will be able to apply it.
>>>
>>>    I've applied it, rebuilt and installed the kernel, and trying to
>>> repro the case again. Will let you know how things go!
>>
>>    Happened again with the change. It's really easy to repro:
>>
>> 1. Get a filesystem with UFS+SU
>> 2. Execute something that does a large number of small writes to a partition.
>> 3. 'dd if=/dev/zero of=FOO bs=10m' on the same partition
>>
>>    The kernel will panic with the issue I discussed above.
>> Thanks!
>
> Jeff's change is required to avoid LORs, but it is not sufficient to
> prevent recursion. We must skip the vnode supplied as a parameter to
> softdep_request_cleanup(). Theoretically, other vnodes might also be
> locked by curthread, thus I think the change below is needed. Try this.
>
> diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
> index a6d4441..25fa5d6 100644
> --- a/sys/ufs/ffs/ffs_softdep.c
> +++ b/sys/ufs/ffs/ffs_softdep.c
> @@ -11380,7 +11380,9 @@ retry:
> 				continue;
> 			}
> 			MNT_IUNLOCK(mp);
> -			if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, curthread)) {
> +			if (VOP_ISLOCKED(lvp) ||
> +			    vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK | LK_NOWAIT,
> +			    curthread)) {
> 				MNT_ILOCK(mp);
> 				continue;
> 			}

	Ran into the same panic after I applied the patch above with the
repro steps I described before. One thing that I noticed is that the
issue isn't as easy to reproduce unless you add the dd in parallel with
the make operation.
Thanks,
-Garrett


