Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 4 May 2011 12:07:18 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Garrett Cooper <yanegomi@gmail.com>
Cc:        Kirk McKusick <mckusick@mckusick.com>, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: Nasty non-recursive lockmgr panic on softdep only enabled UFS partition when filesystem full
Message-ID:  <20110504090718.GN48734@deviant.kiev.zoral.com.ua>
In-Reply-To: <BANLkTik8F_SvEzW-vPW9=dZUEJuYOy9WcQ@mail.gmail.com>
References:  <BANLkTik4=O_1PWB2GzGzY=m51dG-Kbhe%2BQ@mail.gmail.com> <201105040559.p445xEJ5024585@chez.mckusick.com> <BANLkTikAQ6Jz4Jbjxh51iA-cjCYmdx1mSg@mail.gmail.com> <BANLkTik8F_SvEzW-vPW9=dZUEJuYOy9WcQ@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--ZaNUQUWeUBsaJYg9
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, May 03, 2011 at 11:58:49PM -0700, Garrett Cooper wrote:
> On Tue, May 3, 2011 at 11:42 PM, Garrett Cooper <yanegomi@gmail.com> wrot=
e:
> > On Tue, May 3, 2011 at 10:59 PM, Kirk McKusick <mckusick@mckusick.com> =
wrote:
> >>> Date: Tue, 3 May 2011 22:40:26 -0700
> >>> Subject: Nasty non-recursive lockmgr panic on softdep only enabled UFS
> >>> =9Apartition when filesystem full
> >>> From: Garrett Cooper <yanegomi@gmail.com>
> >>> To: Jeff Roberson <jeff@freebsd.org>,
> >>> =9A =9A =9A =9A Marshall Kirk McKusick <mckusick@mckusick.com>
> >>> Cc: FreeBSD Current <freebsd-current@freebsd.org>
> >>>
> >>> Hi Jeff and Dr. McKusick,
> >>> =9A =9A Ran into this panic when /usr ran out of space doing a make
> >>> universe on amd64/r221219 (it took ~15 minutes for the panic to occur
> >>> after the filesystem ran out of space -- wasn't quite sure what it was
> >>> doing at the time):
> >>>
> >>> ...
> >>>
> >>> =9A =9A Let me know what other commands you would like for me to run =
in kgdb.
> >>> Thanks,
> >>> -Garrett
> >>
> >> You did not indicate whether you are running an 8.X system or a 9-curr=
ent
> >> system. It would be helpful to know that.
> >
> > I've actually been running CURRENT for a few years now, but you're righ=
t --
> > I didn't mention that part.
> >
> >> Jeff thinks that there may be a potential race in the locking code for
> >> softdep_request_cleanup. If so, this patch for 9-current should fix it:
> >>
> >> Index: ffs_softdep.c
> >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> >> --- ffs_softdep.c =9A =9A =9A (revision 221385)
> >> +++ ffs_softdep.c =9A =9A =9A (working copy)
> >> @@ -11380,7 +11380,8 @@
> >> =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9Acontinu=
e;
> >> =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A}
> >> =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9AMNT_IUNLOCK(mp);
> >> - =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A if (vget(lvp, LK_EXCLUSI=
VE | LK_INTERLOCK, curthread)) {
> >> + =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A if (vget(lvp, LK_EXCLUSI=
VE | LK_NOWAIT | LK_INTERLOCK,
> >> + =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A curthread)) {
> >> =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9AMNT_ILO=
CK(mp);
> >> =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9Acontinu=
e;
> >> =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A =9A}
> >>
> >> If you are running an 8.X system, hopefully you will be able to apply =
it.
> >
> > =9A =9AI've applied it, rebuilt and installed the kernel, and trying to
> > repro the case again. Will let you know how things go!
>=20
>     Happened again with the change. It's really easy to repro:
>=20
> 1. Get a filesystem with UFS+SU
> 2. Execute something that does a large number of small writes to a partit=
ion.
> 3. 'dd if=3D/dev/zero of=3DFOO bs=3D10m' on the same partition
>=20
>     The kernel will panic with the issue I discussed above.
> Thanks!

Jeff' change is required to avoid LORs, but it is not sufficient to
prevent recursion. We must skip the vnode supplied as a parameter to
softdep_request_cleanup(). Theoretically, other vnodes might be also
locked by curthread, thus I think the change below is needed. Try this.

diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
index a6d4441..25fa5d6 100644
--- a/sys/ufs/ffs/ffs_softdep.c
+++ b/sys/ufs/ffs/ffs_softdep.c
@@ -11380,7 +11380,9 @@ retry:
 				continue;
 			}
 			MNT_IUNLOCK(mp);
-			if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, curthread)) {
+			if (VOP_ISLOCKED(lvp) ||
+			    vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK | LK_NOWAIT,
+			    curthread)) {
 				MNT_ILOCK(mp);
 				continue;
 			}

--ZaNUQUWeUBsaJYg9
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk3BF0YACgkQC3+MBN1Mb4i3iwCgz7uiG4c0n6uwFrvwpleaYTxO
jCkAoLNhIi1EzRnMf7XANzcTxW71VY8d
=As9g
-----END PGP SIGNATURE-----

--ZaNUQUWeUBsaJYg9--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110504090718.GN48734>