Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 5 May 2011 20:36:00 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Garrett Cooper <yanegomi@gmail.com>
Cc:        Kirk McKusick <mckusick@mckusick.com>, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: Nasty non-recursive lockmgr panic on softdep only enabled UFS partition when filesystem full
Message-ID:  <20110505173600.GB48734@deviant.kiev.zoral.com.ua>
In-Reply-To: <9E4C162F-B4EA-4378-A010-3E8D0D23EA93@gmail.com>
References:  <BANLkTik4=O_1PWB2GzGzY=m51dG-Kbhe%2BQ@mail.gmail.com> <201105040559.p445xEJ5024585@chez.mckusick.com> <BANLkTikAQ6Jz4Jbjxh51iA-cjCYmdx1mSg@mail.gmail.com> <BANLkTik8F_SvEzW-vPW9=dZUEJuYOy9WcQ@mail.gmail.com> <20110504090718.GN48734@deviant.kiev.zoral.com.ua> <9E4C162F-B4EA-4378-A010-3E8D0D23EA93@gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--sQwJCUUiTMMw/Obe
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, May 05, 2011 at 10:23:47AM -0700, Garrett Cooper wrote:
> On May 4, 2011, at 2:07 AM, Kostik Belousov wrote:
>=20
> > On Tue, May 03, 2011 at 11:58:49PM -0700, Garrett Cooper wrote:
> >> On Tue, May 3, 2011 at 11:42 PM, Garrett Cooper <yanegomi@gmail.com> w=
rote:
> >>> On Tue, May 3, 2011 at 10:59 PM, Kirk McKusick <mckusick@mckusick.com=
> wrote:
> >>>>> Date: Tue, 3 May 2011 22:40:26 -0700
> >>>>> Subject: Nasty non-recursive lockmgr panic on softdep only enabled =
UFS
> >>>>>  partition when filesystem full
> >>>>> From: Garrett Cooper <yanegomi@gmail.com>
> >>>>> To: Jeff Roberson <jeff@freebsd.org>,
> >>>>>         Marshall Kirk McKusick <mckusick@mckusick.com>
> >>>>> Cc: FreeBSD Current <freebsd-current@freebsd.org>
> >>>>>=20
> >>>>> Hi Jeff and Dr. McKusick,
> >>>>>     Ran into this panic when /usr ran out of space doing a make
> >>>>> universe on amd64/r221219 (it took ~15 minutes for the panic to occ=
ur
> >>>>> after the filesystem ran out of space -- wasn't quite sure what it =
was
> >>>>> doing at the time):
> >>>>>=20
> >>>>> ...
> >>>>>=20
> >>>>>     Let me know what other commands you would like for me to run in=
 kgdb.
> >>>>> Thanks,
> >>>>> -Garrett
> >>>>=20
> >>>> You did not indicate whether you are running an 8.X system or a 9-cu=
rrent
> >>>> system. It would be helpful to know that.
> >>>=20
> >>> I've actually been running CURRENT for a few years now, but you're ri=
ght --
> >>> I didn't mention that part.
> >>>=20
> >>>> Jeff thinks that there may be a potential race in the locking code f=
or
> >>>> softdep_request_cleanup. If so, this patch for 9-current should fix =
it:
> >>>>=20
> >>>> Index: ffs_softdep.c
> >>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> >>>> --- ffs_softdep.c       (revision 221385)
> >>>> +++ ffs_softdep.c       (working copy)
> >>>> @@ -11380,7 +11380,8 @@
> >>>>                                continue;
> >>>>                        }
> >>>>                        MNT_IUNLOCK(mp);
> >>>> -                       if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, c=
urthread)) {
> >>>> +                       if (vget(lvp, LK_EXCLUSIVE | LK_NOWAIT | LK_=
INTERLOCK,
> >>>> +                           curthread)) {
> >>>>                                MNT_ILOCK(mp);
> >>>>                                continue;
> >>>>                        }
> >>>>=20
> >>>> If you are running an 8.X system, hopefully you will be able to appl=
y it.
> >>>=20
> >>>    I've applied it, rebuilt and installed the kernel, and trying to
> >>> repro the case again. Will let you know how things go!
> >>=20
> >>    Happened again with the change. It's really easy to repro:
> >>=20
> >> 1. Get a filesystem with UFS+SU
> >> 2. Execute something that does a large number of small writes to a par=
tition.
> >> 3. 'dd if=3D/dev/zero of=3DFOO bs=3D10m' on the same partition
> >>=20
> >>    The kernel will panic with the issue I discussed above.
> >> Thanks!
> >=20
> > Jeff' change is required to avoid LORs, but it is not sufficient to
> > prevent recursion. We must skip the vnode supplied as a parameter to
> > softdep_request_cleanup(). Theoretically, other vnodes might be also
> > locked by curthread, thus I think the change below is needed. Try this.
> >=20
> > diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
> > index a6d4441..25fa5d6 100644
> > --- a/sys/ufs/ffs/ffs_softdep.c
> > +++ b/sys/ufs/ffs/ffs_softdep.c
> > @@ -11380,7 +11380,9 @@ retry:
> > 				continue;
> > 			}
> > 			MNT_IUNLOCK(mp);
> > -			if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, curthread)) {
> > +			if (VOP_ISLOCKED(lvp) ||
> > +			    vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK | LK_NOWAIT,
> > +			    curthread)) {
> > 				MNT_ILOCK(mp);
> > 				continue;
> > 			}
>=20
> 	Ran into the same panic after I applied the patch above with the repro s=
teps I described before. One thing that I noticed is that the issue isn't a=
s easy to reproduce unless you add the dd in parallel with the make operati=
on.

Well, I misread your original report. Also, there is another issue
that is easily reproducable in similar situation. The latest patch
is below.

diff --git a/sys/sys/mount.h b/sys/sys/mount.h
index 231e3d6..f064053 100644
--- a/sys/sys/mount.h
+++ b/sys/sys/mount.h
@@ -366,6 +366,8 @@ void          __mnt_vnode_markerfree(struct vnode **mvp=
, struct mount *mp);
 #define MNT_LAZY	3	/* push data not written by filesystem syncer */
 #define MNT_SUSPEND	4	/* Suspend file system after sync */
=20
+#define	MNT_WAIT_ADV	0x10000000	/* MNT_WAIT prevent deadlock */
+
 /*
  * Generic file handle
  */
diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c
index e60514d..87837cc 100644
--- a/sys/ufs/ffs/ffs_alloc.c
+++ b/sys/ufs/ffs/ffs_alloc.c
@@ -420,13 +420,13 @@ nospace:
 	 */
 	if (reclaimed =3D=3D 0) {
 		reclaimed =3D 1;
-		softdep_request_cleanup(fs, vp, cred, FLUSH_BLOCKS_WAIT);
-		UFS_UNLOCK(ump);
 		if (bp) {
+			UFS_UNLOCK(ump);
 			brelse(bp);
 			bp =3D NULL;
+			UFS_LOCK(ump);
 		}
-		UFS_LOCK(ump);
+		softdep_request_cleanup(fs, vp, cred, FLUSH_BLOCKS_WAIT);
 		goto retry;
 	}
 	UFS_UNLOCK(ump);
diff --git a/sys/ufs/ffs/ffs_extern.h b/sys/ufs/ffs/ffs_extern.h
index d819c8a..d12e1dc 100644
--- a/sys/ufs/ffs/ffs_extern.h
+++ b/sys/ufs/ffs/ffs_extern.h
@@ -141,7 +141,7 @@ void	softdep_setup_inofree(struct mount *, struct buf *=
, ino_t,
 void	softdep_setup_sbupdate(struct ufsmount *, struct fs *, struct buf *);
 void 	*softdep_setup_trunc(struct vnode *vp, off_t length, int flags);
 void	softdep_fsync_mountdev(struct vnode *);
-int	softdep_sync_metadata(struct vnode *);
+int	softdep_sync_metadata(struct vnode *, int flags);
 int     softdep_process_worklist(struct mount *, int);
 int     softdep_fsync(struct vnode *);
 int	softdep_waitidle(struct mount *);
diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
index a6d4441..0b66e68 100644
--- a/sys/ufs/ffs/ffs_softdep.c
+++ b/sys/ufs/ffs/ffs_softdep.c
@@ -492,7 +492,7 @@ softdep_flushworklist(oldmnt, countp, td)
 }
=20
 int
-softdep_sync_metadata(struct vnode *vp)
+softdep_sync_metadata(struct vnode *vp, int flags)
 {
=20
 	return (0);
@@ -733,7 +733,7 @@ static	void unlinked_inodedep(struct mount *, struct in=
odedep *);
 static	void clear_unlinked_inodedep(struct inodedep *);
 static	struct inodedep *first_unlinked_inodedep(struct ufsmount *);
 static	int flush_pagedep_deps(struct vnode *, struct mount *,
-	    struct diraddhd *);
+	    struct diraddhd *, int);
 static	void free_pagedep(struct pagedep *);
 static	int flush_newblk_dep(struct vnode *, struct mount *, ufs_lbn_t);
 static	int flush_inodedep_deps(struct mount *, ino_t);
@@ -10662,7 +10662,7 @@ restart:
  * associated with the file. If any I/O errors occur, they are returned.
  */
 int
-softdep_sync_metadata(struct vnode *vp)
+softdep_sync_metadata(struct vnode *vp, int flags)
 {
 	struct pagedep *pagedep;
 	struct allocindir *aip;
@@ -10792,7 +10792,8 @@ loop:
 					continue;
 				if ((error =3D
 				    flush_pagedep_deps(vp, wk->wk_mp,
-						&pagedep->pd_diraddhd[i]))) {
+						&pagedep->pd_diraddhd[i],
+						flags))) {
 					FREE_LOCK(&lk);
 					goto loop_end;
 				}
@@ -11056,10 +11057,11 @@ flush_newblk_dep(vp, mp, lbn)
  * Called with splbio blocked.
  */
 static int
-flush_pagedep_deps(pvp, mp, diraddhdp)
+flush_pagedep_deps(pvp, mp, diraddhdp, flags)
 	struct vnode *pvp;
 	struct mount *mp;
 	struct diraddhd *diraddhdp;
+	int flags;
 {
 	struct inodedep *inodedep;
 	struct inoref *inoref;
@@ -11069,8 +11071,13 @@ flush_pagedep_deps(pvp, mp, diraddhdp)
 	int error =3D 0;
 	struct buf *bp;
 	ino_t inum;
+	int lkflags;
=20
 	ump =3D VFSTOUFS(mp);
+	lkflags =3D LK_EXCLUSIVE;
+	if ((flags & MNT_WAIT_ADV) !=3D 0)
+		lkflags |=3D LK_NOWAIT;
+
 restart:
 	while ((dap =3D LIST_FIRST(diraddhdp)) !=3D NULL) {
 		/*
@@ -11112,7 +11119,7 @@ restart:
 		}
 		if (dap->da_state & MKDIR_BODY) {
 			FREE_LOCK(&lk);
-			if ((error =3D ffs_vgetf(mp, inum, LK_EXCLUSIVE, &vp,
+			if ((error =3D ffs_vgetf(mp, inum, lkflags, &vp,
 			    FFSV_FORCEINSMQ)))
 				break;
 			error =3D flush_newblk_dep(vp, mp, 0);
@@ -11176,7 +11183,7 @@ retry:
 		 */
 		if (dap =3D=3D LIST_FIRST(diraddhdp)) {
 			FREE_LOCK(&lk);
-			if ((error =3D ffs_vgetf(mp, inum, LK_EXCLUSIVE, &vp,
+			if ((error =3D ffs_vgetf(mp, inum, lkflags, &vp,
 			    FFSV_FORCEINSMQ)))
 				break;
 			error =3D ffs_update(vp, MNT_WAIT);
@@ -11379,17 +11386,17 @@ retry:
 				VI_UNLOCK(lvp);
 				continue;
 			}
-			MNT_IUNLOCK(mp);
-			if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, curthread)) {
-				MNT_ILOCK(mp);
+			if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK | LK_NOWAIT,
+			    curthread)) {
 				continue;
 			}
+			MNT_IUNLOCK(mp);
 			if (lvp->v_vflag & VV_NOSYNC) {	/* unlinked */
 				vput(lvp);
 				MNT_ILOCK(mp);
 				continue;
 			}
-			(void) ffs_syncvnode(lvp, MNT_WAIT);
+			(void) ffs_syncvnode(lvp, MNT_WAIT | MNT_WAIT_ADV);
 			vput(lvp);
 			MNT_ILOCK(mp);
 		}
diff --git a/sys/ufs/ffs/ffs_vnops.c b/sys/ufs/ffs/ffs_vnops.c
index cf6a5a8..c73e2a5 100644
--- a/sys/ufs/ffs/ffs_vnops.c
+++ b/sys/ufs/ffs/ffs_vnops.c
@@ -216,9 +216,11 @@ ffs_syncvnode(struct vnode *vp, int waitfor)
 	struct bufobj *bo;
 	struct buf *bp;
 	struct buf *nbp;
-	int s, error, wait, passes, skipmeta;
+	int s, error, wait, passes, skipmeta, wait_adv;
 	ufs_lbn_t lbn;
=20
+	wait_adv =3D waitfor & MNT_WAIT_ADV;
+	waitfor &=3D ~MNT_WAIT_ADV;
 	wait =3D (waitfor =3D=3D MNT_WAIT);
 	lbn =3D lblkno(ip->i_fs, (ip->i_size + ip->i_fs->fs_bsize - 1));
 	bo =3D &vp->v_bufobj;
@@ -328,7 +330,7 @@ loop:
 		 * with the vnode has been written.
 		 */
 		splx(s);
-		if ((error =3D softdep_sync_metadata(vp)) !=3D 0)
+		if ((error =3D softdep_sync_metadata(vp, wait_adv)) !=3D 0)
 			return (error);
 		s =3D splbio();
=20

--sQwJCUUiTMMw/Obe
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk3C4AAACgkQC3+MBN1Mb4i9SQCfQ9R48iy4hvi+sp+42WvpjjCk
lI0AoIHZ+R0+wqmTQ05gD3WWZrg+MUO+
=VBOf
-----END PGP SIGNATURE-----

--sQwJCUUiTMMw/Obe--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110505173600.GB48734>