Date:      Fri, 18 Aug 2006 19:49:03 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Peter Holm <peter@holm.cc>, tegge@freebsd.org
Cc:        freebsd-fs@freebsd.org
Subject:   Deadlock between nfsd and snapshots. [Was: Re: Livelock while accessing /tmp]
Message-ID:  <20060818164903.GF20768@deviant.kiev.zoral.com.ua>
In-Reply-To: <20060817170314.GA17490@peter.osted.lan>
References:  <20060816155310.GA64420@peter.osted.lan> <20060817105155.GC1483@deviant.kiev.zoral.com.ua> <22339.193.3.142.123.1155814154.squirrel@webmail4.pair.com> <20060817113203.GD1483@deviant.kiev.zoral.com.ua> <20060817170314.GA17490@peter.osted.lan>

On Thu, Aug 17, 2006 at 07:03:14PM +0200, Peter Holm wrote:
>
> Ok, I got a new one after some 6 hours of testing with the attached
> script + the default stress test:
> http://people.freebsd.org/~pho/stress/log/cons205a.html
>
> - Peter

First, big thanks to Peter for helping to debug the problem!

This deadlock happens between processes 764 (nfsd) and 62981 (mksnap_ffs).
In fact, the deadlock is not specific to nfsd. It happens when ufs_inactive()
runs concurrently with ffs_snapshot().


Look:

db> where 764
Tracing pid 764 tid 100076 td 0xc3fdb870
sched_switch(c3fdb870,0,1) at sched_switch+0x183
mi_switch(1,0) at mi_switch+0x280
sleepq_switch(c40ca57c,c0a0b0b0,0,c092000a,211,...) at sleepq_switch+0xcd
sleepq_wait(c40ca57c,0,c0927acf,3f3,c093229c,...) at sleepq_wait+0x46
msleep(c40ca57c,c40ca534,29f,c0927b18,0,...) at msleep+0x27d
vn_start_secondary_write(c59bc820,e6586988,1) at vn_start_secondary_write+0x122
ufs_inactive(e65869b8) at ufs_inactive+0x257
VOP_INACTIVE_APV(c09d9a00,e65869b8) at VOP_INACTIVE_APV+0x7e
vinactive(c59bc820,c3fdb870) at vinactive+0x72
vput(c59bc820,c0a0b0c8,1,c0932293,407,...) at vput+0x1b3
nfsrv_read(c4703600,c3f12900,c3fdb870,e6586c40) at nfsrv_read+0xc21
nfssvc_nfsd(c3fdb870) at nfssvc_nfsd+0x409
nfssvc(c3fdb870,e6586d04) at nfssvc+0x18c
syscall(3b,3b,3b,1,0,...) at syscall+0x256
Xint0x80_syscall() at Xint0x80_syscall+0x1f

db> where 62981
Tracing pid 62981 tid 100135 td 0xc46e3d80
sched_switch(c46e3d80,0,1) at sched_switch+0x183
mi_switch(1,0) at mi_switch+0x280
sleepq_switch(c59bc878,c0a0b0b0,0,c092000a,211,...) at sleepq_switch+0xcd
sleepq_wait(c59bc878,0,c59bc89c,b1,c0926903,...) at sleepq_wait+0x46
msleep(c59bc878,c0a0a930,50,c0924f24,0,...) at msleep+0x27d
acquire(e66ee5a8,40,60000,c46e3d80,0,...) at acquire+0x76
lockmgr(c59bc878,2002,c59bc89c,c46e3d80) at lockmgr+0x44a
ffs_lock(e66ee600) at ffs_lock+0x6e
VOP_LOCK_APV(c09d9a00,e66ee600) at VOP_LOCK_APV+0x87
vn_lock(c59bc820,2002,c46e3d80,c59bc820) at vn_lock+0xa8
ffs_snapshot(c40ca510,c3defb60,c3defb60,c401e000,c4016514,...) at ffs_snapshot+0x1210
ffs_mount(c40ca510,c46e3d80,20000000,201300,0,...) at ffs_mount+0x927
vfs_domount(c46e3d80,c3dffa80,c3d45b40,1211300,c3f662c0,c0a0b0c8,0,c09268fa,2b0) at vfs_domount+0x554
vfs_donmount(c46e3d80,1211300,e66eebac) at vfs_donmount+0x414
kernel_mount(c3fc5690,1211300,bfbfecdc,0,0,...) at kernel_mount+0x6d
ffs_cmount(c3fc5690,bfbfe500,1211300,c46e3d80,c09d96e0,...) at ffs_cmount+0x5d
mount(c46e3d80,e66eed04) at mount+0x15e
syscall(3b,3b,3b,2816772c,bfbfe4a0,...) at syscall+0x256

mnt_kern_flag = 0x2c000000 (MNTK_SUSPEND | MNTK_SUSPEND2 | MNTK_MPSAFE).
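
For anyone who wants to double-check the flag decoding above, here is a small
userland sketch. The bit values are an assumption on my side, written down as
I recall them from sys/mount.h of that time; please verify them against your
own tree. decode_kern_flags() and the table are illustration-only names, not
kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASSUMED values, recalled from sys/mount.h of that era; verify locally. */
#define MNTK_SUSPEND2	0x04000000u
#define MNTK_SUSPEND	0x08000000u
#define MNTK_SUSPENDED	0x10000000u
#define MNTK_MPSAFE	0x20000000u

/* Hypothetical helper table: one entry per flag we know how to name. */
static const struct {
	unsigned int bit;
	const char *name;
} kern_flags[] = {
	{ MNTK_SUSPEND2,  "MNTK_SUSPEND2" },
	{ MNTK_SUSPEND,   "MNTK_SUSPEND" },
	{ MNTK_SUSPENDED, "MNTK_SUSPENDED" },
	{ MNTK_MPSAFE,    "MNTK_MPSAFE" },
};

/*
 * Write the " | "-separated names of the known bits set in flags into buf
 * (in table order) and return whatever bits the table could not name.
 */
static unsigned int
decode_kern_flags(unsigned int flags, char *buf, size_t len)
{
	size_t i;

	assert(len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(kern_flags) / sizeof(kern_flags[0]); i++) {
		if ((flags & kern_flags[i].bit) == 0)
			continue;
		if (buf[0] != '\0')
			strncat(buf, " | ", len - strlen(buf) - 1);
		strncat(buf, kern_flags[i].name, len - strlen(buf) - 1);
		flags &= ~kern_flags[i].bit;
	}
	return (flags);
}
```

Under those assumed values, decode_kern_flags(0x2c000000, buf, sizeof(buf))
leaves no unnamed bits, which agrees with the three flags listed above.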

vn_lock() in ffs_snapshot() is called with flags LK_INTERLOCK | LK_EXCLUSIVE.
There is only one such place in ffs_snapshot.c, at line 541.

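On the other hand, ufs_inactive() calls vn_start_secondary_write(vp, XXX, V_WAIT).
ufs_inactive() runs with the vnode locked. If that call is made while
ffs_snapshot() has already suspended the filesystem and is itself waiting for
the same vnode lock, the system will deadlock.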

nfsd is the most vulnerable to the problem because it is often the
only (and last) user of a vnode, so a vput() from nfsd has a high chance of
resulting in vinactive().

Below is a patch that sets VI_OWEINACT on the vnode if the last call to
vn_start_secondary_write(..., V_NOWAIT) fails. Returning at that point is safe
because mp == NULL means that no earlier code that changes the inode was
executed.
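
The idea (do not sleep under the vnode lock, remember that inactivation is
still owed) can be tried out in a single-threaded userland model like the one
below. Everything here is a stand-in invented for illustration: the model_*
names, the struct layouts and the V_NOWAIT value are assumptions, not the
kernel definitions. The model also cannot show the re-check loop in the real
patch, which only matters when the suspension ends concurrently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define V_NOWAIT	0x0002	/* assumed value, illustration only */

struct model_mount {
	bool suspended;		/* stands in for MNTK_SUSPEND2/SUSPENDED */
	int secondary_writes;	/* secondary writers currently active */
};

struct model_vnode {
	struct model_mount *mp;
	bool owe_inact;		/* stands in for VI_OWEINACT */
	bool updated;		/* stands in for the effect of UFS_UPDATE() */
};

/* Non-blocking variant: refuse instead of sleeping while suspended. */
static int
model_start_secondary_write(struct model_mount *mp, int flags)
{
	assert(flags == V_NOWAIT);	/* this model never sleeps */
	if (mp->suspended)
		return (EWOULDBLOCK);
	mp->secondary_writes++;
	return (0);
}

static void
model_finished_secondary_write(struct model_mount *mp)
{
	mp->secondary_writes--;
}

/* Called with the (imaginary) vnode lock held. */
static int
model_inactive(struct model_vnode *vp)
{
	if (model_start_secondary_write(vp->mp, V_NOWAIT) != 0) {
		/* Filesystem is suspended: defer, do not block. */
		vp->owe_inact = true;
		return (0);
	}
	vp->updated = true;
	vp->owe_inact = false;
	model_finished_secondary_write(vp->mp);
	return (0);
}
```

While the mount is marked suspended, model_inactive() returns at once with
owe_inact set and nothing written; after the suspension is lifted, a second
call performs the deferred update.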

Please review and test.

Index: sys/ufs/ufs/ufs_inode.c
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_inode.c,v
retrieving revision 1.67
diff -u -r1.67 ufs_inode.c
--- sys/ufs/ufs/ufs_inode.c	9 May 2006 22:33:43 -0000	1.67
+++ sys/ufs/ufs/ufs_inode.c	18 Aug 2006 16:42:48 -0000
@@ -147,9 +147,23 @@
 			mp = NULL;
 			ip->i_flag &= ~IN_ACCESS;
 		} else {
-			if (mp == NULL)
-				(void) vn_start_secondary_write(vp, &mp,
-								V_WAIT);
+			if (mp == NULL) {
+			loop1:
+				if (vn_start_secondary_write(vp, &mp, V_NOWAIT)) {
+					MNT_ILOCK(mp);
+					if ((mp->mnt_kern_flag &
+					     (MNTK_SUSPEND2 | MNTK_SUSPENDED)) == 0) {
+						MNT_IUNLOCK(mp);
+						goto loop1;
+					}
+					
+					VI_LOCK(vp);
+					vp->v_iflag |= VI_OWEINACT;
+					VI_UNLOCK(vp);
+					MNT_IUNLOCK(mp);
+					return (0);
+				}
+			}
 			UFS_UPDATE(vp, 0);
 		}
 	}



