Date: Fri, 18 Aug 2006 19:49:03 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Peter Holm <peter@holm.cc>, tegge@freebsd.org
Cc: freebsd-fs@freebsd.org
Subject: Deadlock between nfsd and snapshots. [Was: Re: Livelock while accessing /tmp]
Message-ID: <20060818164903.GF20768@deviant.kiev.zoral.com.ua>
In-Reply-To: <20060817170314.GA17490@peter.osted.lan>
References: <20060816155310.GA64420@peter.osted.lan> <20060817105155.GC1483@deviant.kiev.zoral.com.ua> <22339.193.3.142.123.1155814154.squirrel@webmail4.pair.com> <20060817113203.GD1483@deviant.kiev.zoral.com.ua> <20060817170314.GA17490@peter.osted.lan>

On Thu, Aug 17, 2006 at 07:03:14PM +0200, Peter Holm wrote:
>
> Ok, I got a new one after some 6 hours of testing with the attached
> script + the default stress test:
> http://people.freebsd.org/~pho/stress/log/cons205a.html
>
> - Peter

First, big thanks to Peter for helping to debug the problem!

This deadlock happens between processes 764 (nfsd) and 62981 (mksnap_ffs).
In fact, the deadlock is not specific to nfsd. It happens whenever
ufs_inactive() interleaves with ffs_snapshot(). Look:

db> where 764
Tracing pid 764 tid 100076 td 0xc3fdb870
sched_switch(c3fdb870,0,1) at sched_switch+0x183
mi_switch(1,0) at mi_switch+0x280
sleepq_switch(c40ca57c,c0a0b0b0,0,c092000a,211,...) at sleepq_switch+0xcd
sleepq_wait(c40ca57c,0,c0927acf,3f3,c093229c,...) at sleepq_wait+0x46
msleep(c40ca57c,c40ca534,29f,c0927b18,0,...) at msleep+0x27d
vn_start_secondary_write(c59bc820,e6586988,1) at vn_start_secondary_write+0x122
ufs_inactive(e65869b8) at ufs_inactive+0x257
VOP_INACTIVE_APV(c09d9a00,e65869b8) at VOP_INACTIVE_APV+0x7e
vinactive(c59bc820,c3fdb870) at vinactive+0x72
vput(c59bc820,c0a0b0c8,1,c0932293,407,...) at vput+0x1b3
nfsrv_read(c4703600,c3f12900,c3fdb870,e6586c40) at nfsrv_read+0xc21
nfssvc_nfsd(c3fdb870) at nfssvc_nfsd+0x409
nfssvc(c3fdb870,e6586d04) at nfssvc+0x18c
syscall(3b,3b,3b,1,0,...) at syscall+0x256
Xint0x80_syscall() at Xint0x80_syscall+0x1f

db> where 62981
Tracing pid 62981 tid 100135 td 0xc46e3d80
sched_switch(c46e3d80,0,1) at sched_switch+0x183
mi_switch(1,0) at mi_switch+0x280
sleepq_switch(c59bc878,c0a0b0b0,0,c092000a,211,...) at sleepq_switch+0xcd
sleepq_wait(c59bc878,0,c59bc89c,b1,c0926903,...) at sleepq_wait+0x46
msleep(c59bc878,c0a0a930,50,c0924f24,0,...) at msleep+0x27d
acquire(e66ee5a8,40,60000,c46e3d80,0,...) at acquire+0x76
lockmgr(c59bc878,2002,c59bc89c,c46e3d80) at lockmgr+0x44a
ffs_lock(e66ee600) at ffs_lock+0x6e
VOP_LOCK_APV(c09d9a00,e66ee600) at VOP_LOCK_APV+0x87
vn_lock(c59bc820,2002,c46e3d80,c59bc820) at vn_lock+0xa8
ffs_snapshot(c40ca510,c3defb60,c3defb60,c401e000,c4016514,...) at ffs_snapshot+0x1210
ffs_mount(c40ca510,c46e3d80,20000000,201300,0,...) at ffs_mount+0x927
vfs_domount(c46e3d80,c3dffa80,c3d45b40,1211300,c3f662c0,c0a0b0c8,0,c09268fa,2b0) at vfs_domount+0x554
vfs_donmount(c46e3d80,1211300,e66eebac) at vfs_donmount+0x414
kernel_mount(c3fc5690,1211300,bfbfecdc,0,0,...) at kernel_mount+0x6d
ffs_cmount(c3fc5690,bfbfe500,1211300,c46e3d80,c09d96e0,...) at ffs_cmount+0x5d
mount(c46e3d80,e66eed04) at mount+0x15e
syscall(3b,3b,3b,2816772c,bfbfe4a0,...) at syscall+0x256

mnt_kern_flag = 0x2c000000 (MNTK_SUSPEND | MNTK_SUSPEND2 | MNTK_MPSAFE).

vn_lock() in ffs_snapshot() is called with flags LK_INTERLOCK | LK_EXCLUSIVE.
There is only one such place in ffs_snapshot.c, at line 541.

On the other hand, ufs_inactive() calls
vn_start_secondary_write(vp, XXX, V_WAIT). ufs_inactive() runs with the
vnode locked, so it sleeps waiting for the suspension to end while holding
the very vnode lock that ffs_snapshot() is trying to acquire. If this
happens at the right time, the system deadlocks.

nfsd is the most vulnerable to the problem because it is often the only
(and last) user of a vnode, so a vput() from nfsd has a high chance of
resulting in vinactive().

Below is a patch that sets VI_OWEINACT on the vnode if the call to
vn_start_secondary_write(..., V_NOWAIT) fails. Returning at that point is
safe because mp == NULL means that no earlier code that changes the inode
was executed.

Please review and test.

Index: sys/ufs/ufs/ufs_inode.c
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_inode.c,v
retrieving revision 1.67
diff -u -r1.67 ufs_inode.c
--- sys/ufs/ufs/ufs_inode.c	9 May 2006 22:33:43 -0000	1.67
+++ sys/ufs/ufs/ufs_inode.c	18 Aug 2006 16:42:48 -0000
@@ -147,9 +147,23 @@
 				mp = NULL;
 			ip->i_flag &= ~IN_ACCESS;
 		} else {
-			if (mp == NULL)
-				(void) vn_start_secondary_write(vp, &mp,
-				    V_WAIT);
+			if (mp == NULL) {
+	loop1:
+				if (vn_start_secondary_write(vp, &mp, V_NOWAIT)) {
+					MNT_ILOCK(mp);
+					if ((mp->mnt_kern_flag &
+					    (MNTK_SUSPEND2 | MNTK_SUSPENDED)) == 0) {
+						MNT_IUNLOCK(mp);
+						goto loop1;
+					}
+
+					VI_LOCK(vp);
+					vp->v_iflag |= VI_OWEINACT;
+					VI_UNLOCK(vp);
+					MNT_IUNLOCK(mp);
+					return (0);
+				}
+			}
 			UFS_UPDATE(vp, 0);
 		}
 	}
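
To make the idea behind the patch easier to follow, here is a minimal
single-threaded userland sketch of the "try without sleeping, otherwise
defer" pattern. It is only a model and not kernel code: every name in it
(model_mount, model_vnode, MODEL_SUSPENDED, MODEL_OWEINACT,
model_inactive and so on) is hypothetical, the flag values are made up,
and the locking and the retry loop of the real patch are left out. In
particular, clearing the "owe inactive" flag inside model_inactive() is a
simplification of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only; not the kernel's. */
enum { MODEL_SUSPENDED = 0x1 };	/* stands in for the suspension flags */
enum { MODEL_OWEINACT = 0x1 };	/* stands in for VI_OWEINACT */

struct model_mount {
	unsigned kern_flag;		/* suspension state */
	int secondary_writers;		/* writers currently admitted */
};

struct model_vnode {
	struct model_mount *mp;
	unsigned iflag;			/* deferred-work flags */
	bool locked;			/* caller holds the vnode lock */
	int updates;			/* how many times the inode was updated */
};

/* Non-sleeping admission: 0 on success, nonzero if writes are suspended. */
static int
model_start_secondary_write_nowait(struct model_mount *mp)
{
	if (mp->kern_flag & MODEL_SUSPENDED)
		return (1);
	mp->secondary_writers++;
	return (0);
}

static void
model_finished_secondary_write(struct model_mount *mp)
{
	assert(mp->secondary_writers > 0);
	mp->secondary_writers--;
}

/*
 * Inactive processing, called with the vnode locked.  Instead of sleeping
 * while holding the lock, record that the work is owed and return.
 * Returns true if the update was done, false if it was deferred.
 */
static bool
model_inactive(struct model_vnode *vp)
{
	assert(vp->locked);
	if (model_start_secondary_write_nowait(vp->mp) != 0) {
		vp->iflag |= MODEL_OWEINACT;	/* retry later */
		return (false);
	}
	vp->updates++;				/* stands in for UFS_UPDATE() */
	vp->iflag &= ~MODEL_OWEINACT;
	model_finished_secondary_write(vp->mp);
	return (true);
}
```

In this model, a call made while the mount is suspended returns
immediately with the owed-work flag set and the update count untouched,
so the lock holder never sleeps. A second call after the suspension ends
performs the update and clears the flag.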