Date: Sun, 20 Aug 2006 19:28:45 +0200
From: Peter Holm <peter@holm.cc>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-fs@freebsd.org, tegge@freebsd.org
Subject: Re: Deadlock between nfsd and snapshots. [Was: Re: Livelock while accessing /tmp]
Message-ID: <20060820172845.GA74767@peter.osted.lan>
In-Reply-To: <20060818164903.GF20768@deviant.kiev.zoral.com.ua>
References: <20060816155310.GA64420@peter.osted.lan> <20060817105155.GC1483@deviant.kiev.zoral.com.ua> <22339.193.3.142.123.1155814154.squirrel@webmail4.pair.com> <20060817113203.GD1483@deviant.kiev.zoral.com.ua> <20060817170314.GA17490@peter.osted.lan> <20060818164903.GF20768@deviant.kiev.zoral.com.ua>
On Fri, Aug 18, 2006 at 07:49:03PM +0300, Konstantin Belousov wrote:
> On Thu, Aug 17, 2006 at 07:03:14PM +0200, Peter Holm wrote:
> >
> > Ok, I got a new one after some 6 hours of testing with the attached
> > script + the default stress test:
> > http://people.freebsd.org/~pho/stress/log/cons205a.html
> >
> > - Peter
>
> First, big thanks to Peter for helping to debug the problem!
>
> This deadlock happens between processes 764 (nfsd) and 62981 (mksnap_ffs).
> In fact, the deadlock is not specific to nfsd. It happens when ufs_inactive()
> interposes with ffs_snapshot.
>
> Look:
>
> db> where 764
> Tracing pid 764 tid 100076 td 0xc3fdb870
> sched_switch(c3fdb870,0,1) at sched_switch+0x183
> mi_switch(1,0) at mi_switch+0x280
> sleepq_switch(c40ca57c,c0a0b0b0,0,c092000a,211,...) at sleepq_switch+0xcd
> sleepq_wait(c40ca57c,0,c0927acf,3f3,c093229c,...) at sleepq_wait+0x46
> msleep(c40ca57c,c40ca534,29f,c0927b18,0,...) at msleep+0x27d
> vn_start_secondary_write(c59bc820,e6586988,1) at vn_start_secondary_write+0x122
> ufs_inactive(e65869b8) at ufs_inactive+0x257
> VOP_INACTIVE_APV(c09d9a00,e65869b8) at VOP_INACTIVE_APV+0x7e
> vinactive(c59bc820,c3fdb870) at vinactive+0x72
> vput(c59bc820,c0a0b0c8,1,c0932293,407,...) at vput+0x1b3
> nfsrv_read(c4703600,c3f12900,c3fdb870,e6586c40) at nfsrv_read+0xc21
> nfssvc_nfsd(c3fdb870) at nfssvc_nfsd+0x409
> nfssvc(c3fdb870,e6586d04) at nfssvc+0x18c
> syscall(3b,3b,3b,1,0,...) at syscall+0x256
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
>
> db> where 62981
> Tracing pid 62981 tid 100135 td 0xc46e3d80
> sched_switch(c46e3d80,0,1) at sched_switch+0x183
> mi_switch(1,0) at mi_switch+0x280
> sleepq_switch(c59bc878,c0a0b0b0,0,c092000a,211,...) at sleepq_switch+0xcd
> sleepq_wait(c59bc878,0,c59bc89c,b1,c0926903,...) at sleepq_wait+0x46
> msleep(c59bc878,c0a0a930,50,c0924f24,0,...) at msleep+0x27d
> acquire(e66ee5a8,40,60000,c46e3d80,0,...) at acquire+0x76
> lockmgr(c59bc878,2002,c59bc89c,c46e3d80) at lockmgr+0x44a
> ffs_lock(e66ee600) at ffs_lock+0x6e
> VOP_LOCK_APV(c09d9a00,e66ee600) at VOP_LOCK_APV+0x87
> vn_lock(c59bc820,2002,c46e3d80,c59bc820) at vn_lock+0xa8
> ffs_snapshot(c40ca510,c3defb60,c3defb60,c401e000,c4016514,...) at ffs_snapshot+0x1210
> ffs_mount(c40ca510,c46e3d80,20000000,201300,0,...) at ffs_mount+0x927
> vfs_domount(c46e3d80,c3dffa80,c3d45b40,1211300,c3f662c0,c0a0b0c8,0,c09268fa,2b0) at vfs_domount+0x554
> vfs_donmount(c46e3d80,1211300,e66eebac) at vfs_donmount+0x414
> kernel_mount(c3fc5690,1211300,bfbfecdc,0,0,...) at kernel_mount+0x6d
> ffs_cmount(c3fc5690,bfbfe500,1211300,c46e3d80,c09d96e0,...) at ffs_cmount+0x5d
> mount(c46e3d80,e66eed04) at mount+0x15e
> syscall(3b,3b,3b,2816772c,bfbfe4a0,...) at syscall+0x256
>
> mnt_kern_flag = 0x2c000000 (MNTK_SUSPEND | MNTK_SUSPEND2 | MNTK_MPSAFE).
>
> vn_lock in ffs_snapshot is called with flags LK_INTERLOCK | LK_EXCLUSIVE.
> There is only one such place in ffs_snapshot.c, at line 541.
>
> On the other hand, ufs_inactive calls vn_start_secondary_write(vp, XXX, V_WAIT).
> ufs_inactive runs with the vnode locked. If this happens at the right time,
> the system will deadlock.
>
> nfsd is the most vulnerable to the problem because it is often the
> only (and last) user of a vnode, so a vput() from nfsd has a high chance
> of resulting in vinactive().
>
> Below is the patch that sets VI_OWEINACT for the inode if the last call to
> vn_start_secondary_write(..., V_NOWAIT) fails. The return from that point is
> safe because mp == NULL means that no previous code that changes the inode
> was executed.
>
> Please, review and test.
>

I have tested your patch for more than 24 hours and ran into
this panic:

http://people.freebsd.org/~pho/stress/log/cons205b.html

- Peter

> Index: sys/ufs/ufs/ufs_inode.c
> ===================================================================
> RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_inode.c,v
> retrieving revision 1.67
> diff -u -r1.67 ufs_inode.c
> --- sys/ufs/ufs/ufs_inode.c	9 May 2006 22:33:43 -0000	1.67
> +++ sys/ufs/ufs/ufs_inode.c	18 Aug 2006 16:42:48 -0000
> @@ -147,9 +147,23 @@
>  			mp = NULL;
>  			ip->i_flag &= ~IN_ACCESS;
>  		} else {
> -			if (mp == NULL)
> -				(void) vn_start_secondary_write(vp, &mp,
> -								V_WAIT);
> +			if (mp == NULL) {
> +			loop1:
> +				if (vn_start_secondary_write(vp, &mp, V_NOWAIT)) {
> +					MNT_ILOCK(mp);
> +					if ((mp->mnt_kern_flag &
> +					     (MNTK_SUSPEND2 | MNTK_SUSPENDED)) == 0) {
> +						MNT_IUNLOCK(mp);
> +						goto loop1;
> +					}
> +
> +					VI_LOCK(vp);
> +					vp->v_iflag |= VI_OWEINACT;
> +					VI_UNLOCK(vp);
> +					MNT_IUNLOCK(mp);
> +					return (0);
> +				}
> +			}
>  			UFS_UPDATE(vp, 0);
>  		}
>  	}
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.5 (FreeBSD)
>
> iD8DBQFE5e9+C3+MBN1Mb4gRAqlxAKCqmgB9LqfeuVA0H5wTihtwDcurBACcCWs7
> k+kLvfy3/ko+YS7pDWeagoo=
> =PGnw
> -----END PGP SIGNATURE-----

--
Peter Holm
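
For readers following the thread, here is a minimal user-space sketch of the idea behind the quoted patch. It is not kernel code and not the patch itself: every toy_* name is hypothetical, the pthread mutex only stands in for the mount interlock, and the patch's retry loop for transient failures is left out. It assumes the caller already holds the vnode lock, as ufs_inactive() does. It shows only the decision: try to start the write without sleeping and, if the filesystem is suspended, record that inactive processing is owed and return, instead of sleeping while the snapshot code waits for the vnode lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mount; not a kernel type. */
struct toy_mount {
	pthread_mutex_t interlock;	/* plays the role of MNT_ILOCK/MNT_IUNLOCK */
	bool suspended;			/* plays the role of MNTK_SUSPEND2 | MNTK_SUSPENDED */
};

/* Hypothetical stand-in for a vnode together with its inode. */
struct toy_vnode {
	bool owe_inactive;		/* plays the role of VI_OWEINACT */
	int updates;			/* counts simulated UFS_UPDATE() calls */
};

/* Set or clear the suspended state, as the snapshot code would. */
static void
toy_set_suspended(struct toy_mount *mp, bool on)
{
	pthread_mutex_lock(&mp->interlock);
	mp->suspended = on;
	pthread_mutex_unlock(&mp->interlock);
}

/*
 * Non-blocking attempt to start a write. Unlike the kernel's
 * vn_start_secondary_write(), which returns non-zero on failure,
 * this toy returns true on success. It never sleeps.
 */
static bool
toy_start_write_nowait(struct toy_mount *mp)
{
	bool ok;

	pthread_mutex_lock(&mp->interlock);
	ok = !mp->suspended;
	pthread_mutex_unlock(&mp->interlock);
	return (ok);
}

/*
 * Toy analogue of the patched branch. The caller is assumed to hold
 * the vnode lock. Returns true if the update was done, false if it
 * was deferred because the filesystem is suspended.
 */
static bool
toy_inactive(struct toy_vnode *vp, struct toy_mount *mp)
{
	if (!toy_start_write_nowait(mp)) {
		/* Do not sleep with the vnode locked; owe the work instead. */
		vp->owe_inactive = true;
		return (false);
	}
	vp->updates++;			/* simulated UFS_UPDATE(vp, 0) */
	vp->owe_inactive = false;
	return (true);
}
```

Calling toy_inactive() while the toy mount is suspended leaves updates untouched and sets owe_inactive. Calling it again after toy_set_suspended(&mp, false) performs the update and clears the flag. That second call is a hypothetical stand-in for whatever later path honors VI_OWEINACT in the kernel.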