FreeBSD Mail Archives

Date:      Sun, 20 Aug 2006 19:28:45 +0200
From:      Peter Holm <peter@holm.cc>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-fs@freebsd.org, tegge@freebsd.org
Subject:   Re: Deadlock between nfsd and snapshots. [Was: Re: Livelock while accessing /tmp]
Message-ID:  <20060820172845.GA74767@peter.osted.lan>
In-Reply-To: <20060818164903.GF20768@deviant.kiev.zoral.com.ua>
References:  <20060816155310.GA64420@peter.osted.lan> <20060817105155.GC1483@deviant.kiev.zoral.com.ua> <22339.193.3.142.123.1155814154.squirrel@webmail4.pair.com> <20060817113203.GD1483@deviant.kiev.zoral.com.ua> <20060817170314.GA17490@peter.osted.lan> <20060818164903.GF20768@deviant.kiev.zoral.com.ua>

On Fri, Aug 18, 2006 at 07:49:03PM +0300, Konstantin Belousov wrote:
> On Thu, Aug 17, 2006 at 07:03:14PM +0200, Peter Holm wrote:
> > 
> > Ok, I got a new one after some 6 hours of testing with the attached
> > script + the default stress test:
> > http://people.freebsd.org/~pho/stress/log/cons205a.html
> > 
> > - Peter
> 
> First, big thanks to Peter for helping to debug the problem!
> 
> This deadlock happens between processes 764 (nfsd) and 62981 (mksnap_ffs).
> In fact, the deadlock is not specific to nfsd. It happens when
> ufs_inactive() interleaves with ffs_snapshot().
> 
> 
> Look:
> 
> db> where 764
> Tracing pid 764 tid 100076 td 0xc3fdb870
> sched_switch(c3fdb870,0,1) at sched_switch+0x183
> mi_switch(1,0) at mi_switch+0x280
> sleepq_switch(c40ca57c,c0a0b0b0,0,c092000a,211,...) at sleepq_switch+0xcd
> sleepq_wait(c40ca57c,0,c0927acf,3f3,c093229c,...) at sleepq_wait+0x46
> msleep(c40ca57c,c40ca534,29f,c0927b18,0,...) at msleep+0x27d
> vn_start_secondary_write(c59bc820,e6586988,1) at vn_start_secondary_write+0x122
> ufs_inactive(e65869b8) at ufs_inactive+0x257
> VOP_INACTIVE_APV(c09d9a00,e65869b8) at VOP_INACTIVE_APV+0x7e
> vinactive(c59bc820,c3fdb870) at vinactive+0x72
> vput(c59bc820,c0a0b0c8,1,c0932293,407,...) at vput+0x1b3
> nfsrv_read(c4703600,c3f12900,c3fdb870,e6586c40) at nfsrv_read+0xc21
> nfssvc_nfsd(c3fdb870) at nfssvc_nfsd+0x409
> nfssvc(c3fdb870,e6586d04) at nfssvc+0x18c
> syscall(3b,3b,3b,1,0,...) at syscall+0x256
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> 
> db> where 62981
> Tracing pid 62981 tid 100135 td 0xc46e3d80
> sched_switch(c46e3d80,0,1) at sched_switch+0x183
> mi_switch(1,0) at mi_switch+0x280
> sleepq_switch(c59bc878,c0a0b0b0,0,c092000a,211,...) at sleepq_switch+0xcd
> sleepq_wait(c59bc878,0,c59bc89c,b1,c0926903,...) at sleepq_wait+0x46
> msleep(c59bc878,c0a0a930,50,c0924f24,0,...) at msleep+0x27d
> acquire(e66ee5a8,40,60000,c46e3d80,0,...) at acquire+0x76
> lockmgr(c59bc878,2002,c59bc89c,c46e3d80) at lockmgr+0x44a
> ffs_lock(e66ee600) at ffs_lock+0x6e
> VOP_LOCK_APV(c09d9a00,e66ee600) at VOP_LOCK_APV+0x87
> vn_lock(c59bc820,2002,c46e3d80,c59bc820) at vn_lock+0xa8
> ffs_snapshot(c40ca510,c3defb60,c3defb60,c401e000,c4016514,...) at ffs_snapshot+0x1210
> ffs_mount(c40ca510,c46e3d80,20000000,201300,0,...) at ffs_mount+0x927
> vfs_domount(c46e3d80,c3dffa80,c3d45b40,1211300,c3f662c0,c0a0b0c8,0,c09268fa,2b0) at vfs_domount+0x554
> vfs_donmount(c46e3d80,1211300,e66eebac) at vfs_donmount+0x414
> kernel_mount(c3fc5690,1211300,bfbfecdc,0,0,...) at kernel_mount+0x6d
> ffs_cmount(c3fc5690,bfbfe500,1211300,c46e3d80,c09d96e0,...) at ffs_cmount+0x5d
> mount(c46e3d80,e66eed04) at mount+0x15e
> syscall(3b,3b,3b,2816772c,bfbfe4a0,...) at syscall+0x256
> 
> mnt_kern_flag = 0x2c000000 (MNTK_SUSPEND | MNTK_SUSPEND2 | MNTK_MPSAFE).
> 
> vn_lock() in ffs_snapshot() is called with flags LK_INTERLOCK | LK_EXCLUSIVE.
> There is only one such call in ffs_snapshot.c, at line 541.
> 
> On the other hand, ufs_inactive() calls
> vn_start_secondary_write(vp, XXX, V_WAIT) while it is running with the
> vnode locked. If that happens while ffs_snapshot() has suspended writes
> and is waiting for the same vnode lock, the system will deadlock.
> 
> nfsd is the most vulnerable to the problem because it is often the
> only (and last) user of a vnode, so a vput() from nfsd has a high chance
> of resulting in vinactive().
> 
> Below is a patch that sets VI_OWEINACT on the vnode if the call to
> vn_start_secondary_write(..., V_NOWAIT) fails. Returning at that point is
> safe because mp == NULL means that no earlier code that changes the inode
> was executed.
> 
> Please review and test.
> 

I have tested your patch for more than 24 hours and ran into this
panic: http://people.freebsd.org/~pho/stress/log/cons205b.html

- Peter

> Index: sys/ufs/ufs/ufs_inode.c
> ===================================================================
> RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_inode.c,v
> retrieving revision 1.67
> diff -u -r1.67 ufs_inode.c
> --- sys/ufs/ufs/ufs_inode.c	9 May 2006 22:33:43 -0000	1.67
> +++ sys/ufs/ufs/ufs_inode.c	18 Aug 2006 16:42:48 -0000
> @@ -147,9 +147,23 @@
>  			mp = NULL;
>  			ip->i_flag &= ~IN_ACCESS;
>  		} else {
> -			if (mp == NULL)
> -				(void) vn_start_secondary_write(vp, &mp,
> -								V_WAIT);
> +			if (mp == NULL) {
> +			loop1:
> +				if (vn_start_secondary_write(vp, &mp, V_NOWAIT)) {
> +					MNT_ILOCK(mp);
> +					if ((mp->mnt_kern_flag &
> +					     (MNTK_SUSPEND2 | MNTK_SUSPENDED)) == 0) {
> +						MNT_IUNLOCK(mp);
> +						goto loop1;
> +					}
> +					
> +					VI_LOCK(vp);
> +					vp->v_iflag |= VI_OWEINACT;
> +					VI_UNLOCK(vp);
> +					MNT_IUNLOCK(mp);
> +					return (0);
> +				}
> +			}
>  			UFS_UPDATE(vp, 0);
>  		}
>  	}
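
For readers who want to see the idea behind the quoted patch in isolation, here is a minimal userland sketch, not the kernel code. It assumes pthreads, and every toy_* name is hypothetical; the two booleans merely model MNTK_SUSPEND2 | MNTK_SUSPENDED and VI_OWEINACT. Unlike the patch, the toy checks the suspension flag and registers the writer under one lock, so it needs no retry loop like the goto loop1 in the diff.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userland stand-ins; none of these names are kernel APIs. */
struct toy_mount {
	pthread_mutex_t	interlock;	/* plays the role of MNT_ILOCK() */
	bool		suspending;	/* models MNTK_SUSPEND2 | MNTK_SUSPENDED */
	int		secondary_writers;
};

struct toy_vnode {
	pthread_mutex_t	interlock;	/* plays the role of VI_LOCK() */
	bool		owe_inactive;	/* models VI_OWEINACT */
	struct toy_mount *mount;
};

/* Non-blocking start: 0 on success, -1 if a suspension is in progress. */
static int
toy_start_secondary_write_nowait(struct toy_mount *mp)
{
	int error = 0;

	pthread_mutex_lock(&mp->interlock);
	if (mp->suspending)
		error = -1;
	else
		mp->secondary_writers++;
	pthread_mutex_unlock(&mp->interlock);
	return (error);
}

/* Drop the writer reference taken by a successful start. */
static void
toy_finished_secondary_write(struct toy_mount *mp)
{
	pthread_mutex_lock(&mp->interlock);
	assert(mp->secondary_writers > 0);
	mp->secondary_writers--;
	pthread_mutex_unlock(&mp->interlock);
}

/*
 * Assumed to be called with the (modelled) vnode lock held.  Returns true
 * if the inactivation work ran, false if it was deferred.
 */
static bool
toy_inactive(struct toy_vnode *vp)
{
	if (toy_start_secondary_write_nowait(vp->mount) != 0) {
		/* Do not sleep with the vnode locked; record the debt. */
		pthread_mutex_lock(&vp->interlock);
		vp->owe_inactive = true;
		pthread_mutex_unlock(&vp->interlock);
		return (false);
	}
	/* ... the inode update would go here ... */
	toy_finished_secondary_write(vp->mount);
	return (true);
}
```

With suspending clear, toy_inactive() does its work and returns true. With suspending set, it returns false at once and leaves owe_inactive set, so the caller never sleeps while holding the vnode lock that the suspending thread is waiting for.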

> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.5 (FreeBSD)
> 
> iD8DBQFE5e9+C3+MBN1Mb4gRAqlxAKCqmgB9LqfeuVA0H5wTihtwDcurBACcCWs7
> k+kLvfy3/ko+YS7pDWeagoo=
> =PGnw
> -----END PGP SIGNATURE-----


-- 
Peter Holm

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20060820172845.GA74767>
