Date: Fri, 18 Aug 2006 20:20:01 +0000 (UTC)
From: Tor Egge <Tor.Egge@cvsup.no.freebsd.org>
To: kostikbel@gmail.com
Cc: freebsd-fs@freebsd.org, tegge@freebsd.org
Subject: Re: Deadlock between nfsd and snapshots.
Message-ID: <20060818.202001.74745664.Tor.Egge@cvsup.no.freebsd.org>
In-Reply-To: <20060818164903.GF20768@deviant.kiev.zoral.com.ua>
References: <20060817113203.GD1483@deviant.kiev.zoral.com.ua> <20060817170314.GA17490@peter.osted.lan> <20060818164903.GF20768@deviant.kiev.zoral.com.ua>

> First, big thanks to Peter for helping debugging the problem !
>
> This deadlock happens between processes 764 (nfsd) and 62981 (mksnap_ffs).
> In fact, deadlock is not specific to nfsd. It happens when ufs_inactive()
> interposes with ffs_snapshot.

[snip]

> On the other hand, ufs_inactive calls vn_start_secondary_write(vp, XXX,
> V_WAIT). ufs_inactive is running with vnode locked, If happens at the right
> time, system will deadlock.
>
> nfsd is the most vulnerable to the problem due to it oftenly being the
> only (and last) user of vnode, vput() from nfsd have high chance resulting
> in vinactive().
>
> Below is the patch that set VI_OWEINACT for the inode if the last call to
> vn_start_sec_write(..., V_NOWAIT) fails. The return from that point is safe
> because mp == NULL means that no previous code that changes inode was
> executed.

> Please, review and test.

The deadlock indicates that one or more of IN_CHANGE, IN_MODIFIED or
IN_UPDATE was set on the inode, indicating a write operation
(e.g. VOP_WRITE(), VOP_RENAME(), VOP_CREATE(), VOP_REMOVE(), VOP_LINK(),
VOP_SYMLINK(), VOP_SETATTR(), VOP_MKDIR(), VOP_RMDIR(), VOP_MKNOD()) that was
not protected by vn_start_write() or vn_start_secondary_write().

The suspension of the file system should have cleared those flags on all
related inodes. Write operations protected by vn_start_write() should have
blocked without holding any vnode lock until the file system was resumed
while write operations protected by vn_start_secondary_write() should have
triggered a retry of the vnode sync loop in ffs_sync().

Such unprotected write operations might render the snapshot inconsistent.

Your patch addresses the deadlock symptom but not the cause.

- Tor Egge
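
The invariant described above (no IN_CHANGE, IN_MODIFIED or IN_UPDATE on any inode while the file system is suspended) can be illustrated with a small user-space model. This is only a sketch, not kernel code: the `toy_` structures and functions are hypothetical, and the flag values are placeholders rather than a claim about the real UFS header.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag values for illustration; the real definitions are in the UFS inode header. */
#define IN_CHANGE   0x0002u
#define IN_UPDATE   0x0004u
#define IN_MODIFIED 0x0008u
#define TOY_DIRTY_MASK (IN_CHANGE | IN_MODIFIED | IN_UPDATE)

/* Hypothetical toy inode: only the flag word matters for this model. */
struct toy_inode {
    unsigned flags;
};

/* Hypothetical toy mount point: records whether writes are suspended. */
struct toy_mount {
    bool suspended;
};

/* True if any of the three "pending write" flags is set. */
bool toy_inode_dirty(const struct toy_inode *ip)
{
    return (ip->flags & TOY_DIRTY_MASK) != 0;
}

/* True if a dirty inode exists while the file system is suspended,
 * which is the condition the reply identifies as the real bug. */
bool toy_invariant_violated(const struct toy_mount *mp,
                            const struct toy_inode *ip)
{
    return mp->suspended && toy_inode_dirty(ip);
}

/* Model of a completed suspension: the sync pass clears the flags,
 * then the mount is marked suspended. */
void toy_suspend(struct toy_mount *mp, struct toy_inode *ip)
{
    ip->flags &= ~TOY_DIRTY_MASK;
    mp->suspended = true;
    assert(!toy_invariant_violated(mp, ip));
}
```

In this model, a write that sets one of the flags after `toy_suspend()` has returned makes `toy_invariant_violated()` true. That corresponds to a write operation that bypassed both vn_start_write() and vn_start_secondary_write().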