Date: Tue, 22 Aug 2006 19:13:15 +1000 (EST) From: Bruce Evans <bde@zeta.org.au> To: Tor Egge <Tor.Egge@cvsup.no.freebsd.org> Cc: freebsd-fs@freebsd.org, tegge@freebsd.org Subject: Re: Deadlock between nfsd and snapshots. Message-ID: <20060822175540.V58720@delplex.bde.org> In-Reply-To: <20060821.132151.41668008.Tor.Egge@cvsup.no.freebsd.org> References: <20060817170314.GA17490@peter.osted.lan> <20060818164903.GF20768@deviant.kiev.zoral.com.ua> <20060818.202001.74745664.Tor.Egge@cvsup.no.freebsd.org> <20060821.132151.41668008.Tor.Egge@cvsup.no.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 21 Aug 2006, Tor Egge wrote: > I wrote: > >> The deadlock indicates that one or more of IN_CHANGE, IN_MODIFIED or >> IN_UPDATE was set on the inode, indicating a write operation >> (e.g. VOP_WRITE(), VOP_RENAME(), VOP_CREATE(), VOP_REMOVE(), VOP_LINK(), >> VOP_SYMLINK(), VOP_SETATTR(), VOP_MKDIR(), VOP_RMDIR(), VOP_MKNOD()) that was >> not protected by vn_start_write() or vn_start_secondary_write(). > > The most common "write" operation was probably VOP_GETATTR(). Reading the attributes really is a write operation if it causes marks for update to be turned into updates. ufs_inactive() is also a write operation if it causes this. I think marking for update shouldn't require much locking. > ufs_itimes(), called from ufs_getattr(), might set the IN_MODIFIED inode flag > if IN_ACCESS is set on the inode even if neither IN_CHANGE nor IN_UPDATE is > set, transitioning the inode flags into a state where ufs_inactive() calls the > blocking variant of vn_start_secondary_write(). In the implementation of marking access times for update on exec, we try hard to avoid calling vn_start_write(), etc. Early implementations did call vn_start_write(), etc., and had some bugs from this. The current implementation is mainly for ffs and is rather fragile: VOP_SETATTR() depends on callers calling vfs_start_write, but vfs_mark_atime() calls VOP_SETATTR() without calling vfs_start_write(). The correctness of this depends on the VA_MARK_ATIME case of VOP_SETATTR() not writing any more than VOP_READ() would or should. I think ufs_itimes() shouldn't call vn_start_write() any more than ufs_getattr() should. Callers should be aware that GETATTR may write and thus it seems to be necessary for them to call vn_start_write() unconditionally. ufs_itimes() is a utility function that is called from places other than ufs_getattr(). The other places are ufs_*close() and ufs_setattr(). These don't cause any additional problems. VOP_CLOSE() is called from several places which already seem to have sufficient locking. VOP_GETATTR() is called from many places that don't call vn_start_write(), starting with vn_stat(). The updates to timestamps in in ufs_itimes() and ufs_getattr() are still soft (not even delayed writes, but writes to the vnode that will usually become delayed writes later). Perhaps vn_start_write() can be avoided for them, but since they are logically writes this might be hard to implement correctly. ufs_inactive() has to do a (possibly delayed) physical write to force any updates to disk, so it needs strong locking. BTW, ufs_itimes() has some possibly related kludges involving changing from r/w mounts to r/o mounts. Some of the marks for update aren't handled quite right and are still present after the change. Then they want to be turned into updates on a r/o file system. This is impossible. The problem is handled bogusly, essentially by clearing them without doing the update, in a way that triggers seome of my local debugging code. > calling ufs_itimes() with only a shared vnode lock might cause unsafe accesses > to the inode flags. Setting of IN_ACCESS at the end of ffs_read() and > ffs_extread() might also be unsafe. If DIRECTIO is enabled then O_DIRECT reads > might not even attempt to set the IN_ACCESS flag. I think setting it should be safe -- see above. Not setting the IN_ACCESS flag in the early return for the O_DIRECT case is a different bug. read() is supposed to set IN_ACCESS on sucessful completion and returning early for the O_DIRECT case would defeat this. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20060822175540.V58720>