Date: Thu, 24 Aug 2006 04:43:26 +1000 (EST) From: Bruce Evans <bde@zeta.org.au> To: Tor Egge <Tor.Egge@cvsup.no.freebsd.org> Cc: freebsd-fs@freebsd.org Subject: Re: Deadlock between nfsd and snapshots. Message-ID: <20060824040523.B64391@delplex.bde.org> In-Reply-To: <20060823.163111.41690377.Tor.Egge@cvsup.no.freebsd.org> References: <20060822130743.GL56637@deviant.kiev.zoral.com.ua> <20060822.214638.74697110.Tor.Egge@cvsup.no.freebsd.org> <20060823203148.M62850@delplex.bde.org> <20060823.163111.41690377.Tor.Egge@cvsup.no.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 23 Aug 2006, Tor Egge wrote: >>> Protecting the existing i_flag and the timestamps with the vnode interlock >>> when the current thread only has a shared vnode lock should be sufficient >>> to protect against the races, removing the need for #3, #4 and #4 below. > >> I agree that this should be sufficient. Don't know if it is. Actually, >> I thought that the vnode lock was more exclusive. How can a shared lock >> work even for a reader if a writer is changing inode contents? > > Holding a shared lock prevents others from holding an exclusive lock, thus > the possible modifications to the inode are limited (i_flags and timestamps). Seems to be not much of a problem (since vn_stat() acquires an exclusive lock. >>> There are some constraints with regards to setting IN_MODIFIED on an inode. >>> If neither IN_CHANGE nor IN_UPDATE is set then it might be unsafe to set >>> IN_MODIFIED since the file system might be suspended or in the process of >>> being suspended with the vnode sync loop in ffs_sync() having iterated past >>> the vnode. > >> The case can't happen. IN_CHANGE is always set if IN_MODIFIED will >> (or should be) set later. There are some buggy cases where the combined >> setting of { IN_CHANGE, IN_MODIFIED } is incorrect, but these don't cause >> any problems here. We just have to avoid setting (or having to set) >> IN_MODIFED when setting it is not safe. Hopefully this is only in suspend >> mode. > > If VOP_READ() has recently been called then IN_ACCESS might be set without > IN_CHANGE, IN_UPDATE or IN_MODIFIED set. That's when ufs_itimes() needs to be > careful. The VOP_READ() call might have been performed after the vnode was > handled in the vnode sync loop. Certainly. You didn't mention IN_ACCESS originally, and I thought about it but didn't mention it either. For IN_ACCESS, we can consider the change to not actually have occured (and thus force a setting of IN_MODIFED) until we set the atime, or even later with IN_LAZYMOD, or never if we are really lazy. We can set IN_ACCESS in ffs_read() safely because it's not in the dinode (and vn_rdwr() acquires an exclusive lock). We can decide not to convert it to IN_MODIFIED later if this is too hard. Not setting the atime properly is harmless compared with not setting other timestamps. I should only have said that "IN_CHANGE is always set if IN_MODIFIED will be set later due to a _disk_ inode change that has _already_ occurred" (IN_ACCESS and IN_UPDATE both give in-core-only inode changes that will affect the disk inode later). >>> If the file system is suspended then IN_MODIFIED cannot be set. If >>> IN_MODIFIED, IN_CHANGE or IN_UPDATE is set and the file system is suspended >>> then something is wrong. > >> I think ufs_itimes() needs to use a non-blocking vn_start_write() and do >> nothing (except perhaps assert that the above harmful IN_* flags are not >> set) if it (ufs_itimes()) would set IN_MODIFIED. > > If a nonblocking vn_start_write() call fails then you don't know if the file > system is suspended or in the processes of being suspended. If the file system > is in the process of being suspended then the vnode sync loop might still be > running. Yes. I missed the different suspend flags. Now I'll ask for a vfs level function to avoid accessing the is-suspended flag directly. >> - problem for atime updates caused by reads. These become writes while in >> suspend mode. We want reads to work in suspend mode, so we cannout >> disallow reads, and we cannot disallow the implicit writes without >> breaking atime semantics. This problem can be partly avoided by >> ignoring IN_ACCESS in ufs_itimes() while in suspend mode, or by >> converting it to IN_LAZYMOD. If the inode gets reclaimed then we >> lose the atime update; otherwise the atime gets updated some time >> after we leave suspend mode, and either way the update doesn't go >> to snapshots, as is necessary for having a coherent snapshot. > > vnodes are not reclaimed while the file system is suspended or in the > processess of being suspended. Does this mean that big tree walks soon run out of vnodes and cause everything to block until the file system is unsuspended? >> - problem for IN_LAZYMOD in ufs_reclaim(). Currently not reached. A quick >> fix would be to lose whatever updates are being done lazily. > > What problem ? It blindly converts IN_LAZYMOD to IN_MODIFIED and calls ufs_update(), but it has no problem with suspend if it is not called while suspended. It isn't missing vnode locking like I first thought -- the locking annotation was just wrong in the old version of vnode_if.src that I was looking at. The locking used to be "L L L" but was documented as "U U U". Now it is "E E E" and it probably needs to be exclusive to set IN_MODIFIED. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20060824040523.B64391>