Date: Mon, 3 Dec 2007 21:11:36 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Don Lewis <truckman@freebsd.org> Cc: freebsd-fs@freebsd.org Subject: Re: File remove problem Message-ID: <20071203202947.N1698@besplex.bde.org> In-Reply-To: <200712030517.lB35HgtK039158@gw.catspoiler.org> References: <200712030517.lB35HgtK039158@gw.catspoiler.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 2 Dec 2007, Don Lewis wrote: > On 3 Dec, Bruce Evans wrote: >> On Sun, 2 Dec 2007, Don Lewis wrote: >> >>> In particular, ffs_read() and ffs_extread() need to check MNT_RDONLY in >>> addition to MNT_NOATIME (as is already done in vfs_mark_atime()). This >>> also looks like it should be a reasonable optimization for read-only >>> file systems that should eliminate unnecessary work at the lower levels >>> of the code. >> >> But I let these happen and discard IN_ATIME marks if fs_ronly. I >> thought that the optimization went the other way -- unconditionally >> setting the marks was very efficient, and discarding them in ufs_itimes() >> was efficient too. I think this is still true now with larger locking >> overheads, and the marks should be discarded later in the MNT_NOATIME >> case too. It is expected that the marks are set much more often than >> they are looked at by ufs_itimes(), since most calls to ufs_itimes() >> are in close() and read() is much more common than close(). > > ffs_read() and ffs_extread() already check MNT_NOATIME, so also checking > MNT_RDONLY there as well is free. Setting and clearing the mark will > consume a few instruction cycles, dirty a cache line, and increase main > memory write-back traffic, though the expense is likely to be small. The check can also avoid the new vnode locking for useless settings of IN_ATIME. But what locks the MNT flags? Nothing directly I think. Here we must not care if we read a stale value, and ufs_itimes() must not care if the value changed just after we read it. > Preventing user reads from setting IN_ATIME as soon as MNT_RDONLY is set > on a downgrade to read-only seems to be the right thing to do. Either reads or ufs_itimes() must use MNT_RDONLY to prevent changes to the inode after MNT_RDONLY is set early in the r/w to r/o transition. Checking MNT_RDONLY gives the more correct behaviour of not having to discard even IN_ATIME settings that were made before the transition began. I don't understand how unmount (apparently) works so well without setting MNT_RDONLY to prevent further writes like the transition does. >> ufs_itimes() >> is also called in stat() but I think that is less common than close() >> (except for some tree walks). WIth non-delayed marking, ufs_itimes() >> would still have to check fs_ronly, and the only gain would be that >> it could then skip checking the marks except as an invariants check. >> But it can gain like that even with delayed setting -- just ignore any >> old marks while fs_ronly (except as an invariants check), but clear them >> at mount or unmount time so that there shouldn't be any. > > I think that setting the marks when the file system is read-only causes > the syncer to do extra work. I think that ffs_sync() still gets called > if the file system is read-only, and if it encounters any inodes with > marks set, it calls ffs_syncvnode() on them. I think VOP_SYNC() actually isn't called on r/o file systems. Callers check MNT_RDONLY or possibly dirty block list pointers. msdosfs had a buf that would have caused panics if msdosfs_sync() were called on an fs that had ever been mounted r/w but is currently r/o. >>> The early IN_ACCESS flag setting in ufs_setattr(), before the MNT_RDONLY >>> check, appears to be protected by the MNT_RDONLY check in >>> vfs_mark_atime(). >> >> Thanks, I had forgotten about that. In vfs_mark_atime(), there is much >> more efficiency to be gained by not setting marks that will be discarded, >> since it takes a VOP to set them and many file systems don't support >> this setting. However, it is hard for vfs_mark_atime() to know when the >> mark will be discarded without calling the fs: >> >> - it already doesn't know which fs's support it >> - it should be checking fs_ronly for ffs > > I think that MNT_RDONLY is correct here. We want to stop new atime > updates as soon as the downgrade starts, just like we stop new > user-initiated writes. Right. Same as for normal read accesses or delayed killing of IN_ACCESS in ufs_itimes(). >> - it seems to be missing locking for MNT_NOATIME and MNT_RDONLY >> >> fs-level locking for MNT_NOATIME and MNT_RDONLY and fs_ronly is dubious >> too. Upper layers set the MNT flags before giving VOP_MOUNT() a chance >> to adjust the marks. This is automatically safe in one direction only >> (e.g., setting MNT_NOATIME or MNT_RDONLY is fail-safe since it stops >> changes), and always bad for strict invariants. > > Maybe a reasonable way to handle this would be to set the > flags before calling VOP_MOUNT() when they are being changed from 0 to > 1, and clear them after calling VOP_MOUNT() when changing them from 1 to > 0. Adding explicit locking sounds painful ... This already happens for MNT_RDONLY. ffs_mount() has dead code which obfuscates this and other things by setting all the generic flags again. It gets the timing of the setting of MNT_RDONLY backwards by delaying the setting until the end of the transition from r/w to r/o, but this has no effect since MNT_RDONLY is set throughout the transition. It only gets the timing of the clearing of MNT_RDONLY right by delaying it until the end of the transition from r/o to r/w. Some flags have the wrong sense for this to be right. E.g., early clearing of MNT_ASYNC is safe, but early setting of it is not. tegge fixed some races from this. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20071203202947.N1698>