Date: Sun, 2 Dec 2007 19:35:36 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Kostik Belousov <kostikbel@gmail.com> Cc: freebsd-fs@freebsd.org, Don Lewis <truckman@freebsd.org> Subject: Re: File remove problem Message-ID: <20071202183809.I1560@besplex.bde.org> In-Reply-To: <20071202055919.GR83121@deviant.kiev.zoral.com.ua> References: <20071201215706.B12006@besplex.bde.org> <200712012207.lB1M7oNg015468@gw.catspoiler.org> <20071202055919.GR83121@deviant.kiev.zoral.com.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 2 Dec 2007, Kostik Belousov wrote: > On Sat, Dec 01, 2007 at 02:07:50PM -0800, Don Lewis wrote: >> On 1 Dec, Bruce Evans wrote: >>> On Sat, 1 Dec 2007, Kostik Belousov wrote: >> >>>> +static int >>>> +ffs_isronly(struct ufsmount *ump) >>>> +{ >>>> + struct fs *fs = ump->um_fs; >>>> + >>>> + return (fs->fs_ronly); >>>> +} >>>> + >>> >>> Could be ump->um_fs->fs_ronly. >> >> That's the change that I would have made. A #include for <ufs/ffs/fs.h> >> would have to be added, which some might argue would be a layering >> violation. I'd prefer to avoid the extra indirection. > > I would argue that the ufs already knows too much about the ffs. But, > this seems to be the first explicit reference to the ffs from the ufs > code. With your approval, see below. It's more like the fourth: - ufs_itimes() is a layering violation. However, with both ffs and ufs needing to set timestamps (for ffs, only in ffs_update()), and with both ffs and ufs both needing to set IN_* all over the place, it isn't clear which layer timstamps belong in. - ufs_vnops.c now includes ffs_extern.h for some reason (5.2 didn't). - ufs_gjournal.c includes both ffs_extern.h and fs.h. It uses ip->i_fs a lot to access the superblock in ufs_gjournal_modref(). > diff --git a/sys/ufs/ufs/ufs_inode.c b/sys/ufs/ufs/ufs_inode.c > index 448f436..22e29e9 100644 > --- a/sys/ufs/ufs/ufs_inode.c > +++ b/sys/ufs/ufs/ufs_inode.c > @@ -60,6 +60,7 @@ __FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.69 2007/06/22 13:22:37 kib E > #ifdef UFS_GJOURNAL > #include <ufs/ufs/gjournal.h> > #endif > +#include <ufs/ffs/fs.h> > > /* > * Last reference to an inode. If necessary, write or delete it. ufs/ffs includes are conventionally separated from ufs/ufs includes by a blank line. About 2/3 of the files in ufs/ffs follow this convention. > @@ -90,8 +91,7 @@ ufs_inactive(ap) > ufs_gjournal_close(vp); > #endif > if ((ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) || > - (ip->i_nlink <= 0 && > - (vp->v_mount->mnt_flag & MNT_RDONLY) == 0)) { > + (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly)) { > loop: > if (vn_start_secondary_write(vp, &mp, V_NOWAIT) != 0) { > /* Cannot delete file while file system is suspended */ > @@ -121,7 +121,7 @@ ufs_inactive(ap) > } > if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) > softdep_releasefile(ip); > - if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) { > + if (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly) { > #ifdef QUOTA > if (!getinoquota(ip)) > (void)chkiq(ip, -1, NOCRED, FORCE); > Should be ip->i_fs->fs_ronly. The locking for fs_ronly is unclear. It seems to be locked mainly by vn_start_write(), and that enough for everything except probably access time changes. I've now tested the following similar change in ufs_itimes() after removing all other related fixes. % Index: ufs_vnops.c % =================================================================== % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v % retrieving revision 1.293 % diff -u -2 -r1.293 ufs_vnops.c % --- ufs_vnops.c 8 Nov 2007 17:21:51 -0000 1.293 % +++ ufs_vnops.c 2 Dec 2007 04:56:58 -0000 % @@ -89,4 +89,5 @@ % #endif % % +#include <ufs/ffs/fs.h> % #include <ufs/ffs/ffs_extern.h> % % @@ -137,8 +138,38 @@ % % ip = VTOI(vp); % + /* % + * MNT_RDONLY can barely be trusted here. Full r/o mode is indicated % + * by fs_ronly, and the MNT_RDONLY setting [should] differ from the % + * fs_ronly setting only during transition from r/w mode to r/o mode. % + * We set IN_ACCESS even in full r/o mode, so we must discard it % + * unconditionally here. During the transition, we must convert % + * IN_CHANGE | IN_UPDATE to IN_MODIFIED, since some callers neglect % + * to set IN_MODIFIED. We also set the timestamps indicated by % + * IN_CHANGE | IN_UPDATE normally during the transition, since the % + * update marks may have been set correctly before the transition and % + * not yet converted into timestamps. Callers that set IN_CHANGE | % + * IN_UPDATE during the transition are buggy, since userland writes % + * are supposed to be denied (by MNT_RDONLY checks) during the % + * transition, while kernel writes should should only be for syncs % + * and syncs should not touch timestamps except to convert old % + * update marks to timestamps. Callers that set any update mark or % + * modification flag except IN_ACCESS while in full r/o mode are % + * broken; we will panic for them later. % + */ % if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) % - goto out; % + ip->i_flag &= ~IN_ACCESS; % if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) % return; % + if (ip->i_fs->fs_ronly) { /* XXX locking? */ % + vprint("ufs_itimes_locked: r/o mod", vp); % + /* % + * Should panic here, or return and let ffs_update() panic. % + * The fs_ronly check in ffs_update() is now almost redundant % + * and should not succeed, so it should be replaced by a % + * panic. It detects more invariants failures than we detect % + * here. % + */ % + goto out; % + } % % if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp)) The comments in this are too verbose. This seems to work for "rm a; mount -u -o ro /f" and "mv a b; mount ...", but since this is without your change to ufs_inactive(), I'm now surprised that it works. I think it works for the simple test cases as follows: - there are no unlinked open files, so ufs_inactive() should have already set up all the needed i/o, and the sync for the mount-update should finish that i/o. - however, in ufs_inactive() seems to be called as part of the sync, and in ~5.2 where it doesn't do any read-only checks, it seems to call ffs_truncate(). This truncate should be null (in the simple test cases), but it seems to have the side effect of generating more i/o (I hope just to convert bogus settings of IN_CHANGE | IN_UPDATE into dinode changes). Then other bugs cause an inconsistent fs if MNT_RDONLY is set. BTW, the other bugs don't affect plain 5.2, since it still has rev.1.182 of ufs_vnops.c to convert the bogus settings of IN_CHANGE | IN_UPDATE into IN_MODIFIED. I was "lucky" to see almost the same bugs in ~5.2 as in -current because I have debugging code in ffs_update() instead of rev.1.182 in ufs_vnops.c, but the debugging code showed too many apparently-harmless problems so it was turned off. In cases involving unlinked open files, the truncation has to be delayed until the sync. Things seem to work reasonably: If a file on the fs is open for read, then mount-update from rw to ro is allowed unless the file is unlinked; if the file is unlinked then there is an EBUSY error unless MNT_FORCE is used, but if MNT_FORCE is used, then the mount-update must be allowed to complete and this involves truncating and otherwise completing the removal of unlinked open files. In -current, your patch should make this work again, and with only my patch above the "update error: blocks 32: files 1" is back because ufs_inactive() doesn't do the truncation. I don't understand how WRITECLOSE inter-operates with this -- mount-update always sets it but there is still an EBUSY error unless MNT_FORCE is used, while MNT_FORCE should kill all opens. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20071202183809.I1560>