Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 2 Dec 2007 19:35:36 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        freebsd-fs@freebsd.org, Don Lewis <truckman@freebsd.org>
Subject:   Re: File remove problem
Message-ID:  <20071202183809.I1560@besplex.bde.org>
In-Reply-To: <20071202055919.GR83121@deviant.kiev.zoral.com.ua>
References:  <20071201215706.B12006@besplex.bde.org> <200712012207.lB1M7oNg015468@gw.catspoiler.org> <20071202055919.GR83121@deviant.kiev.zoral.com.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 2 Dec 2007, Kostik Belousov wrote:

> On Sat, Dec 01, 2007 at 02:07:50PM -0800, Don Lewis wrote:
>> On  1 Dec, Bruce Evans wrote:
>>> On Sat, 1 Dec 2007, Kostik Belousov wrote:
>>
>>>> +static int
>>>> +ffs_isronly(struct ufsmount *ump)
>>>> +{
>>>> +	struct fs *fs = ump->um_fs;
>>>> +
>>>> +	return (fs->fs_ronly);
>>>> +}
>>>> +
>>>
>>> Could be ump->um_fs->fs_ronly.
>>
>> That's the change that I would have made.  A #include for <ufs/ffs/fs.h>
>> would have to be added, which some might argue would be a layering
>> violation.  I'd prefer to avoid the extra indirection.
>
> I would argue that the ufs already knows too much about the ffs. But,
> this seems to be the first explicit reference to the ffs from the ufs
> code. With your approval, see below.

It's more like the fourth:
- ufs_itimes() is a layering violation.  However, with both ffs and ufs
   needing to set timestamps (for ffs, only in ffs_update()), and with
   both ffs and ufs both needing to set IN_* all over the place, it isn't
   clear which layer timstamps belong in.
- ufs_vnops.c now includes ffs_extern.h for some reason (5.2 didn't).
- ufs_gjournal.c includes both ffs_extern.h and fs.h.  It uses ip->i_fs
   a lot to access the superblock in ufs_gjournal_modref().

> diff --git a/sys/ufs/ufs/ufs_inode.c b/sys/ufs/ufs/ufs_inode.c
> index 448f436..22e29e9 100644
> --- a/sys/ufs/ufs/ufs_inode.c
> +++ b/sys/ufs/ufs/ufs_inode.c
> @@ -60,6 +60,7 @@ __FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_inode.c,v 1.69 2007/06/22 13:22:37 kib E
> #ifdef UFS_GJOURNAL
> #include <ufs/ufs/gjournal.h>
> #endif
> +#include <ufs/ffs/fs.h>
>
> /*
>  * Last reference to an inode.  If necessary, write or delete it.

ufs/ffs includes are conventionally separated from ufs/ufs includes by a
blank line.  About 2/3 of the files in ufs/ffs follow this convention.

> @@ -90,8 +91,7 @@ ufs_inactive(ap)
> 	ufs_gjournal_close(vp);
> #endif
> 	if ((ip->i_effnlink == 0 && DOINGSOFTDEP(vp)) ||
> -	    (ip->i_nlink <= 0 &&
> -	     (vp->v_mount->mnt_flag & MNT_RDONLY) == 0)) {
> +	    (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly)) {
> 	loop:
> 		if (vn_start_secondary_write(vp, &mp, V_NOWAIT) != 0) {
> 			/* Cannot delete file while file system is suspended */
> @@ -121,7 +121,7 @@ ufs_inactive(ap)
> 	}
> 	if (ip->i_effnlink == 0 && DOINGSOFTDEP(vp))
> 		softdep_releasefile(ip);
> -	if (ip->i_nlink <= 0 && (vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
> +	if (ip->i_nlink <= 0 && !VFSTOUFS(mp)->um_fs->fs_ronly) {
> #ifdef QUOTA
> 		if (!getinoquota(ip))
> 			(void)chkiq(ip, -1, NOCRED, FORCE);
>

Should be ip->i_fs->fs_ronly.

The locking for fs_ronly is unclear.  It seems to be locked mainly by
vn_start_write(), and that enough for everything except probably access
time changes.

I've now tested the following similar change in ufs_itimes() after
removing all other related fixes.

% Index: ufs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% retrieving revision 1.293
% diff -u -2 -r1.293 ufs_vnops.c
% --- ufs_vnops.c	8 Nov 2007 17:21:51 -0000	1.293
% +++ ufs_vnops.c	2 Dec 2007 04:56:58 -0000
% @@ -89,4 +89,5 @@
%  #endif
% 
% +#include <ufs/ffs/fs.h>
%  #include <ufs/ffs/ffs_extern.h>
% 
% @@ -137,8 +138,38 @@
% 
%  	ip = VTOI(vp);
% +	/*
% +	 * MNT_RDONLY can barely be trusted here.  Full r/o mode is indicated
% +	 * by fs_ronly, and the MNT_RDONLY setting [should] differ from the
% +	 * fs_ronly setting only during transition from r/w mode to r/o mode.
% +	 * We set IN_ACCESS even in full r/o mode, so we must discard it
% +	 * unconditionally here.  During the transition, we must convert
% +	 * IN_CHANGE | IN_UPDATE to IN_MODIFIED, since some callers neglect
% +	 * to set IN_MODIFIED.  We also set the timestamps indicated by
% +	 * IN_CHANGE | IN_UPDATE normally during the transition, since the
% +	 * update marks may have been set correctly before the transition and
% +	 * not yet converted into timestamps.  Callers that set IN_CHANGE |
% +	 * IN_UPDATE during the transition are buggy, since userland writes
% +	 * are supposed to be denied (by MNT_RDONLY checks) during the
% +	 * transition, while kernel writes should should only be for syncs
% +	 * and syncs should not touch timestamps except to convert old
% +	 * update marks to timestamps.  Callers that set any update mark or
% +	 * modification flag except IN_ACCESS while in full r/o mode are
% +	 * broken; we will panic for them later.
% +	 */
%  	if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0)
% -		goto out;
% +		ip->i_flag &= ~IN_ACCESS;
%  	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
%  		return;
% +	if (ip->i_fs->fs_ronly) {	/* XXX locking? */
% +		vprint("ufs_itimes_locked: r/o mod", vp);
% +		/*
% +		 * Should panic here, or return and let ffs_update() panic.
% +		 * The fs_ronly check in ffs_update() is now almost redundant
% +		 * and should not succeed, so it should be replaced by a
% +		 * panic.  It detects more invariants failures than we detect
% +		 * here.
% +		 */
% +		goto out;
% +	}
% 
%  	if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))

The comments in this are too verbose.

This seems to work for "rm a; mount -u -o ro /f" and "mv a b; mount ...",
but since this is without your change to ufs_inactive(), I'm now surprised
that it works.  I think it works for the simple test cases as follows:

- there are no unlinked open files, so ufs_inactive() should have already
   set up all the needed i/o, and the sync for the mount-update should
   finish that i/o.

- however, in ufs_inactive() seems to be called as part of the sync, and
   in ~5.2 where it doesn't do any read-only checks, it seems to call
   ffs_truncate().  This truncate should be null (in the simple test
   cases), but it seems to have the side effect of generating more i/o
   (I hope just to convert bogus settings of IN_CHANGE | IN_UPDATE into
   dinode changes).  Then other bugs cause an inconsistent fs if MNT_RDONLY
   is set.

   BTW, the other bugs don't affect plain 5.2, since it still has
   rev.1.182 of ufs_vnops.c to convert the bogus settings of IN_CHANGE |
   IN_UPDATE into IN_MODIFIED.  I was "lucky" to see almost the same
   bugs in ~5.2 as in -current because I have debugging code in
   ffs_update() instead of rev.1.182 in ufs_vnops.c, but the debugging
   code showed too many apparently-harmless problems so it was turned
   off.

In cases involving unlinked open files, the truncation has to be delayed
until the sync.  Things seem to work reasonably:  If a file on the fs
is open for read, then mount-update from rw to ro is allowed unless the
file is unlinked; if the file is unlinked then there is an EBUSY error
unless MNT_FORCE is used, but if MNT_FORCE is used, then the mount-update
must be allowed to complete and this involves truncating and otherwise
completing the removal of unlinked open files.  In -current, your patch
should make this work again, and with only my patch above the
"update error: blocks 32: files 1" is back because ufs_inactive() doesn't
do the truncation.

I don't understand how WRITECLOSE inter-operates with this -- mount-update
always sets it but there is still an EBUSY error unless MNT_FORCE is
used, while MNT_FORCE should kill all opens.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20071202183809.I1560>