Date:      Fri, 30 Nov 2007 15:58:55 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        David Cecil <david.cecil@nokia.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: File remove problem
Message-ID:  <20071130151606.F12094@delplex.bde.org>
In-Reply-To: <474F69A7.9090404@nokia.com>
References:  <474F4E46.8030109@nokia.com> <20071130112043.H7217@besplex.bde.org> <474F69A7.9090404@nokia.com>

On Fri, 30 Nov 2007, David Cecil wrote:

> Thanks Bruce.
>
> Actually, I had found the same problem, and I came up with the first line of 
> your patch (adding IN_MODIFIED) myself, but I still saw the problem.  I

Yes, it's not that.  Testing reminded me that there is normally a
VOP_INACTIVE() after unlink so the IN_CHANGE mark doesn't live very long
for unlink (it can only live long for open files).
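
The open-file case can be seen from userland. The following is only a
sketch: it shows the visible effect of an inode outliving its last name
while a descriptor is open, not the IN_CHANGE flag itself, and the
function name demo_open_unlinked is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: remove a file's only name while it is still open.
demo_open_unlinked() {
    dir=$(mktemp -d) || return 1
    echo data > "$dir/a"
    exec 3< "$dir/a"        # keep the inode referenced via descriptor 3
    rm "$dir/a"             # last name removed; the inode is still in use
    if [ -e "$dir/a" ]; then
        echo "still-named"
    else
        echo "name-gone"
    fi
    cat <&3                 # the contents are still readable through fd 3
    exec 3<&-               # last reference dropped; inode can be reclaimed
    rmdir "$dir"
}
```

Calling demo_open_unlinked prints "name-gone" and then "data". Without
the open descriptor, the inode would become inactive right after the rm.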

Testing shows that the problem is easy to reproduce and often partially
detected before it becomes fatal.  I saw something like the following:

     after touch a; ln a b; rm a; unmount -- no problem with 1 link remaining
     after touch a;         rm a; unmount -- no problem with unmount
     after touch a; ln a b; rm a; mount -u -o ro -- no problem with 1 link...
     after touch a;         rm a; mount -u -o ro -- worked once without soft
        updates but seemed to be responsible for a soft update panic later
     after touch a;         rm a; mount -u -o ro -- usually fails with soft
        updates; the error is detected in various ways:
           under ~5.2, mount -u prints "/f: update error: blocks 0 files 1"
              but succeeds
           under -current, mount -u fails and a subroutine prints
 	     "softdep_waitidle: Failed to flush worklist for 0xc3e1a29c"
 	     However, mount -u apparently cannot afford to fail at this
 	     point since it has committed to succeeding -- further
 	     mount -u's and unmounts fail and it takes a reboot to reach
 	     an fsck that can fix the problem.

 	  mount -u seems to do some things right, at least under -current:
 	  - it calls ffs_sync() and thus ffs_update() with waitfor != 0.
 	  - IN_MODIFIED is usually already set in ffs_update().
 	  - softdep_update_inode_inodeblock() in ffs_update() seems to
 	    make null changes.  That doesn't seem right -- shouldn't it
 	    update the link count and finish removing the file?...  I
 	    just noticed that ufs_inactive() handles some of this.
 	  - it calls softdep_flushfiles() after doing the sync.  This
 	    doesn't seem to touch the inode.
 	  - apparently, softdep_flushfiles() fails in -current, while in
 	    ~5.2 it bogusly succeeds and then code just after it is called
 	    detects a problem but doesn't handle it.
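
The failing sequence above can be sketched as a script. This is a
dry-run sketch under assumptions: the mount point defaults to /f (taken
from the "/f: update error" message), the file system is a scratch one
that mount -u can downgrade, and the function names repro_cmds and repro
and the RUN variable are invented here.

```shell
#!/bin/sh
# Print the command sequence that usually fails with soft updates.
# $1 is the mount point (assumed to be /f if omitted).
repro_cmds() {
    mnt=${1:-/f}
    printf '%s\n' \
        "touch $mnt/a" \
        "rm $mnt/a" \
        "mount -u -o ro $mnt"
}

# Dry run by default; with RUN=1 the commands are executed (needs root).
repro() {
    repro_cmds "$1" | while read -r cmd; do
        if [ "${RUN:-0}" = 1 ]; then
            # Stop at the first failing command.
            sh -c "$cmd" || { echo "failed: $cmd" >&2; exit 1; }
        else
            echo "would run: $cmd"
        fi
    done
}
```

"repro /f" only lists the commands. "RUN=1 repro /f" runs them and
should be used only on a file system that can be fscked afterwards.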

> didn't pick up on the need for the second line (else if (DOINGASYNC(dvp)) {) 
> though.  It's a default mount, so I don't understand how that will help, i.e. 
> it won't be an async mount, right?

Ignore that.  It is for async mounts, to make them unconditionally async.

> One more point to address Julian's question, the partition is not mounted 
> with soft updates.

Interesting.  I saw no sign of the problem without soft updates except a
panic later after enabling soft updates.  I was running fsck a lot but
may have forgotten one since no error was detected.  The problem should
be easier to understand if it affects non-soft-updates.

[Context lost to top posting]

Bruce


