Date:      Fri, 30 Nov 2007 15:25:59 +1000
From:      David Cecil <david.cecil@nokia.com>
To:        ext Bruce Evans <brde@optusnet.com.au>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: File remove problem
Message-ID:  <474F9EE7.5050004@nokia.com>
In-Reply-To: <20071130151606.F12094@delplex.bde.org>
References:  <474F4E46.8030109@nokia.com> <20071130112043.H7217@besplex.bde.org> <474F69A7.9090404@nokia.com> <20071130151606.F12094@delplex.bde.org>



ext Bruce Evans wrote:
> On Fri, 30 Nov 2007, David Cecil wrote:
>
>> Thanks Bruce.
>>
>> Actually, I had found the same problem, and I came up with the first 
>> line of your patch (adding IN_MODIFIED) myself, but I still saw the 
>> problem.  I
>
> Yes, it's not that.  Testing reminded me that there is normally a
> VOP_INACTIVE() after unlink so the IN_CHANGE mark doesn't live very long
> for unlink (it can only live long for open files).
>
> Testing shows that the problem is easy to reproduce and often partially
> detected before it becomes fatal.  I saw something like the following:
>
>     after touch a; ln a b; rm a; unmount -- no problem with 1 link remaining
>     after touch a;         rm a; unmount -- no problem with unmount
>     after touch a; ln a b; rm a; mount -u -o ro -- no problem with 1 link...
>     after touch a;         rm a; mount -u -o ro -- worked once without soft
>        updates but seemed to be responsible for a soft update panic later
>     after touch a;         rm a; mount -u -o ro -- usually fails with soft
>        updates; the error is detected in various ways:
>           under ~5.2, mount -u prints "/f: update error: blocks 0 files 1"
>              but succeeds
>           under -current, mount -u fails and a subroutine prints
>          "softdep_waitidle: Failed to flush worklist for 0xc3e1a29c"
>          However, mount -u apparently cannot afford to fail at this
>          point since it has committed to succeeding -- further
>          mount -u's and unmounts fail and it takes a reboot to reach
>          an fsck that can fix the problem.
>
>       mount -u seems to do some things right: at least under -current:
>       - it calls ffs_sync() and thus ffs_update() with waitfor != 0.

Do you know whether it calls ffs_update() for this vnode?  I'm going to
try to verify that.

>       - IN_MODIFIED is usually already set in ffs_update().
>       - softdep_update_inode_inodeblock() in ffs_update() seems to
>         make null changes.  That doesn't seem right -- shouldn't it
>         update the link count and finish removing the file?...  I
>         just noticed that ufs_inactive() handles some of this.
>       - it calls softdep_flushfiles() after doing the sync.  This
>         doesn't seem to touch the inode.
>       - apparently, softdep_flushfiles() fails in -current, while in
>         ~5.2 it bogusly succeeds and then code just after it is called
>         detects a problem but doesn't handle it.
>
>> One more point to address Julian's question, the partition is not 
>> mounted with soft updates.
>
> Interesting.  I saw no sign of the problem without soft updates except a
> panic later after enabling soft updates.  I was running fsck a lot but
> may have forgotten one since no error was detected.  The problem should
> be easier to understand if it affects non-soft-updates.

It is not especially easy to reproduce.  The only reliable mechanism I
have involves mounting rw, removing a file, and remounting ro during the
boot cycle.  I can only guess that the problem is timing related and that
doing this during boot increases the chance of hitting it.
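
For anyone who wants to try this outside the boot cycle, the sequence can
be sketched roughly as below.  This is only a dry-run helper: it prints the
commands rather than running them.  The function name repro_steps and the
mount point passed to it are made up for illustration and do not come from
my setup; the touch step follows Bruce's "touch a; rm a; mount -u -o ro"
case.  I can't promise it triggers the problem, since the failure seems to
be timing related.

```sh
#!/bin/sh
# Dry-run sketch of the rw -> remove file -> ro remount sequence.
# repro_steps is a hypothetical helper name; the mount point is whatever
# scratch UFS file system the caller passes in (an assumption, not a path
# from this thread).

# Print the commands for one reproduction attempt against mount point $1.
repro_steps() {
    mnt=$1
    printf '%s\n' \
        "mount -u -o rw $mnt" \
        "touch $mnt/a" \
        "rm $mnt/a" \
        "mount -u -o ro $mnt"
}
```

After reviewing the output, something like "repro_steps /mnt/scratch | sh -x"
as root on a scratch file system would actually run the steps; an fsck of
that file system afterwards should show whether the removed inode was left
in a bad state.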




