Date:      Fri, 30 Nov 2007 23:15:47 -0800 (PST)
From:      Don Lewis <truckman@FreeBSD.org>
To:        kostikbel@gmail.com
Cc:        freebsd-fs@FreeBSD.org
Subject:   Re: File remove problem
Message-ID:  <200712010715.lB17FlZw011929@gw.catspoiler.org>
In-Reply-To: <20071130052840.GH83121@deviant.kiev.zoral.com.ua>

On 30 Nov, Kostik Belousov wrote:
> On Fri, Nov 30, 2007 at 03:58:55PM +1100, Bruce Evans wrote:
>> On Fri, 30 Nov 2007, David Cecil wrote:
>> 
>> >Thanks Bruce.
>> >
>> >Actually, I had found the same problem, and I came up with the first line 
>> >of your patch (adding IN_MODIFIED) myself, but I still saw the problem.  I
>> 
>> Yes, it's not that.  Testing reminded me that there is normally a
>> VOP_INACTIVE() after unlink so the IN_CHANGE mark doesn't live very long
>> for unlink (it can only live long for open files).
>> 
>> Testing shows that the problem is easy to reproduce and often partially
>> detected before it becomes fatal.  I saw something like the following:
>> 
>>     after touch a; ln a b; rm a; unmount -- no problem with 1 link remaining
>>     after touch a;         rm a; unmount -- no problem with unmount
>>     after touch a; ln a b; rm a; mount -u -o ro -- no problem with 1 link...
>>     after touch a;         rm a; mount -u -o ro -- worked once without soft
>>        updates but seemed to be responsible for a soft update panic later
>>     after touch a;         rm a; mount -u -o ro -- usually fails with soft
>>        updates; the error is detected in various ways:
>>           under ~5.2, mount -u prints "/f: update error: blocks 0 files 1"
>>              but succeeds
>>           under -current, mount -u fails and a subroutine prints
>> 	     "softdep_waitidle: Failed to flush worklist for 0xc3e1a29c"
>> 	     However, mount -u apparently cannot afford to fail at this
>> 	     point since it has committed to succeeding -- further
>> 	     mount -u's and unmounts fail and it takes a reboot to reach
>> 	     an fsck that can fix the problem.
>> 
>> 	  mount -u seems to do some things right, at least under -current:
>> 	  - it calls ffs_sync() and thus ffs_update() with waitfor != 0.
>> 	  - IN_MODIFIED is usually already set in ffs_update().
>> 	  - softdep_update_inode_inodeblock() in ffs_update() seems to
>> 	    make null changes.  That doesn't seem right -- shouldn't it
>> 	    update the link count and finish removing the file?...  I
>> 	    just noticed that ufs_inactive() handles some of this.
>> 	  - it calls softdep_flushfiles() after doing the sync.  This
>> 	    doesn't seem to touch the inode.
>> 	  - apparently, softdep_flushfiles() fails in -current, while in
>> 	    ~5.2 it bogusly succeeds and then code just after it is called
>> 	    detects a problem but doesn't handle it.
>> 
>> >didn't pick up on the need for the second line (else if (DOINGASYNC(dvp)) 
>> >{) though.  It's a default mount, so I don't understand how that will 
>> >help, i.e. it won't be an async mount, right?
>> 
>> Ignore that.  It is for async mounts, to make them unconditionally async.
>> 
>> >One more point, to address Julian's question: the partition is not mounted 
>> >with soft updates.
>> 
>> Interesting.  I saw no sign of the problem without soft updates except a
>> panic later after enabling soft updates.  I was running fsck a lot but
>> may have forgotten one since no error was detected.  The problem should
>> be easier to understand if it affects non-soft-updates.
>> 
>> [Context lost to top posting]
>> 
> 
> As a speculation, it might be that ufs_inactive() should conditionalize on
> fs_ronly instead of MNT_RDONLY. Then, VOP_INACTIVE() would set
> IN_CHANGE|IN_UPDATE and finally call ffs_update()?

That sounds reasonable to me.  I see that ffs_update(), which is called
by ufs_inactive(), looks at fs_ronly.

The other difference that I see between the remount to read-only case,
which is broken, and the unmount case, which is presumably working, is
that the remount case calls ffs_flushfiles() with the WRITECLOSE flag,
which makes me a little suspicious of the WRITECLOSE code in vflush().
