Date: Fri, 30 Nov 2007 15:58:55 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: David Cecil <david.cecil@nokia.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: File remove problem
Message-ID: <20071130151606.F12094@delplex.bde.org>
In-Reply-To: <474F69A7.9090404@nokia.com>
References: <474F4E46.8030109@nokia.com> <20071130112043.H7217@besplex.bde.org> <474F69A7.9090404@nokia.com>

On Fri, 30 Nov 2007, David Cecil wrote:

> Thanks Bruce.
>
> Actually, I had found the same problem, and I came up with the first line of
> your patch (adding IN_MODIFIED) myself, but I still saw the problem.  I

Yes, it's not that.  Testing reminded me that there is normally a
VOP_INACTIVE() after unlink, so the IN_CHANGE mark doesn't live very long
for unlink (it can only live long for open files).

Testing shows that the problem is easy to reproduce and often partially
detected before it becomes fatal.  I saw something like the following:

after touch a; ln a b; rm a; unmount
    -- no problem with 1 link remaining
after touch a;         rm a; unmount
    -- no problem with unmount
after touch a; ln a b; rm a; mount -u -o ro
    -- no problem with 1 link...
after touch a;         rm a; mount -u -o ro
    -- worked once without soft updates but seemed to be responsible for
       a soft update panic later
after touch a;         rm a; mount -u -o ro
    -- usually fails with soft updates; the error is detected in various
       ways:
       under ~5.2, mount -u prints "/f: update error: blocks 0 files 1"
       but succeeds
       under -current, mount -u fails and a subroutine prints
       "softdep_waitidle: Failed to flush worklist for 0xc3e1a29c"
       However, mount -u apparently cannot afford to fail at this point
       since it has committed to succeeding -- further mount -u's and
       unmounts fail and it takes a reboot to reach an fsck that can fix
       the problem.

mount -u seems to do some things right, at least under -current:
- it calls ffs_sync() and thus ffs_update() with waitfor != 0.
- IN_MODIFIED is usually already set in ffs_update().
- softdep_update_inode_inodeblock() in ffs_update() seems to make null
  changes.  That doesn't seem right -- shouldn't it update the link count
  and finish removing the file?...  I just noticed that ufs_inactive()
  handles some of this.
- it calls softdep_flushfiles() after doing the sync.  This doesn't seem
  to touch the inode.
- apparently, softdep_flushfiles() fails in -current, while in ~5.2 it
  bogusly succeeds and then code just after it is called detects a problem
  but doesn't handle it.

> didn't pick up on the need for the second line (else if (DOINGASYNC(dvp)) {)
> though.  It's a default mount, so I don't understand how that will help, i.e.
> it won't be an async mount, right?

Ignore that.  It is for async mounts, to make them unconditionally async.

> One more point to address Julian's question, the partition is not mounted
> with soft updates.

Interesting.  I saw no sign of the problem without soft updates except a
panic later after enabling soft updates.  I was running fsck a lot but may
have forgotten one since no error was detected.  The problem should be
easier to understand if it affects non-soft-updates.

[Context lost to top posting]

Bruce
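
The reproduction sequences listed in the message can be collected into a
small helper.  This is only a sketch under stated assumptions: the mount
point defaults to /f (the path that appears in the quoted "update error"
line), the function names step and repro and the variables MNT and RUN are
made up for illustration, and by default the commands are only printed,
not executed.  Running them for real (RUN=1) needs root and a scratch UFS
file system already mounted read-write at $MNT, and may leave that file
system needing fsck.

```shell
#!/bin/sh
# Sketch of the reproduction cases described in the message.
# MNT, RUN, step and repro are illustrative names, not from the original.

MNT=${MNT:-/f}    # "/f" is taken from the quoted error message
RUN=${RUN:-0}     # 0: print the commands only; 1: actually execute them

# Print a command and, if RUN=1, execute it.
step() {
    echo "+ $*"
    if [ "$RUN" = 1 ]; then
        "$@"
    fi
}

# repro <link|nolink> <umount|ro>
#   link/nolink: whether a second hard link is made before the rm
#   umount/ro:   unmount afterwards, or downgrade to read-only
repro() {
    step touch "$MNT/a"
    if [ "$1" = link ]; then
        step ln "$MNT/a" "$MNT/b"
    fi
    step rm "$MNT/a"
    if [ "$2" = ro ]; then
        step mount -u -o ro "$MNT"
    else
        step umount "$MNT"
    fi
}
```

For example, "repro nolink ro" prints the sequence that was reported to
fail with soft updates, and "repro link ro" prints the variant where one
link remains and no problem was seen.  The file system has to be remounted
read-write (or mounted again) between cases.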