From: Don Lewis
To: kostikbel@gmail.com
Cc: freebsd-fs@FreeBSD.org
Date: Fri, 30 Nov 2007 23:15:47 -0800 (PST)
Subject: Re: File remove problem
Message-Id: <200712010715.lB17FlZw011929@gw.catspoiler.org>
In-Reply-To: <20071130052840.GH83121@deviant.kiev.zoral.com.ua>

On 30 Nov, Kostik Belousov wrote:
> On Fri, Nov 30, 2007 at 03:58:55PM +1100, Bruce Evans wrote:
>> On Fri, 30 Nov 2007, David Cecil wrote:
>>
>> >Thanks Bruce.
>> >
>> >Actually, I had found the same problem, and I came up with the first line
>> >of your patch (adding IN_MODIFIED) myself, but I still saw the problem.  I
>>
>> Yes, it's not that.  Testing reminded me that there is normally a
>> VOP_INACTIVE() after unlink so the IN_CHANGE mark doesn't live very long
>> for unlink (it can only live long for open files).
>>
>> Testing shows that the problem is easy to reproduce and often partially
>> detected before it becomes fatal.
>> I saw something like the following:
>>
>> after touch a; ln a b; rm a; unmount -- no problem with 1 link remaining
>> after touch a; rm a; unmount -- no problem with unmount
>> after touch a; ln a b; rm a; mount -u -o ro -- no problem with 1 link...
>> after touch a; rm a; mount -u -o ro -- worked once without soft
>>     updates but seemed to be responsible for a soft update panic later
>> after touch a; rm a; mount -u -o ro -- usually fails with soft
>>     updates; the error is detected in various ways:
>>     under ~5.2, mount -u prints "/f: update error: blocks 0 files 1"
>>         but succeeds
>>     under -current, mount -u fails and a subroutine prints
>>         "softdep_waitidle: Failed to flush worklist for 0xc3e1a29c"
>>         However, mount -u apparently cannot afford to fail at this
>>         point since it has committed to succeeding -- further
>>         mount -u's and unmounts fail and it takes a reboot to reach
>>         an fsck that can fix the problem.
>>
>> mount -u seems to do some things right, at least under -current:
>> - it calls ffs_sync() and thus ffs_update() with waitfor != 0.
>> - IN_MODIFIED is usually already set in ffs_update().
>> - softdep_update_inode_inodeblock() in ffs_update() seems to
>>   make null changes.  That doesn't seem right -- shouldn't it
>>   update the link count and finish removing the file?...  I
>>   just noticed that ufs_inactive() handles some of this.
>> - it calls softdep_flushfiles() after doing the sync.  This
>>   doesn't seem to touch the inode.
>> - apparently, softdep_flushfiles() fails in -current, while in
>>   ~5.2 it bogusly succeeds and then code just after it is called
>>   detects a problem but doesn't handle it.
>>
>> >didn't pick up on the need for the second line (else if (DOINGASYNC(dvp))
>> >{) though.  It's a default mount, so I don't understand how that will
>> >help, i.e. it won't be an async mount, right?
>>
>> Ignore that.  It is for async mounts, to make them unconditionally async.
>>
>> >One more point to address Julian's question, the partition is not mounted
>> >with soft updates.
>>
>> Interesting.  I saw no sign of the problem without soft updates except a
>> panic later after enabling soft updates.  I was running fsck a lot but
>> may have forgotten one since no error was detected.  The problem should
>> be easier to understand if it affects non-soft-updates.
>>
>> [Context lost to top posting]
>>
>
> As a speculation, it might be that ufs_inactive() should conditionalize on
> fs_ronly instead of MNT_RDONLY. Then, VOP_INACTIVE() would set up the
> IN_CHANGE|IN_UPDATE and finally call the ffs_update() ?

That sounds reasonable to me.  I see that ffs_update(), which is called
by ufs_inactive(), looks at fs_ronly.

The other difference that I see between the remount to read-only case,
which is broken, and the unmount case, which is presumably working, is
that the remount case calls ffs_flushfiles() with the WRITECLOSE flag,
which makes me a little suspicious of the WRITECLOSE code in vflush().
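
To make the fs_ronly versus MNT_RDONLY speculation concrete, here is a
toy userland model, not kernel code.  Every name in it (toy_mount,
toy_inode, toy_inactive, toy_remount_ro, enum ro_test) is hypothetical.
It also assumes, without confirmation from the thread, that during a
remount to read-only the MNT_RDONLY mount flag is already set while the
flush runs and fs_ronly is only set afterwards.  Under that assumption
an unlinked inode is skipped when the test is on MNT_RDONLY and is
marked for update when the test is on fs_ronly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the kernel's struct mount / struct fs. */
struct toy_mount {
    bool mnt_rdonly;   /* models MNT_RDONLY in the mount flags */
    bool fs_ronly;     /* models fs_ronly in the FFS superblock */
};

struct toy_inode {
    int  nlink;        /* link count; 0 means the file was unlinked */
    bool dirty;        /* models IN_CHANGE|IN_UPDATE being set */
};

/* Which read-only indicator the inactive path consults. */
enum ro_test { TEST_MNT_RDONLY, TEST_FS_RONLY };

/*
 * Models the decision attributed to ufs_inactive() in the thread:
 * an unlinked inode is only marked for update if the filesystem is
 * considered writable.  Returns true if the inode was marked.
 */
static bool
toy_inactive(const struct toy_mount *mp, struct toy_inode *ip,
    enum ro_test which)
{
    assert(mp != NULL && ip != NULL);

    bool readonly = (which == TEST_MNT_RDONLY) ?
        mp->mnt_rdonly : mp->fs_ronly;

    if (ip->nlink <= 0 && !readonly) {
        ip->dirty = true;
        return true;
    }
    return false;
}

/*
 * Models a remount to read-only.  ASSUMPTION: the mount flag flips
 * before the flush and fs_ronly flips after it.
 */
static bool
toy_remount_ro(struct toy_mount *mp, struct toy_inode *ip,
    enum ro_test which)
{
    mp->mnt_rdonly = true;                        /* assumed to happen first */
    bool marked = toy_inactive(mp, ip, which);    /* the flush step */
    mp->fs_ronly = true;                          /* assumed to happen last */
    return marked;
}
```

Calling toy_remount_ro() on an inode with nlink == 0 leaves it unmarked
with TEST_MNT_RDONLY and marks it with TEST_FS_RONLY.  The model says
nothing about the WRITECLOSE handling in vflush(), which would need to
be checked separately.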