From: David Cecil
Date: Fri, 30 Nov 2007 17:13:58 +1000
To: ext Bruce Evans
Cc: freebsd-fs@freebsd.org
Subject: Re: File remove problem
Message-ID: <474FB836.5060905@nokia.com>
In-Reply-To: <20071130151606.F12094@delplex.bde.org>

I've determined the following for the scenario I have.  These steps are
executed during the boot cycle, and I reproduce the problem about 1 in
5-10 times:
1. mount -u -w /
2. rm -f /etc/myfile
3. mount -u -o ro /

1. finished Remounted R/W
2. started
ufs_remove 786
ffs_truncate 268
ffs_update 87
ffs_update 92
ffs_update 99
ffs_update 140
ffs_update 87
ffs_update 92
ffs_update 99
ffs_update 140
2. finished: Removed file
3. Finished Remounted R/O

Note that line 140 in ffs_update is the call to bdwrite, not bwrite.

Investigations ongoing...

Dave

ext Bruce Evans wrote:
> On Fri, 30 Nov 2007, David Cecil wrote:
>
>> Thanks Bruce.
>>
>> Actually, I had found the same problem, and I came up with the first
>> line of your patch (adding IN_MODIFIED) myself, but I still saw the
>> problem.  I
>
> Yes, it's not that.  Testing reminded me that there is normally a
> VOP_INACTIVE() after unlink so the IN_CHANGE mark doesn't live very long
> for unlink (it can only live long for open files).
>
> Testing shows that the problem is easy to reproduce and often partially
> detected before it becomes fatal.  I saw something like the following:
>
>     after touch a; ln a b; rm a; unmount -- no problem with 1 link
>         remaining
>     after touch a;         rm a; unmount -- no problem with unmount
>     after touch a; ln a b; rm a; mount -u -o ro -- no problem with 1
>         link...
>     after touch a;       ; rm a; mount -u -o ro -- worked once without
>         soft updates but seemed to be responsible for a soft update
>         panic later
>     after touch a;       ; rm a; mount -u -o ro -- usually fails with
>         soft updates; the error is detected in various ways:
>         under ~5.2, mount -u prints "/f: update error: blocks 0
>             files 1" but succeeds
>         under -current, mount -u fails and a subroutine prints
>             "softdep_waitidle: Failed to flush worklist for 0xc3e1a29c"
>         However, mount -u apparently cannot afford to fail at this
>         point since it has committed to succeeding -- further
>         mount -u's and unmounts fail and it takes a reboot to reach
>         an fsck that can fix the problem.
>
> mount -u seems to do some things right: at least under -current:
> - it calls ffs_sync() and thus ffs_update() with waitfor != 0.
> - IN_MODIFIED is usually already set in ffs_update().
> - softdep_update_inode_inodeblock() in ffs_update() seems to
>   make null changes.  That doesn't seem right -- shouldn't it
>   update the link count and finish removing the file?...  I
>   just noticed that ufs_inactive() handles some of this.
> - it calls softdep_flushfiles() after doing the sync.  This
>   doesn't seem to touch the inode.
> - apparently, softdep_flushfiles() fails in -current, while in
>   ~5.2 it bogusly succeeds and then code just after it is called
>   detects a problem but doesn't handle it.
>
>> didn't pick up on the need for the second line (else if
>> (DOINGASYNC(dvp)) {) though.  It's a default mount, so I don't
>> understand how that will help, i.e. it won't be an async mount, right?
>
> Ignore that.  It is for async mounts, to make them unconditionally async.
>
>> One more point to address Julian's question, the partition is not
>> mounted with soft updates.
>
> Interesting.  I saw no sign of the problem without soft updates except a
> panic later after enabling soft updates.  I was running fsck a lot but
> may have forgotten one since no error was detected.  The problem should
> be easier to understand if it affects non-soft-updates.
>
> [Context lost to top posting]
>
> Bruce
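
For anyone who wants to try the three-step sequence at the top of this
message (remount read-write, remove the file, remount read-only) outside
the boot cycle, a rough sketch of a loop follows.  It is not the script
used in the report.  The run() and repro_once() helpers, the DRY_RUN and
ITERATIONS variables, and the extra touch step (needed so the file exists
on every pass) are assumptions made for illustration.  Only the mount and
rm invocations are taken from the message.  By default it only prints
what it would do.

```shell
#!/bin/sh
# Hypothetical reproduction loop; only the mount/rm commands come from
# the report.  DRY_RUN defaults to 1 so nothing is executed by accident.
DRY_RUN=${DRY_RUN:-1}
TARGET=${TARGET:-/etc/myfile}     # file name used in the report
ITERATIONS=${ITERATIONS:-3}       # assumed small default for a dry run

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

# One pass over the sequence; stops at the first failing step.
repro_once() {
    run mount -u -w / || return 1
    run touch "$TARGET" || return 1    # assumption: recreate the file
    run rm -f "$TARGET" || return 1
    run mount -u -o ro / || return 1
}

i=1
while [ "$i" -le "$ITERATIONS" ]; do
    if ! repro_once; then
        echo "iteration $i: a step failed"
        break
    fi
    i=$((i + 1))
done
```

Running it with DRY_RUN=0 as root performs the real remounts of /, so it
should only be tried on a scratch system.  Since the report sees the
failure in roughly 1 of 5-10 boots, a larger ITERATIONS value is likely
needed, and timing outside the boot cycle may differ.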