From owner-freebsd-fs@FreeBSD.ORG Sat Dec 1 18:33:44 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BB0F716A417 for ; Sat, 1 Dec 2007 18:33:44 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail34.syd.optusnet.com.au (mail34.syd.optusnet.com.au [211.29.133.218]) by mx1.freebsd.org (Postfix) with ESMTP id 5372E13C44B for ; Sat, 1 Dec 2007 18:33:43 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail34.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lB1IXdNs000991 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 2 Dec 2007 05:33:41 +1100 Date: Sun, 2 Dec 2007 05:33:39 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20071202020213.I2849@besplex.bde.org> Message-ID: <20071202043649.I1094@besplex.bde.org> References: <474F4E46.8030109@nokia.com> <20071130112043.H7217@besplex.bde.org> <474F69A7.9090404@nokia.com> <20071130151606.F12094@delplex.bde.org> <20071130052840.GH83121@deviant.kiev.zoral.com.ua> <20071201030305.G1170@besplex.bde.org> <20071202020213.I2849@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org Subject: Re: File remove problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 01 Dec 2007 18:33:44 -0000 On Sun, 2 Dec 2007, Bruce Evans wrote: > Here is a non-hackish patch which explains why ignoring MNT_RDONLY in > the above or in ffs_mount() helps. It just fixes the confusion between > IN_MODIFIED and IN_CHANGE in critical places. > > % Index: ffs_softdep.c > ... > > Without this change, soft updates depends on IN_CHANGE being converted > to IN_MODIFIED by ufs_itimes(), but this conversion doesn't happen > when MNT_RDONLY is set. With soft updates, changes are often delayed > until sync time, and when the sync is for mount-update it is done after > setting MNT_RDONLY so the above doesn't work. > ... > Without the above patch, after "mv a b", "fsync b" is insufficient to > make "mount -u -o ro /f" not corrupt the fs (because the mv generates > 3 or 4 entries in the worklist, and the fsync generates 0 or 1 more > and always ends up with 1 or 2 not handled, and then the sync for > mount-update usually doesn't make any more progress). I didn't check > if a short delay after the fsync is sufficient, but expect that it is. > With the above patch, a loop doing "mv a b; mount -u -o ro /f; fsck > -n /f; mount -u -o rw /f; mv b a" shows no problems. However, the > original problem was without soft updates, so there must be more bugs > here. The above seems to fix "mv a b", but for "rm a" it only restores the previous symptoms of the bug -- the "failed to flush worklist" error is gone but the "update error" is back. The following simiilar patch seems to fix "rm a" in 5.2: % Index: ufs_inode.c % =================================================================== % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_inode.c,v % retrieving revision 1.53 % diff -u -2 -r1.53 ufs_inode.c % --- ufs_inode.c 7 Apr 2004 03:47:20 -0000 1.53 % +++ ufs_inode.c 1 Dec 2007 17:26:30 -0000 % @@ -108,5 +111,5 @@ % ip->i_mode = 0; % DIP(ip, i_mode) = 0; % - ip->i_flag |= IN_CHANGE | IN_UPDATE; % + ip->i_flag |= IN_MODIFIED; % if (DOINGSOFTDEP(vp)) % softdep_change_linkcnt(ip); This is in ufs_inactive(). Soft updates calls back here from the sync. softdep_change_linkcount() just creates a dependency and depends on its callers to set IN_* flags which will cause i/o to flush the dependency. Here the caller sets flags that had no effect in the MNT_RDONLY case, the same as in the previous patch. I removed the IN_CHANGE and IN_UPDATE flags completely since they just clobber the previous times (set by a non-delayed ufs_inactive but probably already clobbered by a delayed truncation). (We are clearing out parts of the inode, so the times don't really matter, but unless something else clears the times it is better to leave the non-delayed times which record when the update marks for the userland remove were converted to timestamps.) Just before the above, the file is truncated. ffs_truncate() has the usual confusion between IN_CHANGE and IN_MODIFIED. It sets (IN_CHANGE | IN_UPDATE) in 7 different places, but never sets IN_MODIFIED. In my test case of "rm a; mount -u -o ro /f", this isn't a problem, because IN_MODIFIED is usually already set set ffs_truncate() doesn't forget to start i/o. ffs_truncate() calls ffs_update() with waitfor == 1 in many cases, but in my test case it always calls ffs_update() with waitfor == 0. Any call to ffs_update() in it cancels any previous setting of IN_MODIFIED when modifications are moved to the buffer. Thus when we forget to set the IN_MODIFIED in the above code, in the MNT_RDONLY case we always have inconsistencies: - non-soft updates case: we have at least a cleared i_rdev and i_mode with no means to write them. Is this enough to cause the original problem? - soft updates case: we also have a dependency with no means to flush it. I think this change doesn't help for -current yet, since this code is unreachable when MNT_RDONLY is set. The truncation code is also unreachable. kib's patch makes both reachable again. I observe/predict/explain the following error behaviours for soft updates (still some guesses): -current the first patch: "/f: update error: blocks 32 files 1" (because the truncation and other things aren't done, but worklist items aren't created for them so the problem isn't detected in softdep_waitidle). -5.2 with first patch: "/f: update error: blocks 0 files 1" (because the truncation works accidentally, and softdep_waitidle() doesn't exist to detect the unflushable worklist item created by the reachable softdep_change_linkcnt() call). Panic later (due to broken worklist after a buggy remount is permitted?). -current with first patch and kib's patch: "softdep_waitidle: ..." (because the bug fixed in the second patch is exposed again and softdep_waitidle() detects it). Bruce