From: Don Lewis
Date: Sat, 24 Sep 2005 14:19:51 -0700 (PDT)
To: Tor.Egge@cvsup.no.freebsd.org
Cc: tegge@FreeBSD.org, scottl@FreeBSD.org, current@FreeBSD.org, mckusick@FreeBSD.org
Subject: Re: soft updates / background fsck directory link count bug
Message-Id: <200509242122.j8OLJp5P091812@gw.catspoiler.org>
In-Reply-To: <20050924.190810.74675111.Tor.Egge@cvsup.no.freebsd.org>

On 24 Sep, Tor Egge wrote:
>> Perhaps the dirrem should be put on the inowait list before the call to
>> ffs_truncate().
>
> If softdep_slowdown() returns a nonzero value then ffs_truncate() can call
> ffs_syncvnode() before di_size has been set to 0. If the inodeblock is written
> due to fsync() operations on other inodes in the same inodeblock then the
> dirrem dependency would be moved to the global work list too early.

That's one of the subtle points in this code that I figured was likely
to surprise me.

> Enclosed is a patch that forces an ffs_update() call from ufs_inactive() by
> setting the IN_CHANGE flag if i_effnlink is larger than 0 right before the call
> to vput(). An alternative is checking i_nlink instead of i_effnlink for faster
> rundown.

Relying on ufs_inactive() is probably the wrong thing to do because the
ufs_inactive() call can be deferred indefinitely if another process
holds a reference to the vnode. This is sufficient to cause background
fsck to do the wrong thing even in the normal case.

scratch:dl 114#fsck -fp /dev/da0s2a
/dev/da0s2a: 407146 files, 2563556 used, 2881549 free (23605 frags, 357243 blocks, 0.4% fragmentation)
scratch:dl 115#mount /dev/da0s2a /mnt
scratch:dl 116#mkdir /mnt/tmp/a /mnt/tmp/a/b
scratch:dl 118#(cd /mnt/tmp/a/b && sleep 600) &
[1] 3770
scratch:dl 121#sync
scratch:dl 122#sleep 60
scratch:dl 123#fsck -fBp /dev/da0s2a
/dev/da0s2a: LINK COUNT DIR I=307824 OWNER=root MODE=40755
/dev/da0s2a: SIZE=512 MTIME=Sep 24 13:52 2005 COUNT 3 SHOULD BE 2 (ADJUSTED)
/dev/da0s2a: 407148 files, 2563557 used, 2881548 free (23604 frags, 357243 blocks, 0.4% fragmentation)
scratch:dl 124#wait
[1] Done ( cd /mnt/tmp/a/b && sleep 600 )
scratch:dl 125#ls -lid /mnt/tmp/a
307824 drwxr-xr-x 1 root wheel 512 Sep 24 13:52 /mnt/tmp/a

Oops!

I think the cleanest fix would be for handle_workitem_remove() to
explicitly call ffs_update(). Another subtle point is that
ufs_inactive() calls vn_write_suspend_wait() before calling
UFS_UPDATE(), but I don't think we want to call vn_write_suspend_wait()
here.