From owner-freebsd-current@FreeBSD.ORG Sat Sep 24 19:08:14 2005
Date: Sat, 24 Sep 2005 19:08:10 +0000 (UTC)
Message-Id: <20050924.190810.74675111.Tor.Egge@cvsup.no.freebsd.org>
To: truckman@FreeBSD.org
From: Tor Egge
Cc: scottl@FreeBSD.org, tegge@FreeBSD.org, current@FreeBSD.org, mckusick@FreeBSD.org
Subject: Re: soft updates / background fsck directory link count bug
In-Reply-To: <200509240853.j8O8r1O7090514@gw.catspoiler.org>
References: <200509240741.j8O7fZca090425@gw.catspoiler.org> <200509240853.j8O8r1O7090514@gw.catspoiler.org>

> I believe the problem is that handle_workitem_remove() is putting the
> dirrem on the inodep inowait list, but it is never getting moved to
> the inodep bufwait list because ffs_update() and
> softdep_update_inodeblock() are not getting called for the leaf
> directory after the dirrem is put on the inowait list if the link count
> is too large.

Correct.

Running the commands (on an idle system)

    levels=280
    dirchain=`jot $levels | tr '\n' '/'`
    mkdir -p $dirchain
    fsync $dirchain
    rm -rf 1

and monitoring the number of dirrem structures allocated in the kernel

    (while sleep 1; do vmstat -m | grep dirrem; done)

shows that the number of dirrem structures slowly decreases. In this
scenario, the rundown still happens since the link count on the inodes
is normal.

When the rundown doesn't start due to an elevated link count on the leaf
inode, a situation might occur where there are no dirty blocks and no
soft updates dependencies for the file system on the global work list,
while some inodedep and dirrem dependencies for that file system are
still lingering. ffs_sync() doesn't detect these lingering dependencies,
and vfs_write_suspend() returns without any errors, indicating that the
file system has been suspended.

> In the normal case, it appears that the dirrem migration is triggered
> when the inode is zeroed in ufs_inactive(), which happens when the first
> call to handle_workitem_remove() calls vput().

Intermediate nodes end up waiting for the child inode to be zeroed and
then written to disk.

> Perhaps the dirrem should be put on the inowait list before the call to
> ffs_truncate().

If softdep_slowdown() returns a nonzero value then ffs_truncate() can
call ffs_syncvnode() before di_size has been set to 0. If the inodeblock
is written due to fsync() operations on other inodes in the same
inodeblock, then the dirrem dependency would be moved to the global work
list too early.

Enclosed is a patch that forces an ffs_update() call from ufs_inactive()
by setting the IN_CHANGE flag if i_effnlink is larger than 0 right before
the call to vput(). An alternative is checking i_nlink instead of
i_effnlink for faster rundown.

- Tor Egge

Index: sys/ufs/ffs/ffs_softdep.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_softdep.c,v
retrieving revision 1.184
diff -u -r1.184 ffs_softdep.c
--- sys/ufs/ffs/ffs_softdep.c	5 Sep 2005 22:14:33 -0000	1.184
+++ sys/ufs/ffs/ffs_softdep.c	24 Sep 2005 18:31:04 -0000
@@ -3477,6 +3477,8 @@
 	}
 	WORKLIST_INSERT(&inodedep->id_inowait, &dirrem->dm_list);
 	FREE_LOCK(&lk);
+	if (ip->i_effnlink > 0)
+		ip->i_flag |= IN_CHANGE;
 	vput(vp);
 }
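
For anyone who wants to repeat the dirrem measurement before and after applying the patch, the reproduction steps from the message can be wrapped in a small script. This is only a sketch: the function names make_dirchain, count_dirrem and run_repro are made up for illustration, and the chain is built with a plain loop instead of jot, assuming that gives the same "1/2/.../N/" string. It relies on the FreeBSD fsync and vmstat utilities used in the original commands.

```shell
#!/bin/sh
# Sketch only: wraps the reproduction steps quoted in the message.
# make_dirchain, count_dirrem and run_repro are hypothetical helper names.

# Print "1/2/.../N/" for N nesting levels, the same string that
# `jot N | tr '\n' '/'` is used to build in the original commands.
make_dirchain() {
    levels=$1
    i=1
    chain=""
    while [ "$i" -le "$levels" ]; do
        chain="${chain}${i}/"
        i=$((i + 1))
    done
    printf '%s\n' "$chain"
}

# Print the dirrem line from `vmstat -m`, as in the monitoring loop.
count_dirrem() {
    vmstat -m | grep dirrem
}

# Build the chain, fsync the leaf, remove the tree, then poll once per
# second. Destructive: removes ./1 in the current directory.
# The polling loop runs until interrupted.
run_repro() {
    dirchain=$(make_dirchain "${1:-280}")
    mkdir -p "$dirchain" && fsync "$dirchain" && rm -rf 1
    while sleep 1; do
        count_dirrem
    done
}
```

Sourcing the script only defines the functions. Calling run_repro from a scratch directory on an idle soft updates file system repeats the test with the default of 280 levels; interrupt it once the dirrem count stops changing.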