From owner-freebsd-current@FreeBSD.ORG Sun Jan 8 01:44:04 2006 Return-Path: X-Original-To: freebsd-current@FreeBSD.org Delivered-To: freebsd-current@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 92FD916A424; Sun, 8 Jan 2006 01:44:03 +0000 (GMT) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7329543D4C; Sun, 8 Jan 2006 01:44:01 +0000 (GMT) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id k081hXO1023736; Sat, 7 Jan 2006 17:43:38 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <200601080143.k081hXO1023736@gw.catspoiler.org> Date: Sat, 7 Jan 2006 17:43:33 -0800 (PST) From: Don Lewis To: dillon@apollo.backplane.com In-Reply-To: <200512050440.jB54eghJ043706@apollo.backplane.com> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Cc: freebsd-current@FreeBSD.org, obrien@FreeBSD.org, kris@obsecurity.org Subject: Re: [PANIC] ufs_dirbad: bad dir X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Jan 2006 01:44:05 -0000 On 4 Dec, Matthew Dillon wrote: > I did see one possible issue, and that is the fact that > softdep_setup_freeblocks() is called BEFORE vinvalbuf() in ffs_truncate(). > If softdeps somehow is able to execute *ANY* of those workitems before > the vinvalbuf runs, in particular if there is already I/O in progress > on any of those buffers (and thus doesn't get invalidated by vinvalbuf > until after the I/O is complete), then that could result in the above > scenario. The case occurs in several situations but the main one that > DragonFly and FreeBSD shares is when a file is truncated to 0-length. That does look suspicious, especially since softdep_setup_freeblocks() can execute the workitem immediately if the inode was never written, though it is supposed to wait for any pending writes to complete and invalidate any dirty bufs. /* * Add the freeblks structure to the list of operations that * must await the zero'ed inode being written to disk. If we * still have a bitmap dependency (delay == 0), then the inode * has never been written to disk, so we can process the * freeblks below once we have deleted the dependencies. */ delay = (inodedep->id_state & DEPCOMPLETE); if (delay) WORKLIST_INSERT(&inodedep->id_bufwait, &freeblks->fb_list); [snip] /* * We must wait for any I/O in progress to finish so that * all potential buffers on the dirty list will be visible. * Once they are all there, walk the list and get rid of * any dependencies. */ vp = ITOV(ip); VI_LOCK(vp); drain_output(vp); restart: TAILQ_FOREACH(bp, &vp->v_bufobj.bo_dirty.bv_hd, b_bobufs) { if (((flags & IO_EXT) == 0 && (bp->b_xflags & BX_ALTDATA)) || ((flags & IO_NORMAL) == 0 && (bp->b_xflags & BX_ALTDATA) == 0)) continue; if ((bp = getdirtybuf(bp, VI_MTX(vp), MNT_WAIT)) == NULL) goto restart; VI_UNLOCK(vp); ACQUIRE_LOCK(&lk); (void) inodedep_lookup(fs, ip->i_number, 0, &inodedep); deallocate_dependencies(bp, inodedep); FREE_LOCK(&lk); bp->b_flags |= B_INVAL | B_NOCACHE; brelse(bp); VI_LOCK(vp); goto restart; } [snip] /* * If the inode has never been written to disk (delay == 0), * then we can process the freeblks now that we have deleted * the dependencies. */ if (!delay) handle_workitem_freeblocks(freeblks, 0); Hmn, I'm actually wondering if it might be the "delay" case: delay = (inodedep->id_state & DEPCOMPLETE); if (delay) WORKLIST_INSERT(&inodedep->id_bufwait, &freeblks->fb_list); /* * Because the file length has been truncated to zero, any * pending block allocation dependency structures associated * with this inode are obsolete and can simply be de-allocated. * We must first merge the two dependency lists to get rid of * any duplicate freefrag structures, then purge the merged list. * If we still have a bitmap dependency, then the inode has never * been written to disk, so we can free any fragments without delay. */ if (flags & IO_NORMAL) { merge_inode_lists(&inodedep->id_newinoupdt, &inodedep->id_inoupdt); while ((adp = TAILQ_FIRST(&inodedep->id_inoupdt)) != 0) free_allocdirect(&inodedep->id_inoupdt, adp, delay); } if (flags & IO_EXT) { merge_inode_lists(&inodedep->id_newextupdt, &inodedep->id_extupdt); while ((adp = TAILQ_FIRST(&inodedep->id_extupdt)) != 0) free_allocdirect(&inodedep->id_extupdt, adp, delay); } FREE_LOCK(&lk); bdwrite(bp); [what happens if the inode block write completes here, handle_workitem_freeblocks() is called, and the block is reallocated, before the call to drain_output()?] /* * We must wait for any I/O in progress to finish so that * all potential buffers on the dirty list will be visible. * Once they are all there, walk the list and get rid of * any dependencies. */ vp = ITOV(ip); VI_LOCK(vp); drain_output(vp); ... Maybe the proper fix is to do the I/O drain code earlier.