Date: Sun, 14 Dec 2003 13:41:08 -0500 (EST)
From: Jeff Roberson
To: dwhite@gumbysoft.com, Don Lewis
cc: mckusick@mckusick.com
cc: alc@freebsd.org
cc: mb@imp.ch
cc: freebsd-current@freebsd.org
Subject: Re: HAVE TRACE & DDB Re: FreeBSD 5.2-RC1 released
In-Reply-To: <20031214131819.W4201-100000@mail.chesapeake.net>
Message-ID: <20031214134007.F4201-100000@mail.chesapeake.net>

On Sun, 14 Dec 2003, Jeff Roberson wrote:

>
> On Sun, 14 Dec 2003, Jeff Roberson wrote:
>
> > On Sat, 13 Dec 2003, Don Lewis wrote:
> >
> > > On 13 Dec, Don Lewis wrote:
> > > > On 12 Dec, Jeff Roberson wrote:
> > > >
> > > >
> > > >> fsync: giving up on dirty: 0xc4e18000: tag devfs, type VCHR, usecount 44,
> > > >> writecount 0, refcount 14, flags (VI_XLOCK|VV_OBJBUF), lock type devfs: EXCL
> > > >> (count 1) by thread 0xc20ff500
> > > >
> > > > Why are we trying to reuse a vnode with a usecount of 44 and a refcount
> > > > of 14?  What is thread 0xc20ff500 doing?
> > >
> > > Following up to myself ...
> > >
> > > It looks like we're trying to recycle this vnode because of the
> > > following sysinstall code, in distExtractTarball():
> > >
> > >     if (is_base && RunningAsInit && !Fake) {
> > >         unmounted_dev = 1;
> > >         unmount("/dev", MNT_FORCE);
> > >     } else
> > >         unmounted_dev = 0;
> > >
> > > What happens if we forcibly umount /dev while /dev/whatever holds a
> > > mounted file system?  It looks like this is handled by vgonechrl().  It
> > > looks to me like vclean() is going to do some scary stuff to this vnode.
> > >
> >
> > Excellent work!  I think I may know what's wrong.  If you look at rev
> > 1.461 of vfs_subr.c I changed the semantics of cleaning a VCHR that was
> > being unmounted.  I now acquire the xlock around the operation.  This may
> > be the culprit.  I'm too tired to debug this right now, but I can look at
> > it in the am.
> >
>
> Ok, I think I understand what happens.  The syncer runs, and at the same
> time, we're doing the forced unmount.  This causes the sync of the device
> vnode to fail.  This isn't really a problem.  After this, while syncing
> a ffs volume that is mounted on a VCHR from /dev, we bread() and get a
> buffer for this device and then immediately block.  The forced unmount
> then proceeds, calling vclean() on the device, which goes into the VM via
> DESTROYVOBJECT.  The VM frees all of the pages associated with the object
> etc.  Then, the ffs_update() is allowed to run again with a pointer to a
> buffer that has pointers to pages that have been freed.  This is where
> vfs_setdirty() comes in and finds a NULL object.
>
> The wired counts on the pages are 1, which is consistent with a page in
> the bufcache.  Also the object is NULL which is the only indication we
> have that this is a free page.
>
> I think that if we want to allow unmounting of the underlying device for
> VCHR, we need to not call vclean() from vgonechrl().  We need to just lock,
> VOP_RECLAIM, cache_purge(), and insmntque to NULL.
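
The ordering problem described above can be illustrated with a small
userland model.  This is only a sketch under stated assumptions: every
struct and function name below (page_model, model_bread, and so on) is a
hypothetical stand-in invented for illustration, not kernel code and not
the contents of the patch.  It models just the two steps that matter to
the race (destroying the VM object versus leaving it alone) and leaves
out locking and cache_purge().

```c
/*
 * Userland model only.  All names here are hypothetical stand-ins,
 * not the FreeBSD kernel's structures or functions.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

struct vm_object_model;

struct page_model {
    struct vm_object_model *object; /* NULL models a freed page */
    int wire_count;                 /* 1 while a buffer holds the page */
};

struct vm_object_model {
    struct page_model pages[NPAGES];
};

struct vnode_model {
    struct vm_object_model *object; /* NULL once the object is destroyed */
    bool on_mount_list;             /* cleared by the insmntque-to-NULL step */
    bool reclaimed;                 /* set by the reclaim step */
};

struct buf_model {
    struct page_model *pages[NPAGES];
};

/* Attach a fresh object to the vnode; the caller owns the storage. */
void model_vnode_init(struct vnode_model *vp, struct vm_object_model *obj)
{
    for (size_t i = 0; i < NPAGES; i++) {
        obj->pages[i].object = obj;
        obj->pages[i].wire_count = 0;
    }
    vp->object = obj;
    vp->on_mount_list = true;
    vp->reclaimed = false;
}

/* Model of bread(): the buffer wires the pages and keeps pointers to them. */
void model_bread(struct vnode_model *vp, struct buf_model *bp)
{
    assert(vp->object != NULL);
    for (size_t i = 0; i < NPAGES; i++) {
        vp->object->pages[i].wire_count = 1;
        bp->pages[i] = &vp->object->pages[i];
    }
}

/* Model of the vclean() path: the object is destroyed and its pages are
 * released even though a blocked buffer may still point at them. */
void model_gone_with_vclean(struct vnode_model *vp)
{
    if (vp->object != NULL) {
        for (size_t i = 0; i < NPAGES; i++)
            vp->object->pages[i].object = NULL;
        vp->object = NULL;
    }
    vp->reclaimed = true;
    vp->on_mount_list = false;
}

/* Model of the proposed path: reclaim and detach from the mount list,
 * but leave the VM object and its pages untouched. */
void model_gone_reclaim_only(struct vnode_model *vp)
{
    vp->reclaimed = true;
    vp->on_mount_list = false;
}

/* Model of vfs_setdirty(): returns false where the kernel would find a
 * page whose object is NULL. */
bool model_setdirty(const struct buf_model *bp)
{
    for (size_t i = 0; i < NPAGES; i++)
        if (bp->pages[i]->object == NULL)
            return false;
    return true;
}
```

In this model, running model_bread(), then model_gone_with_vclean(), then
model_setdirty() reproduces the symptom: the pages still show a wire count
of 1 but their object is NULL.  Swapping in model_gone_reclaim_only()
leaves the buffer's pages valid.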
>
> I've looked through my changes here, and I don't see how I could have
> introduced this bug.  Were we vclean()ing before, and that seems to be the
> main problem.  There have been some changes to device aliasing that could
> have impacted this.  I'm trying to get the scoop from phk now.
>
> I'm going to change the way vgonechrl() works, but I'd really like to know
> what changed that broke this..
>

Please test the patch at:

http://www.chesapeake.net/~jroberson/forcevchr.diff

If this works I'll come up with a more compact arrangement for the code so
that we can avoid all of this duplication.

Cheers,
Jeff

> Cheers,
> Jeff