From: Jilles Tjoelker <jilles@stack.nl>
To: Rink Springer
Cc: stable@FreeBSD.org
Date: Mon, 27 Nov 2006 15:53:52 +0100
Subject: Re: panic(): vinvalbuf: dirty bufs: perhaps a ffs_syncvnode bug?
Message-ID: <20061127145352.GB13284@stack.nl>
In-Reply-To: <20061116082407.GB33390@rink.nu>

On Thu, Nov 16, 2006 at 09:24:07AM +0100, Rink Springer wrote:

> Over the night, we reset the shelf in order to activate its new
> management IP address, causing the /dev/da[12] devices to be
> temporarily unavailable.
> This resulted in the following panic on the rather busy
> mailstorage server (the other server has minor load and was fine):
> ---
> (da0:isp0:0:1:0): lost device
> (da0:isp0:0:1:0): removing device entry
> (da1:isp0:0:2:0): lost device
> g_vfs_done():da1s1[WRITE(offset=292316823552, length=16384)]error = 6
> g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
> g_vfs_done():da1s1[READ(offset=12175362048, length=2048)]error = 6
> g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
> g_vfs_done():da1s1[READ(offset=18370689024, length=2048)]error = 6
> g_vfs_done():da1s1[READ(offset=25829486592, length=512)]error = 6
> vnode_pager_getpages: I/O read error
> vm_fault: pager read error, pid 78035 (lmtpd)
> g_vfs_done():da1s1[WRITE(offset=240287318016,
> length=1638(da1:isp0:0:2:0): Invalidating pack
> 4)]error = 6
> g_vfs_done():da1s1[READ(offset=13768671232, length=6144)]error = 6
> g_vfs_done():da1s1[READ(offset=102126977024, length=16384)]error = 6
> g_vfs_done():da1s1[READ(offset=13768671232, length=6144)]error = 6
> g_vfs_dpone():da1s1[READ(offset=102319669248, length=16384)]error = 6a
> nic: vinvalbuf: dirty bufs
> cpuid = 2
> Uptime: 54d15h48m38s

> When looking at the source code of vinvalbuf(), which calls
> bufobj_invalbuf(), it seems that this panic is raised after a bufobj
> still contains dirty data after waiting for it to complete without
> error. The code can be found at /sys/kern/vfs_subr.c

Note that this panic can only occur if vinvalbuf() is called with
V_SAVE (save modified data first). The exact condition for the panic
is better described as: a bufobj still contains dirty data or still
has output in progress after a successful synchronous BO_SYNC
operation.

bufobj_wwait() cannot return an error unless msleep() fails (e.g. an
interruptible sleep was requested via slpflag and a signal occurred).
If the I/O has failed, bufobj_wwait() will still return success.
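The failure mode described above can be sketched as a small user-space model. This is a hypothetical illustration, not the real kernel code: the struct, the `model_` function names, and the error constants are all invented for the sketch; only the control flow mirrors what bufobj_invalbuf() in /sys/kern/vfs_subr.c is described as doing.

```c
#include <stdbool.h>

/* Hypothetical model of the panic path in bufobj_invalbuf().
 * All names and fields here are illustrative. */

#define V_SAVE      0x0001
#define MODEL_EIO   5       /* stand-in for an I/O error return */
#define MODEL_PANIC (-1)    /* stands in for panic("vinvalbuf: dirty bufs") */

struct bufobj_model {
    int  dirty_cnt;     /* dirty buffers queued on the object */
    int  numoutput;     /* writes still in progress */
    bool sync_fails;    /* does BO_SYNC report an error? */
};

/* BO_SYNC stand-in: a successful return is supposed to mean the
 * object is clean, but ffs_syncvnode() can give up after its last
 * pass and still return 0 -- the suspected hole. */
static int model_bo_sync(struct bufobj_model *bo)
{
    if (bo->sync_fails)
        return MODEL_EIO;
    return 0;   /* note: deliberately does NOT force dirty_cnt to 0 */
}

/* bufobj_wwait() stand-in: drains in-flight output and, as in the
 * real code, succeeds even if that output failed with an I/O error. */
static int model_bufobj_wwait(struct bufobj_model *bo)
{
    bo->numoutput = 0;
    return 0;
}

/* Returns 0 on success, an errno-style value on a reported sync
 * failure, and MODEL_PANIC where the kernel would panic. */
static int model_vinvalbuf(struct bufobj_model *bo, int flags)
{
    if (flags & V_SAVE) {
        int error = model_bo_sync(bo);
        if (error)
            return error;        /* a reported error is propagated */
        model_bufobj_wwait(bo);
        if (bo->dirty_cnt > 0 || bo->numoutput > 0)
            return MODEL_PANIC;  /* "vinvalbuf: dirty bufs" */
    }
    bo->dirty_cnt = 0;           /* invalidate remaining buffers */
    return 0;
}
```

The model makes the asymmetry visible: an error *reported* by the sync routine is handled gracefully, while a sync that silently leaves dirty buffers behind hits the panic.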
> The sync routine called eventually translates to bufsync(), as in
> /sys/kern/vfs_bio.c, which calls the filesystem's sync routine. It
> seems as if the return status of vfs_bio_awrite() in ffs_syncvnode()
> is not checked; all the other parts are checked. I believe this could
> provoke this panic.

There does not seem to be much point in checking the result of an
asynchronous write anyway, as the I/O has not completed yet.

I don't fully understand what the code is doing with async writes. For
all but the last pass (see below), it calls bawrite() on the buffer,
which sets B_ASYNC and then calls bwrite(). For the last pass, it
calls bwrite() directly (has something cleared B_ASYNC?) and returns
an error if it fails. bwrite() itself is an inline function defined in
/sys/sys/buf.h, which calls BO_WRITE after some KASSERTs.

> As the machine is in production use, it was instantly rebooted by a
> colleague and thus I have no vmcore, backtrace or anything. I
> therefore hope the information provided here is adequate.

> Can someone with more FreeBSD-VFS knowledge please look at this?

There is another possible problem, from this comment in
/sys/ufs/ffs/ffs_vnops.c ffs_syncvnode():

	/*
	 * Block devices associated with filesystems may
	 * have new I/O requests posted for them even if
	 * the vnode is locked, so no amount of trying will
	 * get them clean. Thus we give block devices a
	 * good effort, then just give up. For all other file
	 * types, go around and try again until it is clean.
	 */

Actually it just does NIADDR + 1 (four) passes and then gives up. If
DIAGNOSTIC is enabled, it then prints the affected vnode if it is not
a disk. This failure is not reflected in ffs_syncvnode()'s return
value, so if it occurs while ffs_syncvnode() is being called from
bufobj_invalbuf(), a panic will result.

Suppose ffs_syncvnode() were changed to return an error in this case.
bufobj_invalbuf()/vinvalbuf() handles a BO_SYNC/ffs_syncvnode() error
by aborting with an error return.
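The give-up behaviour, and the effect of the proposed change, can be sketched in another user-space model. Again this is hypothetical: the struct, the `report_giveup` flag, and `MODEL_EBUSY` are invented for illustration; the real loop in /sys/ufs/ffs/ffs_vnops.c is considerably more involved.

```c
#include <stdbool.h>

#define NIADDR      3    /* UFS indirect block levels; NIADDR + 1 = 4 passes */
#define MODEL_EBUSY 16   /* stand-in errno for the proposed give-up error */

struct vnode_model {
    int dirty;             /* dirty buffers on the vnode */
    int cleaned_per_pass;  /* how many each pass manages to flush */
};

/* Simplified stand-in for the ffs_syncvnode() retry loop: up to
 * NIADDR + 1 passes over the dirty list.  If buffers remain after
 * the last pass, today's code just gives up (printing a diagnostic
 * for non-disks under DIAGNOSTIC) and still returns 0.  Setting
 * report_giveup models the proposed change of returning an error. */
static int model_ffs_syncvnode(struct vnode_model *vp, bool report_giveup)
{
    for (int pass = 0; pass <= NIADDR; pass++) {
        vp->dirty -= vp->cleaned_per_pass;
        if (vp->dirty <= 0) {
            vp->dirty = 0;
            return 0;                   /* vnode is clean: success */
        }
    }
    /* Gave up with dirty buffers left.  Returning 0 here is what
     * lets bufobj_invalbuf() hit the "dirty bufs" panic later. */
    return report_giveup ? MODEL_EBUSY : 0;
}
```

With `report_giveup` set, bufobj_invalbuf()/vinvalbuf() would see the error and abort with an error return instead of panicking.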
It seems that in most cases this would cause the operation invoking
vinvalbuf() to fail. However, in at least one case
(vm_object_terminate()), the error would be ignored; this may leave
old garbage or dangling references behind.

-- 
Jilles Tjoelker