From: Jilles Tjoelker <jilles@stack.nl>
To: Rink Springer
Cc: stable@FreeBSD.org
Date: Mon, 27 Nov 2006 15:53:52 +0100
Subject: Re: panic(): vinvalbuf: dirty bufs: perhaps a ffs_syncvnode bug?
Message-ID: <20061127145352.GB13284@stack.nl>
In-Reply-To: <20061116082407.GB33390@rink.nu>

On Thu, Nov 16, 2006 at 09:24:07AM +0100, Rink Springer wrote:

> Over the night, we reset the shelf in order to activate its new
> management IP address, causing the /dev/da[12] devices to be
> temporarily unavailable.
> This resulted in the following panic on the rather busy
> mailstorage server (the other server has minor load and was fine):
> ---
> (da0:isp0:0:1:0): lost device
> (da0:isp0:0:1:0): removing device entry
> (da1:isp0:0:2:0): lost device
> g_vfs_done():da1s1[WRITE(offset=292316823552, length=16384)]error = 6
> g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
> g_vfs_done():da1s1[READ(offset=12175362048, length=2048)]error = 6
> g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
> g_vfs_done():da1s1[READ(offset=18370689024, length=2048)]error = 6
> g_vfs_done():da1s1[READ(offset=25829486592, length=512)]error = 6
> vnode_pager_getpages: I/O read error
> vm_fault: pager read error, pid 78035 (lmtpd)
> g_vfs_done():da1s1[WRITE(offset=240287318016,
> length=1638(da1:isp0:0:2:0): Invalidating pack
> 4)]error = 6
> g_vfs_done():da1s1[READ(offset=13768671232, length=6144)]error = 6
> g_vfs_done():da1s1[READ(offset=102126977024, length=16384)]error = 6
> g_vfs_done():da1s1[READ(offset=13768671232, length=6144)]error = 6
> g_vfs_dpone():da1s1[READ(offset=102319669248, length=16384)]error = 6a
> nic: vinvalbuf: dirty bufs
> cpuid = 2
> Uptime: 54d15h48m38s

> When looking at the source code of vinvalbuf(), which calls
> bufobj_invalbuf(), it seems that this panic is raised after a bufobj
> still contains dirty data after waiting for it to complete without
> error. The code can be found at /sys/kern/vfs_subr.c

Note that this panic can only occur if vinvalbuf() is called with
V_SAVE (save modified data first). The exact condition for the panic
is better described as: a bufobj still contains dirty data or still
has output in progress after a successful synchronous BO_SYNC
operation.

bufobj_wwait() cannot return an error unless msleep() fails (e.g. an
interruptible sleep was requested via slpflag and a signal occurred).
If the I/O has failed, bufobj_wwait() will still return success.
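The failure mode described above can be sketched as a small user-space model. This is a hypothetical illustration, not the real kernel code: the struct, the `model_` function names, and the error constants are all invented for the sketch; only the control flow mirrors what bufobj_invalbuf() in /sys/kern/vfs_subr.c is described as doing.

```c
#include <stdbool.h>

/* Hypothetical model of the panic path in bufobj_invalbuf().
 * All names and fields here are illustrative. */

#define V_SAVE      0x0001
#define MODEL_EIO   5       /* stand-in for an I/O error return */
#define MODEL_PANIC (-1)    /* stands in for panic("vinvalbuf: dirty bufs") */

struct bufobj_model {
    int  dirty_cnt;     /* dirty buffers queued on the object */
    int  numoutput;     /* writes still in progress */
    bool sync_fails;    /* does BO_SYNC report an error? */
};

/* BO_SYNC stand-in: a successful return is supposed to mean the
 * object is clean, but ffs_syncvnode() can give up after its last
 * pass and still return 0 -- the suspected hole. */
static int model_bo_sync(struct bufobj_model *bo)
{
    if (bo->sync_fails)
        return MODEL_EIO;
    return 0;   /* note: deliberately does NOT force dirty_cnt to 0 */
}

/* bufobj_wwait() stand-in: drains in-flight output and, as in the
 * real code, succeeds even if that output failed with an I/O error. */
static int model_bufobj_wwait(struct bufobj_model *bo)
{
    bo->numoutput = 0;
    return 0;
}

/* Returns 0 on success, an errno-style value on a reported sync
 * failure, and MODEL_PANIC where the kernel would panic. */
static int model_vinvalbuf(struct bufobj_model *bo, int flags)
{
    if (flags & V_SAVE) {
        int error = model_bo_sync(bo);
        if (error)
            return error;        /* a reported error is propagated */
        model_bufobj_wwait(bo);
        if (bo->dirty_cnt > 0 || bo->numoutput > 0)
            return MODEL_PANIC;  /* "vinvalbuf: dirty bufs" */
    }
    bo->dirty_cnt = 0;           /* invalidate remaining buffers */
    return 0;
}
```

The model makes the asymmetry visible: an error *reported* by the sync routine is handled gracefully, while a sync that silently leaves dirty buffers behind hits the panic.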
> The sync routine called eventually translates to bufsync(), as in
> /sys/kern/vfs_bio.c, which calls the filesystem's sync routine. It
> seems as if the return status of vfs_bio_awrite() in ffs_syncvnode()
> is not checked; all the other parts are checked. I believe this could
> provoke this panic.

There does not seem to be much point in checking the result of an
asynchronous write anyway, as the I/O has not completed yet.

I don't fully understand what the code is doing with async writes. For
all but the last pass (see below), it calls bawrite() on the buffer,
which sets B_ASYNC and then calls bwrite(). For the last pass, it
calls bwrite() directly (has something cleared B_ASYNC?) and returns
an error if it fails. bwrite() itself is an inline function defined in
/sys/sys/buf.h, which calls BO_WRITE after some KASSERTs.

> As the machine is in production use, it was instantly rebooted by a
> colleague and thus I have no vmcore, backtrace or anything. I
> therefore hope the information provided here is adequate.

> Can someone with more FreeBSD-VFS knowledge please look at this?

There is another possible problem, from this comment in
/sys/ufs/ffs/ffs_vnops.c ffs_syncvnode():

	/*
	 * Block devices associated with filesystems may
	 * have new I/O requests posted for them even if
	 * the vnode is locked, so no amount of trying will
	 * get them clean. Thus we give block devices a
	 * good effort, then just give up. For all other file
	 * types, go around and try again until it is clean.
	 */

Actually it just does NIADDR + 1 (four) passes and then gives up. If
DIAGNOSTIC is enabled, it then prints the affected vnode if it is not
a disk. This failure is not reflected in ffs_syncvnode()'s return
value, so if it occurs while ffs_syncvnode() is being called from
bufobj_invalbuf(), a panic will result.

Suppose ffs_syncvnode() were changed to return an error in this case.
bufobj_invalbuf()/vinvalbuf() handles a BO_SYNC/ffs_syncvnode() error
by aborting with an error return.
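The give-up behaviour, and the effect of the proposed change, can be sketched in another user-space model. Again this is hypothetical: the struct, the `report_giveup` flag, and `MODEL_EBUSY` are invented for illustration; the real loop in /sys/ufs/ffs/ffs_vnops.c is considerably more involved.

```c
#include <stdbool.h>

#define NIADDR      3    /* UFS indirect block levels; NIADDR + 1 = 4 passes */
#define MODEL_EBUSY 16   /* stand-in errno for the proposed give-up error */

struct vnode_model {
    int dirty;             /* dirty buffers on the vnode */
    int cleaned_per_pass;  /* how many each pass manages to flush */
};

/* Simplified stand-in for the ffs_syncvnode() retry loop: up to
 * NIADDR + 1 passes over the dirty list.  If buffers remain after
 * the last pass, today's code just gives up (printing a diagnostic
 * for non-disks under DIAGNOSTIC) and still returns 0.  Setting
 * report_giveup models the proposed change of returning an error. */
static int model_ffs_syncvnode(struct vnode_model *vp, bool report_giveup)
{
    for (int pass = 0; pass <= NIADDR; pass++) {
        vp->dirty -= vp->cleaned_per_pass;
        if (vp->dirty <= 0) {
            vp->dirty = 0;
            return 0;                   /* vnode is clean: success */
        }
    }
    /* Gave up with dirty buffers left.  Returning 0 here is what
     * lets bufobj_invalbuf() hit the "dirty bufs" panic later. */
    return report_giveup ? MODEL_EBUSY : 0;
}
```

With `report_giveup` set, bufobj_invalbuf()/vinvalbuf() would see the error and abort with an error return instead of panicking.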
It seems that in most cases this would cause the operation invoking
vinvalbuf() to fail. However, in at least one case
(vm_object_terminate()), the error would be ignored; this may leave
old garbage or dangling references behind.

-- 
Jilles Tjoelker