From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 29 19:19:46 2012
Message-Id: <201202291919.q1TJJfCP082878@chez.mckusick.com>
To: Ivan Voras <ivoras@freebsd.org>
In-reply-to: <jil10d$52d$1@dough.gmane.org> 
Date: Wed, 29 Feb 2012 11:19:41 -0800
From: Kirk McKusick <mckusick@mckusick.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: fsync: giving up on dirty 
List-Id: Filesystems <freebsd-fs.freebsd.org>

> To: freebsd-fs@freebsd.org
> From: Ivan Voras <ivoras@freebsd.org>
> Date: Wed, 29 Feb 2012 12:08:51 +0100
> Subject: fsync: giving up on dirty
> 
> Hi,
> 
> One of the machines I take care of recently started recording
> messages such as these in the logs:
> 
> Feb 28 04:02:09 skynet kernel: fsync: giving up on dirty
> Feb 28 04:02:09 skynet kernel: 0xfffffe000fef2780: tag devfs, type VCHR
> Feb 28 04:02:09 skynet kernel: usecount 1, writecount 0, refcount 2400 mountedhere 0xfffffe000fd25200
> Feb 28 04:02:09 skynet kernel: flags ()
> Feb 28 04:02:09 skynet kernel: v_object 0xfffffe000fe96d98 ref 0 pages 23990
> Feb 28 04:02:09 skynet kernel: lock type devfs: EXCL by thread 0xfffffe000feb5000 (pid 44968)
> Feb 28 04:02:09 skynet kernel: dev multipath/hpdisk4-web
> 
> I'm leaning towards suspecting bad hardware - the device behind
> "multipath/hpdisk4-web" is an old HP MSA FC storage (which I wouldn't
> recommend to anyone), but though verbose, the message lacks device-level
> specifics. I would expect messages to be logged by the isp driver, or
> the CAM layer, or even GEOM, but there is nothing there - only this
> "fsync" message.
> 
> From the code, it looks like a transient condition (printed in the case
> of EAGAIN), but the hardware here is behaving a bit suspiciously, so I'd
> like to make sure - should I ignore this message?

I have seen this message during some of Peter Holm's stress tests when
running with journaled soft updates. Note that soft updates without
journaling do not show this issue. The "giving up" message comes when
trying to flush the filesystem metadata blocks associated with the mount
device (hence the VCHR type of the vnode). The problem has become more
evident with my recent changes to the way that sync'ing is done.

I believe the problem is that the soft updates worklist needs to be
flushed before some of the dirty blocks can be successfully written.
If you are running a 9-stable system on this machine, are using journaled
soft updates on the filesystem in question, and are willing to try out
my first attempt at a fix, let me know and I'll send you the diffs for it.
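
To illustrate the dependency described above, here is a toy model in C.
It is only a sketch and is not the kernel code: the names toy_block,
toy_process_worklist, toy_flush_pass, toy_fsync, and the MAXRETRY value
are all made up for illustration. It assumes that a dirty block with a
pending dependency cannot be written until the worklist has been
processed, and that the flush loop gives up with EAGAIN after a bounded
number of passes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS  4
#define MAXRETRY 5	/* hypothetical retry limit for this toy model */

/* Toy dirty block: it cannot be written while a dependency is pending. */
struct toy_block {
	bool dirty;
	bool dep_pending;
};

/* Mark every block dirty; every other block gets a pending dependency. */
static void
toy_init(struct toy_block *b, int n)
{
	for (int i = 0; i < n; i++) {
		b[i].dirty = true;
		b[i].dep_pending = (i % 2 == 0);
	}
}

/* Stand-in for processing the worklist: it resolves all dependencies. */
static void
toy_process_worklist(struct toy_block *b, int n)
{
	for (int i = 0; i < n; i++)
		b[i].dep_pending = false;
}

/* One pass: write each dirty block that has no pending dependency.
 * Returns the number of blocks that are still dirty afterwards. */
static int
toy_flush_pass(struct toy_block *b, int n)
{
	int remaining = 0;

	for (int i = 0; i < n; i++) {
		if (!b[i].dirty)
			continue;
		if (b[i].dep_pending) {
			remaining++;	/* cannot be written yet */
			continue;
		}
		b[i].dirty = false;	/* "written" */
	}
	return (remaining);
}

/* Retry the flush a bounded number of times, then give up with EAGAIN. */
static int
toy_fsync(struct toy_block *b, int n, bool flush_worklist_first)
{
	assert(b != NULL);
	if (flush_worklist_first)
		toy_process_worklist(b, n);
	for (int retry = 0; retry < MAXRETRY; retry++) {
		if (toy_flush_pass(b, n) == 0)
			return (0);
	}
	return (EAGAIN);	/* the "giving up on dirty" case */
}
```

In this model, calling toy_fsync() with flush_worklist_first set to false
on freshly initialized blocks returns EAGAIN, because the blocks with
pending dependencies never become writable no matter how many passes are
made. With it set to true, the first pass writes everything and the call
returns 0.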

	Kirk McKusick