Message-Id: <201202291919.q1TJJfCP082878@chez.mckusick.com>
From: Kirk McKusick
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Date: Wed, 29 Feb 2012 11:19:41 -0800
Subject: Re: fsync: giving up on dirty

> To: freebsd-fs@freebsd.org
> From: Ivan Voras
> Date: Wed, 29 Feb 2012 12:08:51 +0100
> Subject: fsync: giving up on dirty
>
> Hi,
>
> One of the machines I take care of recently started recording
> messages such as these in the logs:
>
> Feb 28 04:02:09 skynet kernel: fsync: giving up on dirty
> Feb 28 04:02:09 skynet kernel: 0xfffffe000fef2780: tag devfs, type VCHR
> Feb 28 04:02:09 skynet kernel: usecount 1, writecount 0, refcount 2400
> mountedhere 0xfffffe000fd25200
> Feb 28 04:02:09 skynet kernel: flags ()
> Feb 28 04:02:09 skynet kernel: v_object 0xfffffe000fe96d98 ref 0 pages 23990
> Feb 28 04:02:09 skynet kernel: lock type devfs: EXCL by thread
> 0xfffffe000feb5000 (pid 44968)
> Feb 28 04:02:09 skynet kernel: dev multipath/hpdisk4-web
>
> I'm leaning towards suspecting bad hardware - the device behind
> "multipath/hpdisk4-web" is an old HP MSA FC storage (which I wouldn't
> recommend to anyone), but though verbose, the message lacks device-level
> specifics. I would expect messages to be logged by the isp driver, or
> the CAM layer, or even GEOM, but there is nothing there - only this
> "fsync" message.
>
> From the code, it looks like a transient condition (printed in the case
> of EAGAIN), but the hardware here is behaving a bit suspect so I'd like
> to make sure - should I ignore this message?

I have seen this message during some of Peter Holm's stress tests
when running with journaled soft updates. Note that soft updates
without journaling do not show this issue.

The "giving up" message comes when trying to flush the filesystem
metadata blocks associated with the mount device (hence the VCHR
type of the vnode). The problem has become more evident with my
recent changes to the way that sync'ing is done. I believe the
problem arises because the soft updates worklist needs to be flushed
before some of the dirty blocks can be successfully written.

If you are running a 9-stable system on this machine, are using
journaled soft updates on the filesystem in question, and are
willing to try out my first attempt at a fix, let me know and I'll
send you the diffs for it.

	Kirk McKusick
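
Illustrative note: the ordering problem described above can be shown with a small toy model. This is only a sketch of the idea that a bounded number of flush passes will "give up" on buffers whose writes are blocked by pending worklist items. It is not the FreeBSD kernel code, and the names Buf, flush, max_passes and drain_worklist are hypothetical, chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Buf:
    """A dirty buffer; deps names the worklist items that block its write."""
    name: str
    deps: set = field(default_factory=set)


def flush(dirty, worklist, max_passes=3, drain_worklist=False):
    """Try to write every dirty buffer within max_passes passes.

    Returns the names of buffers that are still dirty when the passes
    run out (the toy analogue of "giving up on dirty").
    """
    remaining = list(dirty)
    for _ in range(max_passes):
        if drain_worklist:
            # Assumption for the sketch: draining completes all pending work.
            worklist.clear()
        # A buffer can be written only if none of its deps are still pending.
        remaining = [b for b in remaining if b.deps & worklist]
        if not remaining:
            break
    return [b.name for b in remaining]


bufs = [Buf("cg0"), Buf("inode-blk", {"journal-rec-1"})]

# Without draining the worklist, the dependent buffer never becomes writable.
stuck = flush(bufs, {"journal-rec-1"})

# Draining the worklist first lets every buffer be written.
clean = flush(bufs, {"journal-rec-1"}, drain_worklist=True)
```

In this model, retrying alone never helps: the blocked buffer stays dirty on every pass until the pending worklist item is processed, which is why flushing the worklist before the final write pass is the natural direction for a fix.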