From owner-freebsd-fs@freebsd.org  Fri Sep  8 13:42:25 2017
Message-ID: <59B29E35.9000506@incore.de>
Date: Fri, 08 Sep 2017 15:42:13 +0200
From: Andreas Longwitz <longwitz@incore.de>
User-Agent: Thunderbird 2.0.0.19 (X11/20090113)
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Subject: fsync: giving up on dirty on ufs partitions running
 vfs_write_suspend()
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit

I will try to describe the cause of the "fsync: giving up on dirty"
problem described in

https://lists.freebsd.org/pipermail/freebsd-fs/2012-February/013804.html
or
https://lists.freebsd.org/pipermail/freebsd-fs/2013-August/018163.html

I now run FreeBSD 10.3-STABLE r317936 and sometimes see messages like

 <kern.crit> dssbkp4 kernel: fsync: giving up on dirty
 <kern.crit> dssbkp4 kernel: 0xfffff80040d6c938: tag devfs, type VCHR
 <kern.crit> dssbkp4 kernel: usecount 1, writecount 0, refcount 47 mountedhere 0xfffff8004083a200
 <kern.crit> dssbkp4 kernel: flags (VI_ACTIVE)
 <kern.crit> dssbkp4 kernel: v_object 0xfffff800409b3500 ref 0 pages 1138 cleanbuf 42 dirtybuf 4
 <kern.crit> dssbkp4 kernel: lock type devfs: EXCL by thread 0xfffff800403a8a00 (pid 26, g_journal switcher, tid 100181)
 <kern.crit> dssbkp4 kernel: dev mirror/gmbkp4p5.journal
 <kern.crit> dssbkp4 kernel: GEOM_JOURNAL: Cannot suspend file system /home (error=35).

on all of my servers running gjournal. Similar messages can be seen when
a snapshot is taken (e.g. dump -L) on an arbitrary UFS partition. In all
these cases vfs_write_suspend() was called and returned EAGAIN
(error=35). vop_stdfsync() sets this error code at the point where it
prints the messages above.

At first I was confused by the "mountedhere" address, because the given
address does not point to a "struct mount" but (as type VCHR indicates)
to a "struct cdev". Therefore I suggest the following patch to improve
the output of vn_printf(), using text strings taken from the defines in
/sys/sys/vnode.h:

--- vfs_subr.c.orig     2017-05-08 14:17:38.000000000 +0200
+++ vfs_subr.c  2017-08-30 10:45:47.549740000 +0200
@@ -3003,6 +3003,8 @@
 static char *typename[] =
 {"VNON", "VREG", "VDIR", "VBLK", "VCHR", "VLNK", "VSOCK", "VFIFO", "VBAD",
  "VMARKER"};
+static char *typetext[] =
+{"", "", "mountedhere", "", "rdev", "", "socket", "fifoinfo", "", ""};

 void
 vn_printf(struct vnode *vp, const char *fmt, ...)
@@ -3016,8 +3018,9 @@
        va_end(ap);
        printf("%p: ", (void *)vp);
        printf("tag %s, type %s\n", vp->v_tag, typename[vp->v_type]);
-       printf("    usecount %d, writecount %d, refcount %d mountedhere %p\n",
-           vp->v_usecount, vp->v_writecount, vp->v_holdcnt, vp->v_mountedhere);
+       printf("    usecount %d, writecount %d, refcount %d %s %p\n",
+           vp->v_usecount, vp->v_writecount, vp->v_holdcnt, typetext[vp->v_type],
+           vp->v_mountedhere);
        buf[0] = '\0';
        buf[1] = '\0';
        if (vp->v_vflag & VV_ROOT)
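
The lookup added by this patch can also be tried outside the kernel. The
following is only a minimal userland sketch, not kernel code: enum vtype
is re-declared here in the same order as typename[], and union_label()
is a hypothetical helper that does not exist in vfs_subr.c.

```c
#include <assert.h>
#include <stdio.h>

/* Userland stand-in for the kernel's enum vtype (same order as typename[]). */
enum vtype { VNON, VREG, VDIR, VBLK, VCHR, VLNK, VSOCK, VFIFO, VBAD, VMARKER };

static const char *typename[] =
{"VNON", "VREG", "VDIR", "VBLK", "VCHR", "VLNK", "VSOCK", "VFIFO", "VBAD",
 "VMARKER"};
static const char *typetext[] =
{"", "", "mountedhere", "", "rdev", "", "socket", "fifoinfo", "", ""};

/*
 * Hypothetical helper (an assumption for illustration): return the label
 * of the union member that is meaningful for the given vnode type, or an
 * empty string if the type has none.
 */
static const char *
union_label(enum vtype t)
{
	assert(t >= VNON && t <= VMARKER);
	return (typetext[t]);
}

/* Print one "type -> label" line per vnode type. */
static void
print_labels(void)
{
	int t;

	for (t = VNON; t <= VMARKER; t++)
		printf("%-8s -> \"%s\"\n", typename[t],
		    union_label((enum vtype)t));
}
```

For the vnode in the log above (type VCHR), union_label(VCHR) yields
"rdev", so the line would be labelled "rdev 0xfffff8004083a200" instead
of "mountedhere 0xfffff8004083a200".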

Second, I found that the "dirty" situation during vfs_write_suspend()
only occurs when a big file (more than 10G on a partition of 116G) is
removed. If vfs_write_suspend() is called immediately after "rm
bigfile", vop_stdfsync() makes up to 1000 attempts (maxretry) while
waiting for the "rm bigfile" to complete. Because a lot of bitmap writes
must be done, 1000 attempts are not sufficient on my servers. I have
increased maxretry, and in the worst case I saw 8650 attempts before the
sync completed without "dirty". In this case the time spent in
vop_stdfsync() was about 0.5 seconds. The following patch solves the
"dirty" problem for me:

--- vfs_default.c.orig  2016-10-24 12:26:57.000000000 +0200
+++ vfs_default.c       2017-09-08 12:49:18.059970000 +0200
@@ -644,7 +644,7 @@
        struct bufobj *bo;
        struct buf *nbp;
        int error = 0;
-       int maxretry = 1000;     /* large, arbitrarily chosen */
+       int maxretry = 100000;   /* large, arbitrarily chosen */

        bo = &vp->v_bufobj;
        BO_LOCK(bo);
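
To illustrate why a fixed retry limit can be too small, here is a toy
userland model of a bounded retry loop. It is only a sketch under
assumptions and is not the code of vop_stdfsync(): struct toy_bufobj,
toy_fsync() and the per-pass rates are hypothetical. Only the limits
1000 and 100000 and the observed 8650 attempts come from the text above.

```c
#include <assert.h>
#include <errno.h>

/* Toy model (assumption): state of one buffer object during the sync. */
struct toy_bufobj {
	long dirty;	/* dirty buffers currently queued */
	long pending;	/* buffers the concurrent "rm" will still dirty */
};

/*
 * Flush dirty buffers in passes.  In each pass up to flush_per_pass
 * buffers are written out while the concurrent remove dirties up to
 * new_per_pass more.  Give up with EAGAIN once maxretry passes were made
 * and dirty buffers remain.  The number of passes is stored in *passes.
 */
static int
toy_fsync(struct toy_bufobj *bo, int maxretry, long flush_per_pass,
    long new_per_pass, int *passes)
{
	long n;

	assert(flush_per_pass > 0 && new_per_pass >= 0);
	*passes = 0;
	while (bo->dirty > 0) {
		if (*passes >= maxretry)
			return (EAGAIN);	/* "giving up on dirty" */
		(*passes)++;
		/* Write out some of the dirty buffers. */
		n = bo->dirty < flush_per_pass ? bo->dirty : flush_per_pass;
		bo->dirty -= n;
		/* Meanwhile the remove dirties further buffers. */
		n = bo->pending < new_per_pass ? bo->pending : new_per_pass;
		bo->pending -= n;
		bo->dirty += n;
	}
	return (0);
}
```

With 4 dirty buffers, 8646 still pending and one buffer flushed and one
dirtied per pass, this model needs 8650 passes: a limit of 1000 returns
EAGAIN, while a limit of 100000 lets the loop finish.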

---
Andreas Longwitz