Date:      Sun, 10 Sep 2017 22:19:31 -0700
From:      Kirk McKusick <mckusick@mckusick.com>
To:        Andreas Longwitz <longwitz@incore.de>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: fsync: giving up on dirty on ufs partitions running vfs_write_suspend()
Message-ID:  <201709110519.v8B5JVmf060773@chez.mckusick.com>
In-Reply-To: <59B29E35.9000506@incore.de>

> Date: Fri, 08 Sep 2017 15:42:13 +0200
> From: Andreas Longwitz <longwitz@incore.de>
> To: freebsd-fs@freebsd.org
> Subject: fsync: giving up on dirty on ufs partitions running vfs_write_suspend()
>
> I try to describe the cause for the "fsync: giving up on dirty" problem
> described in
>
> https://lists.freebsd.org/pipermail/freebsd-fs/2012-February/013804.html
> or
> https://lists.freebsd.org/pipermail/freebsd-fs/2013-August/018163.html
>
> Now I run FreeBSD 10.3 Stable r317936 and sometimes I see messages like
>
>  <kern.crit> dssbkp4 kernel: fsync: giving up on dirty
>  <kern.crit> dssbkp4 kernel: 0xfffff80040d6c938: tag devfs, type VCHR
>  <kern.crit> dssbkp4 kernel: usecount 1, writecount 0, refcount 47
> mountedhere 0xfffff8004083a200
>  <kern.crit> dssbkp4 kernel: flags (VI_ACTIVE)
>  <kern.crit> dssbkp4 kernel: v_object 0xfffff800409b3500 ref 0 pages
> 1138 cleanbuf 42 dirtybuf 4
>  <kern.crit> dssbkp4 kernel: lock type devfs: EXCL by thread
> 0xfffff800403a8a00 (pid 26, g_journal switcher, tid 100181)
>  <kern.crit> dssbkp4 kernel: dev mirror/gmbkp4p5.journal
>  <kern.crit> dssbkp4 kernel: GEOM_JOURNAL: Cannot suspend file system
> /home (error=35).
>
> on all of my servers running gjournal. Similar messages can be seen when
> a snapshot is taken (e.g. dump -L) on an arbitrary ufs partition. In all
> these cases the function vfs_write_suspend() was called which returned
> EAGAIN. This error code is set in vop_stdfsync(), when the above
> messages are created.
>
> First I was confused about the "mountedhere" address, because the given
> address does not point to a "struct mount" but (as type = VCHR
> indicates) to a "struct cdev". Therefore I suggest the following patch to
> improve the output of vn_printf() using the text strings from defines in
> /sys/sys/vnode.h:
>
> --- vfs_subr.c.orig     2017-05-08 14:17:38.000000000 +0200
> +++ vfs_subr.c  2017-08-30 10:45:47.549740000 +0200
> @@ -3003,6 +3003,8 @@
>  static char *typename[] =
>  {"VNON", "VREG", "VDIR", "VBLK", "VCHR", "VLNK", "VSOCK", "VFIFO", "VBAD",
>   "VMARKER"};
> +static char *typetext[] =
> +{"", "", "mountedhere", "", "rdev", "", "socket", "fifoinfo", "", ""};
>
>  void
>  vn_printf(struct vnode *vp, const char *fmt, ...)
> @@ -3016,8 +3018,9 @@
>         va_end(ap);
>         printf("%p: ", (void *)vp);
>         printf("tag %s, type %s\n", vp->v_tag, typename[vp->v_type]);
> -       printf("    usecount %d, writecount %d, refcount %d mountedhere %p\n",
> -           vp->v_usecount, vp->v_writecount, vp->v_holdcnt, vp->v_mountedhere);
> +       printf("    usecount %d, writecount %d, refcount %d %s %p\n",
> +           vp->v_usecount, vp->v_writecount, vp->v_holdcnt, typetext[vp->v_type],
> +           vp->v_mountedhere);
>         buf[0] = '\0';
>         buf[1] = '\0';
>         if (vp->v_vflag & VV_ROOT)

I concur with the above change and will make it.
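
To make the effect concrete, here is a small user-space sketch of the
lookup the patch adds. The enum is a stand-in for the kernel's vnode
types (not the real header), and vtype_label() is a hypothetical helper
name. With this table, the VCHR vnode in the log above would be labeled
"rdev" instead of "mountedhere".

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's vnode type enum, in the order of typename[]. */
enum vtype { VNON, VREG, VDIR, VBLK, VCHR, VLNK, VSOCK, VFIFO, VBAD, VMARKER };

/* Label for the type-dependent pointer, taken from the proposed patch. */
static const char *const typetext[] = {
	"", "", "mountedhere", "", "rdev", "", "socket", "fifoinfo", "", ""
};

#define	NTYPETEXT	(sizeof(typetext) / sizeof(typetext[0]))

/*
 * Hypothetical helper: return the label to print in front of the
 * pointer, or "" for types that have no type-dependent pointer.
 */
static const char *
vtype_label(enum vtype type)
{
	assert((size_t)type < NTYPETEXT);
	return (typetext[type]);
}
```

Keeping typetext[] the same length and order as typename[] matters,
since both are indexed by vp->v_type.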

> Second I found, that the "dirty" situation during vfs_write_suspend()
> only occurs when a big file (more than 10G on a partition of 116G) is
> removed. If vfs_write_suspend() is called immediately after "rm
> bigfile", then in vop_stdfsync() 1000 tries (maxretry) are done to wait
> for the "rm bigfile" to complete. Because a lot of bitmap writes must be
> done, the value 1000 is not sufficient on my servers. I have increased
> maxretry and in the worst case I saw 8650 tries to complete without
> "dirty". In this case the time spent in vop_stdfsync() was about 0.5
> seconds. The following patch solves the "dirty problem" for me:
>
> --- vfs_default.c.orig  2016-10-24 12:26:57.000000000 +0200
> +++ vfs_default.c       2017-09-08 12:49:18.059970000 +0200
> @@ -644,7 +644,7 @@
>         struct bufobj *bo;
>         struct buf *nbp;
>         int error = 0;
> -       int maxretry = 1000;     /* large, arbitrarily chosen */
> +       int maxretry = 100000;   /* large, arbitrarily chosen */
>
>         bo = &vp->v_bufobj;
>         BO_LOCK(bo);
>
> ---
> Andreas Longwitz

This message has plagued me for years. It started out as a panic,
then got changed to a printf because I could not get rid of it. I
was never able to figure out why it should take more than five
iterations to finish, but obviously it takes more. The 1000 number
was picked because that just seemed insanely large and I did not
want to iterate forever. I have no problem with bumping up the
iteration count if there is some way to figure out that each iteration
is making forward progress (so we know that we are not in an infinite
loop). Can you come up with a scheme that can measure forward progress?
I would much prefer that to just making this number ever bigger.
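
One possible shape for such a scheme, as a user-space sketch only and
not code from vop_stdfsync(): count consecutive passes that wrote
nothing, and give up with EAGAIN only when that count reaches a limit.
The names flush_pass_t, flush_until_clean(), struct toy, and toy_pass()
are all hypothetical, and "a pass wrote at least one buffer" is an
assumed definition of forward progress. If new dirty buffers keep
arriving as fast as they are written, this loop would keep going, so a
real version might still want an outer bound.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical callback: perform one flush pass, store the number of
 * buffers written in *written, and return how many are still dirty.
 */
typedef int flush_pass_t(void *arg, int *written);

/*
 * Retry until nothing is dirty. Return EAGAIN only after max_stalled
 * consecutive passes that wrote nothing; any pass that writes at least
 * one buffer resets the stall counter.
 */
static int
flush_until_clean(flush_pass_t *pass, void *arg, int max_stalled)
{
	int dirty, stalled, written;

	assert(max_stalled > 0);
	stalled = 0;
	for (;;) {
		written = 0;
		dirty = pass(arg, &written);
		if (dirty == 0)
			return (0);
		if (written > 0)
			stalled = 0;
		else if (++stalled >= max_stalled)
			return (EAGAIN);
	}
}

/* Toy workload: 'remaining' dirty buffers, 'rate' written per pass. */
struct toy {
	int remaining;
	int rate;
	int passes;
};

static int
toy_pass(void *arg, int *written)
{
	struct toy *t = arg;
	int n = t->rate < t->remaining ? t->rate : t->remaining;

	t->remaining -= n;
	t->passes++;
	*written = n;
	return (t->remaining);
}
```

With the toy workload, a slow but steady flush (one buffer per pass)
runs to completion however many passes it needs, while a flush that
writes nothing is abandoned after max_stalled passes.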

	Kirk McKusick


