Date: Fri, 08 Sep 2017 15:42:13 +0200
From: Andreas Longwitz <longwitz@incore.de>
To: freebsd-fs@freebsd.org
Subject: fsync: giving up on dirty on ufs partitions running vfs_write_suspend()
Message-ID: <59B29E35.9000506@incore.de>
I will try to describe the cause of the "fsync: giving up on dirty" problem reported in https://lists.freebsd.org/pipermail/freebsd-fs/2012-February/013804.html and https://lists.freebsd.org/pipermail/freebsd-fs/2013-August/018163.html

I am now running FreeBSD 10.3-STABLE r317936, and on all of my servers running gjournal I sometimes see messages like these:

  <kern.crit> dssbkp4 kernel: fsync: giving up on dirty
  <kern.crit> dssbkp4 kernel: 0xfffff80040d6c938: tag devfs, type VCHR
  <kern.crit> dssbkp4 kernel: usecount 1, writecount 0, refcount 47 mountedhere 0xfffff8004083a200
  <kern.crit> dssbkp4 kernel: flags (VI_ACTIVE)
  <kern.crit> dssbkp4 kernel: v_object 0xfffff800409b3500 ref 0 pages 1138 cleanbuf 42 dirtybuf 4
  <kern.crit> dssbkp4 kernel: lock type devfs: EXCL by thread 0xfffff800403a8a00 (pid 26, g_journal switcher, tid 100181)
  <kern.crit> dssbkp4 kernel: dev mirror/gmbkp4p5.journal
  <kern.crit> dssbkp4 kernel: GEOM_JOURNAL: Cannot suspend file system /home (error=35).

Similar messages appear when a snapshot is taken (e.g. by dump -L) on an arbitrary UFS partition. In all of these cases vfs_write_suspend() was called and returned EAGAIN (error 35 on FreeBSD, as in the GEOM_JOURNAL line above). This error code is set in vop_stdfsync(), at the same point where the messages above are printed.

First, I was confused by the "mountedhere" address: it does not point to a struct mount but, as type VCHR indicates, to a struct cdev (v_mountedhere and v_rdev share a union in struct vnode). Therefore I suggest the following patch, which makes vn_printf() label the pointer with the name of the matching define from /sys/sys/vnode.h:

--- vfs_subr.c.orig	2017-05-08 14:17:38.000000000 +0200
+++ vfs_subr.c	2017-08-30 10:45:47.549740000 +0200
@@ -3003,6 +3003,8 @@
 static char *typename[] =
 {"VNON", "VREG", "VDIR", "VBLK", "VCHR", "VLNK", "VSOCK", "VFIFO", "VBAD",
  "VMARKER"};
+static char *typetext[] =
+{"", "", "mountedhere", "", "rdev", "", "socket", "fifoinfo", "", ""};
 
 void
 vn_printf(struct vnode *vp, const char *fmt, ...)
@@ -3016,8 +3018,9 @@
 	va_end(ap);
 	printf("%p: ", (void *)vp);
 	printf("tag %s, type %s\n", vp->v_tag, typename[vp->v_type]);
-	printf(" usecount %d, writecount %d, refcount %d mountedhere %p\n",
-	    vp->v_usecount, vp->v_writecount, vp->v_holdcnt, vp->v_mountedhere);
+	printf(" usecount %d, writecount %d, refcount %d %s %p\n",
+	    vp->v_usecount, vp->v_writecount, vp->v_holdcnt, typetext[vp->v_type],
+	    vp->v_mountedhere);
 	buf[0] = '\0';
 	buf[1] = '\0';
 	if (vp->v_vflag & VV_ROOT)

Second, I found that the "dirty" situation during vfs_write_suspend() only occurs when a big file (more than 10G on a 116G partition) is removed. If vfs_write_suspend() is called immediately after "rm bigfile", vop_stdfsync() makes up to 1000 retries (maxretry) while waiting for the writes caused by "rm bigfile" to finish. Because a lot of bitmap writes must be done, 1000 is not sufficient on my servers. I increased maxretry, and in the worst case I saw 8650 passes before the vnode was clean and no "dirty" message was printed. In that case the time spent in vop_stdfsync() was about 0.5 seconds. The following patch solves the "dirty" problem for me:

--- vfs_default.c.orig	2016-10-24 12:26:57.000000000 +0200
+++ vfs_default.c	2017-09-08 12:49:18.059970000 +0200
@@ -644,7 +644,7 @@
 	struct bufobj *bo;
 	struct buf *nbp;
 	int error = 0;
-	int maxretry = 1000; /* large, arbitrarily chosen */
+	int maxretry = 100000; /* large, arbitrarily chosen */
 
 	bo = &vp->v_bufobj;
 	BO_LOCK(bo);

---
Andreas Longwitz