Date: Sat, 16 Sep 2017 21:31:17 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Andreas Longwitz <longwitz@incore.de>
Cc: Kirk McKusick <mckusick@mckusick.com>, freebsd-fs@freebsd.org
Subject: Re: fsync: giving up on dirty on ufs partitions running vfs_write_suspend()
Message-ID: <20170916183117.GF78693@kib.kiev.ua>
In-Reply-To: <59BD0EAC.8030206@incore.de>
References: <201709110519.v8B5JVmf060773@chez.mckusick.com> <59BD0EAC.8030206@incore.de>

On Sat, Sep 16, 2017 at 01:44:44PM +0200, Andreas Longwitz wrote:
> Ok, I understand your thoughts about the "big loop" and I agree. On the
> other hand it is not easy to measure the progress of the dirty buffers
> because these buffers are created by another process at the same time we
> loop in vop_stdfsync(). I can explain from my tests, where I use the
> following loop on a gjournaled partition:
>
>    while true; do
>       cp -p bigfile bigfile.tmp
>       rm bigfile
>       mv bigfile.tmp bigfile
>    done
>
> When g_journal_switcher starts vfs_write_suspend() immediately after the
> rm command has started to do its "rm stuff" (ufs_inactive, ffs_truncate,
> ffs_indirtrunc at different levels, ffs_blkfree, ...) then we must loop
> (that means wait) in vop_stdfsync() until the rm process has finished
> its work. A lot of locking overhead is needed for coordination.
> Returning from bufobj_wwait() we always see one dirty buffer left (very
> seldom two), which is not optimal. Therefore I have tried the following
> patch (instead of bumping maxretry):
>
> --- vfs_default.c.orig  2016-10-24 12:26:57.000000000 +0200
> +++ vfs_default.c       2017-09-15 12:30:44.792274000 +0200
> @@ -688,6 +688,8 @@
>                          bremfree(bp);
>                          bawrite(bp);
>                  }
> +                if( maxretry < 1000)
> +                        DELAY(waitns);
>                  BO_LOCK(bo);
>                  goto loop2;
>          }
>
> with different values for waitns. If I run the test loop 5000 times on my
> test server, the problem is always triggered about 10 times. The
> results from several runs are given in the following table:
>
>    waitns      max time    max loops
>    ---------------------------------
>    no DELAY    0,5 sec     8650 (maxres = 100000)
>    1000        0,2 sec     24
>    10000       0,8 sec     3
>    100000      7,2 sec     3
>
> "time" means the time spent in vop_stdfsync(), measured from entry to
> return by a dtrace script. "loops" means the number of times "--maxretry"
> is executed. I am not sure if DELAY() is the best way to wait or if waiting
> has other drawbacks. Anyway, with DELAY() it does not take more than five
> iterations to finish.

This is not explicitly stated in your message, but I suppose that
vop_stdfsync() is called due to the VOP_FSYNC(devvp, MNT_SUSPEND) call in
ffs_sync(). Am I right?

If yes, then the solution is most likely to continue looping in
vop_stdfsync() until there are no dirty buffers or the mount point's
mnt_secondary_writes counter is zero. The pause trick you tried might
still be useful, e.g. after some threshold of performed loop iterations.

One problem with this suggestion is that vop_stdfsync(devvp) needs to know
that the vnode is the devvp of some UFS mount. The struct cdev, accessible
as v_rdev, has a pointer to the struct mount. You should be careful not to
access a freed or reused struct mount.
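
The loop termination suggested above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not
kernel code and not a proposed patch: struct model_mount, struct
model_bufobj, model_writer_step() and the pause threshold are all
hypothetical stand-ins, and the "writer" is simplified to redirty exactly
one buffer per pass while it still has secondary writes outstanding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_mount {
	int secondary_writes;	/* models the mnt_secondary_writes counter */
};

struct model_bufobj {
	int dirty;		/* number of dirty buffers on the device vnode */
};

struct sync_stats {
	int loops;		/* flush passes performed */
	int pauses;		/* passes after which we would have slept */
};

/*
 * Assumed behaviour of a concurrent writer (e.g. the rm process): while it
 * has secondary writes outstanding, each step finishes one of them and
 * redirties one buffer.
 */
void
model_writer_step(struct model_mount *mp, struct model_bufobj *bo)
{
	if (mp->secondary_writes > 0) {
		mp->secondary_writes--;
		bo->dirty++;
	}
}

/*
 * Keep flushing until no dirty buffers remain.  As long as secondary
 * writes are outstanding, do not give up; once the number of passes
 * reaches pause_threshold, count a pause instead of spinning.
 */
struct sync_stats
model_suspend_fsync(struct model_mount *mp, struct model_bufobj *bo,
    int pause_threshold)
{
	struct sync_stats st = { 0, 0 };

	assert(mp != NULL && bo != NULL);
	for (;;) {
		bo->dirty = 0;			/* write out what is dirty now */
		st.loops++;
		model_writer_step(mp, bo);	/* writer may redirty meanwhile */
		if (bo->dirty == 0)
			break;			/* clean: done */
		if (mp->secondary_writes == 0) {
			/* No writer left to redirty; one final pass suffices. */
			bo->dirty = 0;
			st.loops++;
			break;
		}
		if (st.loops >= pause_threshold)
			st.pauses++;		/* a real loop would sleep here */
	}
	return (st);
}
```

In this model, starting with three outstanding secondary writes and a
pause threshold of two passes, the loop finishes clean after four passes
with a single pause. The question of how vop_stdfsync() finds the struct
mount safely is deliberately left out of the sketch.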