Date: Sat, 16 Sep 2017 13:20:26 -0700
From: Kirk McKusick <mckusick@mckusick.com>
To: Andreas Longwitz <longwitz@incore.de>
Cc: freebsd-fs@freebsd.org
Subject: Re: fsync: giving up on dirty on ufs partitions running vfs_write_suspend()
Message-ID: <201709162020.v8GKKQj0033706@chez.mckusick.com>
In-Reply-To: <59BD0EAC.8030206@incore.de>
> From: Konstantin Belousov <kostikbel@gmail.com>
> Date: Sat, 16 Sep 2017 21:31:17 +0300
> To: Andreas Longwitz <longwitz@incore.de>
> Subject: Re: fsync: giving up on dirty on ufs partitions running
>  vfs_write_suspend()
> Cc: Kirk McKusick <mckusick@mckusick.com>, freebsd-fs@freebsd.org
>
> On Sat, Sep 16, 2017 at 01:44:44PM +0200, Andreas Longwitz wrote:
>> Ok, I understand your thoughts about the "big loop" and I agree. On the
>> other side it is not easy to measure the progress of the dirty buffers,
>> because these buffers are created by another process at the same time we
>> loop in vop_stdfsync(). I can explain from my tests, where I use the
>> following loop on a gjournaled partition:
>>
>>   while true; do
>>       cp -p bigfile bigfile.tmp
>>       rm bigfile
>>       mv bigfile.tmp bigfile
>>   done
>>
>> When g_journal_switcher starts vfs_write_suspend() immediately after the
>> rm command has started to do its "rm stuff" (ufs_inactive, ffs_truncate,
>> ffs_indirtrunc at different levels, ffs_blkfree, ...), then we must loop
>> (that means wait) in vop_stdfsync() until the rm process has finished
>> its work. A lot of locking overhead is needed for coordination.
>> Returning from bufobj_wwait() we always see one leftover dirty buffer
>> (very seldom two), which is not optimal. Therefore I have tried the
>> following patch (instead of bumping maxretry):
>>
>> --- vfs_default.c.orig	2016-10-24 12:26:57.000000000 +0200
>> +++ vfs_default.c	2017-09-15 12:30:44.792274000 +0200
>> @@ -688,6 +688,8 @@
>>  			bremfree(bp);
>>  			bawrite(bp);
>>  		}
>> +		if (maxretry < 1000)
>> +			DELAY(waitns);
>>  		BO_LOCK(bo);
>>  		goto loop2;
>>  	}
>>
>> with different values for waitns. If I run the test loop 5000 times on my
>> test server, the problem is triggered about 10 times per run.
>> The results from several runs are given in the following table:
>>
>>   waitns     max time   max loops
>>   --------------------------------
>>   no DELAY   0.5 sec    8650 (maxres = 100000)
>>   1000       0.2 sec    24
>>   10000      0.8 sec    3
>>   100000     7.2 sec    3
>>
>> "time" means the time spent in vop_stdfsync(), measured from entry to
>> return by a dtrace script. "loops" means the number of times "--maxretry"
>> is executed. I am not sure whether DELAY() is the best way to wait or
>> whether waiting has other drawbacks. Anyway, with DELAY() it never takes
>> more than five iterations to finish.
>
> This is not explicitly stated in your message, but I suppose that
> vop_stdfsync() is called due to the VOP_FSYNC(devvp, MNT_SUSPEND) call
> in ffs_sync(). Am I right?
>
> If yes, then the solution is most likely to continue looping in
> vop_stdfsync() until there are no dirty buffers or the mount point's
> mnt_secondary_writes counter is zero. The pause trick you tried might
> still be useful, e.g. after some threshold of performed loop
> iterations.
>
> One problem with this suggestion is that vop_stdfsync(devvp) needs to
> know that the vnode is the devvp for some UFS mount. The struct cdev,
> accessible as v_rdev, has a pointer to struct mount. You must be
> careful not to access a freed or reused struct mount.

I concur with Kostik's comments. It would be helpful if you could try out
his suggestions and see if they produce a better result. Once you converge
on a solution, I will ensure that it gets checked in.

	~Kirk