From owner-freebsd-fs@freebsd.org Sat Sep 16 18:31:23 2017
Date: Sat, 16 Sep 2017 21:31:17 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Andreas Longwitz
Cc: Kirk McKusick, freebsd-fs@freebsd.org
Subject: Re: fsync: giving up on dirty on ufs partitions running vfs_write_suspend()
Message-ID: <20170916183117.GF78693@kib.kiev.ua>
References: <201709110519.v8B5JVmf060773@chez.mckusick.com> <59BD0EAC.8030206@incore.de>
In-Reply-To: <59BD0EAC.8030206@incore.de>

On Sat, Sep 16, 2017 at 01:44:44PM +0200, Andreas Longwitz wrote:
> Ok, I understand your thoughts about the "big loop" and I agree. On the
> other side it is not easy to measure the progress of the dirty buffers,
> because these buffers are created by another process at the same time we
> loop in vop_stdfsync(). I can explain from my tests, where I use the
> following loop on a gjournaled partition:
>
> while true; do
>     cp -p bigfile bigfile.tmp
>     rm bigfile
>     mv bigfile.tmp bigfile
> done
>
> When g_journal_switcher starts vfs_write_suspend() immediately after the
> rm command has started to do its "rm stuff" (ufs_inactive, ffs_truncate,
> ffs_indirtrunc at different levels, ffs_blkfree, ...), then we must loop
> (that means wait) in vop_stdfsync() until the rm process has finished
> its work. A lot of locking overhead is needed for coordination.
> Returning from bufobj_wwait() we always see one leftover dirty buffer
> (very seldom two), which is not optimal. Therefore I have tried the
> following patch (instead of bumping maxretry):
>
> --- vfs_default.c.orig	2016-10-24 12:26:57.000000000 +0200
> +++ vfs_default.c	2017-09-15 12:30:44.792274000 +0200
> @@ -688,6 +688,8 @@
>  			bremfree(bp);
>  			bawrite(bp);
>  		}
> +		if (maxretry < 1000)
> +			DELAY(waitns);
>  		BO_LOCK(bo);
>  		goto loop2;
>  	}
>
> with different values for waitns.
> If I run the test loop 5000 times on my test server, the problem is
> reliably triggered about 10 times. The results from several runs are
> given in the following table:
>
>   waitns     max time   max loops
>   ---------------------------------
>   no DELAY   0.5 sec    8650 (maxres = 100000)
>     1000     0.2 sec      24
>    10000     0.8 sec       3
>   100000     7.2 sec       3
>
> "time" means the time spent in vop_stdfsync(), measured from entry to
> return by a dtrace script. "loops" means the number of times
> "--maxretry" is executed. I am not sure if DELAY() is the best way to
> wait or if waiting has other drawbacks. Anyway, with DELAY() it does not
> take more than five iterations to finish.

This is not explicitly stated in your message, but I suppose that
vop_stdfsync() is called due to the VOP_FSYNC(devvp, MNT_SUSPEND) call
in ffs_sync(). Am I right?

If yes, then the solution is most likely to continue looping in
vop_stdfsync() until there are no dirty buffers or the mount point's
mnt_secondary_writes counter is zero. The pause trick you tried might
still be useful, e.g. after some threshold of performed loop iterations.

One problem with this suggestion is that vop_stdfsync(devvp) needs to
know that the vnode is the devvp for some UFS mount. The struct cdev,
accessible as v_rdev, has the pointer to the struct mount. You should be
careful not to access a freed or reused struct mount.
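
Andreas' dtrace script is not included in the thread; a minimal sketch
of such an entry-to-return measurement using the FBT provider (an
assumption about what his script does, not a quote of it) might look
like:

    #!/usr/sbin/dtrace -s

    fbt::vop_stdfsync:entry
    {
            self->ts = timestamp;
    }

    fbt::vop_stdfsync:return
    /self->ts/
    {
            /* Nanoseconds from entry to return, as a power-of-two
               histogram printed when the script exits. */
            @latency = quantize(timestamp - self->ts);
            self->ts = 0;
    }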
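
To make the last paragraph concrete, here is a rough sketch of the check
Konstantin describes. The helper name devvp_has_secondary_writers is
hypothetical; the fields used (v_rdev, si_mountpt,
mnt_secondary_writes) exist in the FreeBSD kernel of that era, but the
sketch deliberately omits the synchronization that real code would need
to avoid the freed/reused struct mount problem he warns about:

    #include <sys/param.h>
    #include <sys/conf.h>
    #include <sys/mount.h>
    #include <sys/vnode.h>

    /*
     * Return non-zero when the device vnode backs a mounted filesystem
     * that still has secondary writes in flight.  vop_stdfsync() could
     * keep looping while this is true instead of giving up after
     * maxretry iterations.
     */
    static int
    devvp_has_secondary_writers(struct vnode *vp)
    {
            struct cdev *dev;
            struct mount *mp;

            if (vp->v_type != VCHR || (dev = vp->v_rdev) == NULL)
                    return (0);
            mp = dev->si_mountpt;   /* set by FFS at mount time */
            if (mp == NULL)
                    return (0);
            /*
             * Unsynchronized read; real code must guard against mp
             * being freed or reused, e.g. via the mount interlock.
             */
            return (mp->mnt_secondary_writes > 0);
    }

The retry loop in vop_stdfsync() would then continue while either dirty
buffers remain or this helper returns non-zero, with the DELAY() pause
applied only past some iteration threshold, as suggested above.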