From owner-freebsd-fs@freebsd.org Sat Sep 16 18:31:23 2017
Date: Sat, 16 Sep 2017 21:31:17 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Andreas Longwitz
Cc: Kirk McKusick, freebsd-fs@freebsd.org
Subject: Re: fsync: giving up on dirty on ufs partitions running vfs_write_suspend()
Message-ID: <20170916183117.GF78693@kib.kiev.ua>
References: <201709110519.v8B5JVmf060773@chez.mckusick.com> <59BD0EAC.8030206@incore.de>
In-Reply-To: <59BD0EAC.8030206@incore.de>

On Sat, Sep 16, 2017 at 01:44:44PM +0200, Andreas Longwitz wrote:
> Ok, I understand your thoughts about the "big loop" and I agree. On the
> other side it is not easy to measure the progress of the dirty buffers,
> because these buffers are created by another process at the same time we
> loop in vop_stdfsync(). I can explain from my tests, where I use the
> following loop on a gjournaled partition:
>
> while true; do
>     cp -p bigfile bigfile.tmp
>     rm bigfile
>     mv bigfile.tmp bigfile
> done
>
> When g_journal_switcher starts vfs_write_suspend() immediately after the
> rm command has started to do its "rm stuff" (ufs_inactive, ffs_truncate,
> ffs_indirtrunc at different levels, ffs_blkfree, ...), then we must loop
> (that means wait) in vop_stdfsync() until the rm process has finished
> its work. A lot of locking overhead is needed for coordination.
> Returning from bufobj_wwait() we always see one leftover dirty buffer
> (very seldom two), which is not optimal. Therefore I have tried the
> following patch (instead of bumping maxretry):
>
> --- vfs_default.c.orig	2016-10-24 12:26:57.000000000 +0200
> +++ vfs_default.c	2017-09-15 12:30:44.792274000 +0200
> @@ -688,6 +688,8 @@
>  			bremfree(bp);
>  			bawrite(bp);
>  		}
> +		if (maxretry < 1000)
> +			DELAY(waitns);
>  		BO_LOCK(bo);
>  		goto loop2;
>  	}
>
> with different values for waitns.
> If I run the test loop 5000 times on my test server, the problem is
> reliably triggered about 10 times. The results from several runs are
> given in the following table:
>
>   waitns     max time   max loops
>   ---------------------------------
>   no DELAY   0.5 sec    8650 (maxres = 100000)
>     1000     0.2 sec      24
>    10000     0.8 sec       3
>   100000     7.2 sec       3
>
> "time" means the time spent in vop_stdfsync(), measured from entry to
> return by a dtrace script. "loops" means the number of times
> "--maxretry" is executed. I am not sure if DELAY() is the best way to
> wait or if waiting has other drawbacks. Anyway, with DELAY() it does not
> take more than five iterations to finish.

This is not explicitly stated in your message, but I suppose that
vop_stdfsync() is called due to the VOP_FSYNC(devvp, MNT_SUSPEND) call
in ffs_sync(). Am I right?

If yes, then the solution is most likely to continue looping in
vop_stdfsync() until there are no dirty buffers or the mount point's
mnt_secondary_writes counter is zero. The pause trick you tried might
still be useful, e.g. after some threshold of performed loop iterations.

One problem with this suggestion is that vop_stdfsync(devvp) needs to
know that the vnode is the devvp for some UFS mount. The struct cdev,
accessible as v_rdev, has the pointer to the struct mount. You should be
careful not to access a freed or reused struct mount.
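
Andreas' dtrace script is not included in the thread; a minimal sketch
of such an entry-to-return measurement using the FBT provider (an
assumption about what his script does, not a quote of it) might look
like:

    #!/usr/sbin/dtrace -s

    fbt::vop_stdfsync:entry
    {
            self->ts = timestamp;
    }

    fbt::vop_stdfsync:return
    /self->ts/
    {
            /* Nanoseconds from entry to return, as a power-of-two
               histogram printed when the script exits. */
            @latency = quantize(timestamp - self->ts);
            self->ts = 0;
    }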
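
To make the last paragraph concrete, here is a rough sketch of the check
Konstantin describes. The helper name devvp_has_secondary_writers is
hypothetical; the fields used (v_rdev, si_mountpt,
mnt_secondary_writes) exist in the FreeBSD kernel of that era, but the
sketch deliberately omits the synchronization that real code would need
to avoid the freed/reused struct mount problem he warns about:

    #include <sys/param.h>
    #include <sys/conf.h>
    #include <sys/mount.h>
    #include <sys/vnode.h>

    /*
     * Return non-zero when the device vnode backs a mounted filesystem
     * that still has secondary writes in flight.  vop_stdfsync() could
     * keep looping while this is true instead of giving up after
     * maxretry iterations.
     */
    static int
    devvp_has_secondary_writers(struct vnode *vp)
    {
            struct cdev *dev;
            struct mount *mp;

            if (vp->v_type != VCHR || (dev = vp->v_rdev) == NULL)
                    return (0);
            mp = dev->si_mountpt;   /* set by FFS at mount time */
            if (mp == NULL)
                    return (0);
            /*
             * Unsynchronized read; real code must guard against mp
             * being freed or reused, e.g. via the mount interlock.
             */
            return (mp->mnt_secondary_writes > 0);
    }

The retry loop in vop_stdfsync() would then continue while either dirty
buffers remain or this helper returns non-zero, with the DELAY() pause
applied only past some iteration threshold, as suggested above.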