Date:      Thu, 16 Dec 2021 14:58:23 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Konstantin Belousov <kostikbel@gmail.com>, Rick Macklem <rmacklem@freebsd.org>
Cc:        "src-committers@freebsd.org" <src-committers@freebsd.org>, "dev-commits-src-all@freebsd.org" <dev-commits-src-all@freebsd.org>, "dev-commits-src-main@freebsd.org" <dev-commits-src-main@freebsd.org>
Subject:   Re: git: 867c27c23a5c - main - nfscl: Change IO_APPEND writes to direct I/O
Message-ID:  <YQXPR0101MB09680896DE973A2A414FC83BDD779@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <Ybq/1Iz4/Yu9Ibil@kib.kiev.ua>
References:  <202112151639.1BFGdS2v011996@gitrepo.freebsd.org> <Ybq/1Iz4/Yu9Ibil@kib.kiev.ua>

Kostik wrote:
>On Wed, Dec 15, 2021 at 04:39:28PM +0000, Rick Macklem wrote:
>> The branch main has been updated by rmacklem:
>>
>> URL: https://cgit.FreeBSD.org/src/commit/?id=867c27c23a5c469b27611cf53cc2390b5a193fa5
>>
>> commit 867c27c23a5c469b27611cf53cc2390b5a193fa5
>> Author:     Rick Macklem <rmacklem@FreeBSD.org>
>> AuthorDate: 2021-12-15 16:35:48 +0000
>> Commit:     Rick Macklem <rmacklem@FreeBSD.org>
>> CommitDate: 2021-12-15 16:35:48 +0000
>>
>>     nfscl: Change IO_APPEND writes to direct I/O
>>
>>     IO_APPEND writes have always been very slow over NFS, due to
>>     the need to acquire an up to date file size after flushing
>>     all writes to the NFS server.
>>
>>     This patch switches the IO_APPEND writes to use direct I/O,
>>     bypassing the buffer cache.  As such, flushing of writes
>>     normally only occurs when the open(..O_APPEND..) is done.
>>     It does imply that all writes must be done synchronously
>>     and must be committed to stable storage on the file server
>>     (NFSWRITE_FILESYNC).
>>
>>     For a simple test program that does 10,000 IO_APPEND writes
>>     in a loop, performance improved significantly with this patch.
>>
>>     For a UFS exported file system, the test ran 12x faster.
>>     This drops to 3x faster when the open(2)/close(2) are done
>>     for each loop iteration.
>>     For a ZFS exported file system, the test ran 40% faster.
>>
>>     The much smaller improvement may have been because the ZFS
>>     file system I tested against does not have a ZIL log and
>>     does have "sync" enabled.
>>
>>     Note that IO_APPEND write performance is still much slower
>>     than when done on local file systems.
>>
>>     Although this is a simple patch, it does result in a
>>     significant semantics change, so I have given it a
>>     large MFC time.
>
>How is buffer cache coherency handled then?
>Imagine that another process either reads from this file, or even has it
>mapped.  What ensures that reads and the page cache see the data written
>by the direct path?

Well, for the buffer cache case, there is code near the beginning of
ncl_write() (the NFS VOP_WRITE()) that calls ncl_vinvalbuf() for the
IO_APPEND case. As such, any data in the buffer cache gets invalidated
whenever an Append write occurs.
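
For reference, the kind of append loop the commit message describes can be
sketched in user space roughly as follows. This is not the test program
that was actually used; append_loop(), its parameters and the record
contents are made up for illustration. Each write(2) on a descriptor opened
with O_APPEND is what reaches the file system's VOP_WRITE() with IO_APPEND
set.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the test program from the commit.
 * Appends `count` copies of a short record to `path`.  If reopen_each
 * is non-zero, open(2)/close(2) are done on every loop iteration.
 * Returns the total number of bytes written, or -1 on error.
 */
long
append_loop(const char *path, int count, int reopen_each)
{
	static const char rec[] = "append test record\n";
	const size_t len = sizeof(rec) - 1;
	long total = 0;
	int fd = -1;

	for (int i = 0; i < count; i++) {
		if (fd == -1) {
			/* O_APPEND makes every write land at end of file. */
			fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
			if (fd == -1)
				return (-1);
		}
		if (write(fd, rec, len) != (ssize_t)len) {
			close(fd);
			return (-1);
		}
		total += (long)len;
		if (reopen_each) {
			if (close(fd) == -1)
				return (-1);
			fd = -1;
		}
	}
	if (fd != -1 && close(fd) == -1)
		return (-1);
	return (total);
}
```

Calling append_loop(path, 10000, 0) on a file under an NFS mount, and then
again with reopen_each set to 1, would correspond to the two cases in the
commit message. Timing would have to be added around the call.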

But, now that I look at it, it does not do anything w.r.t. mmap'd files.
(The direct I/O stuff has been there for a long time, but it isn't enabled
 by default, so it probably doesn't get tested much. Also, it has a sysctl
 that allows mmap for direct I/O, which is enabled by default. If that
 sysctl is not enabled, getpage/putpage fail.)

So, it looks like code to invalidate pages needs to be added along with
the ncl_vinvalbuf() call?
--> I'll come up with a patch and then get you to review it.
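
A user-space probe for the mmap case could look something like the sketch
below. It is only an assumption about how one might check this, not an
existing test: mmap_sees_append() and the byte patterns are invented here.
It maps the first page of a short file, appends through an O_APPEND
descriptor, and then checks whether the mapping shows the new bytes.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical probe.  Returns 1 if bytes appended with write(2) are
 * visible through an existing shared mapping, 0 if the mapping still
 * shows stale data, and -1 on a setup error.
 */
int
mmap_sees_append(const char *path)
{
	static const char before[] = "old";
	static const char after[] = "NEW";
	long pgsz = sysconf(_SC_PAGESIZE);
	int rfd, wfd, visible;
	char *map;

	if (pgsz <= 0)
		return (-1);
	/* Start from a short file that holds only the "before" bytes. */
	wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0644);
	if (wfd == -1)
		return (-1);
	if (write(wfd, before, 3) != 3) {
		close(wfd);
		return (-1);
	}
	rfd = open(path, O_RDONLY);
	if (rfd == -1) {
		close(wfd);
		return (-1);
	}
	map = mmap(NULL, (size_t)pgsz, PROT_READ, MAP_SHARED, rfd, 0);
	if (map == MAP_FAILED) {
		close(rfd);
		close(wfd);
		return (-1);
	}
	/* Read the page first so it is resident before the append. */
	if (memcmp(map, before, 3) != 0) {
		munmap(map, (size_t)pgsz);
		close(rfd);
		close(wfd);
		return (-1);
	}
	/* Append through the write(2) path, then look via the mapping. */
	if (write(wfd, after, 3) != 3) {
		munmap(map, (size_t)pgsz);
		close(rfd);
		close(wfd);
		return (-1);
	}
	visible = (memcmp(map + 3, after, 3) == 0);
	munmap(map, (size_t)pgsz);
	close(rfd);
	close(wfd);
	return (visible);
}
```

On a local file system this would be expected to return 1. Whether it does
for a file on an NFS mount with the patch applied is the open question
here, and the result may also depend on attribute caching, so treat it as
a starting point only.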

Thanks for pointing this out, rick


