Date: Thu, 16 Dec 2021 14:58:23 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov <kostikbel@gmail.com>, Rick Macklem <rmacklem@freebsd.org>
Cc: "src-committers@freebsd.org" <src-committers@freebsd.org>, "dev-commits-src-all@freebsd.org" <dev-commits-src-all@freebsd.org>, "dev-commits-src-main@freebsd.org" <dev-commits-src-main@freebsd.org>
Subject: Re: git: 867c27c23a5c - main - nfscl: Change IO_APPEND writes to direct I/O
Message-ID: <YQXPR0101MB09680896DE973A2A414FC83BDD779@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <Ybq/1Iz4/Yu9Ibil@kib.kiev.ua>
References: <202112151639.1BFGdS2v011996@gitrepo.freebsd.org> <Ybq/1Iz4/Yu9Ibil@kib.kiev.ua>
Kostik wrote:
>On Wed, Dec 15, 2021 at 04:39:28PM +0000, Rick Macklem wrote:
>> The branch main has been updated by rmacklem:
>>
>> URL: https://cgit.FreeBSD.org/src/commit/?id=867c27c23a5c469b27611cf53cc2390b5a193fa5
>>
>> commit 867c27c23a5c469b27611cf53cc2390b5a193fa5
>> Author: Rick Macklem <rmacklem@FreeBSD.org>
>> AuthorDate: 2021-12-15 16:35:48 +0000
>> Commit: Rick Macklem <rmacklem@FreeBSD.org>
>> CommitDate: 2021-12-15 16:35:48 +0000
>>
>> nfscl: Change IO_APPEND writes to direct I/O
>>
>> IO_APPEND writes have always been very slow over NFS, due to
>> the need to acquire an up to date file size after flushing
>> all writes to the NFS server.
>>
>> This patch switches the IO_APPEND writes to use direct I/O,
>> bypassing the buffer cache. As such, flushing of writes
>> normally only occurs when the open(..O_APPEND..) is done.
>> It does imply that all writes must be done synchronously
>> and must be committed to stable storage on the file server
>> (NFSWRITE_FILESYNC).
>>
>> For a simple test program that does 10,000 IO_APPEND writes
>> in a loop, performance improved significantly with this patch.
>>
>> For a UFS exported file system, the test ran 12x faster.
>> This drops to 3x faster when the open(2)/close(2) are done
>> for each loop iteration.
>> For a ZFS exported file system, the test ran 40% faster.
>>
>> The much smaller improvement may have been because the ZFS
>> file system I tested against does not have a ZIL log and
>> does have "sync" enabled.
>>
>> Note that IO_APPEND write performance is still much slower
>> than when done on local file systems.
>>
>> Although this is a simple patch, it does result in a
>> significant semantics change, so I have given it a
>> large MFC time.
>
>How is the buffer cache coherency is handled then?
>Imagine that other process either reads from this file, or even have it
>mapped. What does ensure that reads and page cache see the data written
>by direct path?

Well, for the buffer cache case, there is code near the beginning of
ncl_write() (the NFS VOP_WRITE()) that calls ncl_vinvalbuf() for the
IO_APPEND case. As such, any data in the buffer cache gets invalidated
whenever an Append write occurs.

But, now that I look at it, it does not do anything w.r.t. mmap'd files.
(The direct I/O stuff has been there for a long time, but it isn't enabled
by default, so it probably doesn't get tested much. Also, it has a sysctl
that allows mmap for direct I/O, which is enabled by default. It causes
getpage/putpage to fail if it is not enabled.)

So, it looks like code to invalidate pages needs to be done along with
the ncl_vinvalbuf()?
--> I'll come up with a patch and then get you to review it.

Thanks for pointing this out, rick
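
For readers who want to reproduce the kind of measurement mentioned in the
quoted commit message, here is a minimal userland sketch of an append-write
loop. It is not the actual test program used for the numbers above: the
function names append_loop() and file_size(), the 17-byte record and any
path passed in are assumptions made for illustration. It only uses
open(2) with O_APPEND, write(2), close(2) and stat(2).

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: append `count` fixed-size records to `path`
 * through a descriptor opened with O_APPEND.  If reopen_each is
 * non-zero, open(2)/close(2) are done on every iteration; otherwise
 * the file is opened once for the whole loop.
 * Returns 0 on success, -1 on any error.
 */
int
append_loop(const char *path, int count, int reopen_each)
{
	static const char rec[] = "0123456789abcdef\n";	/* 17 bytes */
	const size_t len = sizeof(rec) - 1;
	int fd = -1;
	int i;

	for (i = 0; i < count; i++) {
		if (fd == -1) {
			fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
			if (fd == -1)
				return (-1);
		}
		/* Each write(2) on an O_APPEND descriptor lands at EOF. */
		if (write(fd, rec, len) != (ssize_t)len) {
			close(fd);
			return (-1);
		}
		if (reopen_each) {
			if (close(fd) == -1)
				return (-1);
			fd = -1;
		}
	}
	if (fd != -1 && close(fd) == -1)
		return (-1);
	return (0);
}

/* Hypothetical helper: size of `path` in bytes, or -1 on error. */
off_t
file_size(const char *path)
{
	struct stat sb;

	if (stat(path, &sb) == -1)
		return (-1);
	return (sb.st_size);
}
```

Timing append_loop(path, 10000, 0) against append_loop(path, 10000, 1) on a
file under an NFS mount would correspond to the two cases compared in the
commit message (single open versus open(2)/close(2) per iteration). The
sketch does not exercise the mmap case raised above.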