Date:      Sun, 21 Aug 2022 22:19:56 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: SEEK_DATA/SEEK_HOLE with vnode locked
Message-ID:  <YT4PR01MB9736D37382DF089F784C3C82DD6E9@YT4PR01MB9736.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <YwJXuz5DsuOmyA6t@kib.kiev.ua>
References:  <YQBPR0101MB97420AD41791E544519A0A2DDD659@YQBPR0101MB9742.CANPRD01.PROD.OUTLOOK.COM> <YvQ7MYXPl0AugojS@kib.kiev.ua> <YT4PR01MB9736B24FDE64C945C2C9EC8EDD6F9@YT4PR01MB9736.CANPRD01.PROD.OUTLOOK.COM> <YwJXuz5DsuOmyA6t@kib.kiev.ua>

Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Sun, Aug 21, 2022 at 12:02:48AM +0000, Rick Macklem wrote:
> > Just to summarize this...
> > I was able to do a VOP_SEEK() which would be called with a
> > LK_SHARED locked vnode and it seemed to work fine.
> >
> > However, ReadPlus (which is like Read, but allows for
> > holes to be represented as <offset, length> in the reply
> > instead of a stream of 0 bytes) seems to be a performance
> > dud.
> >
> > I was surprised how poorly it performed compared to ordinary
> > Read. Typically it would take 60% longer to read a file. I tried
> > sparse and non-sparse files of various sizes and they always
> > took longer. (If I disabled SEEK_DATA/SEEK_HOLE in the server
> > code, so it never actually did holes, it worked comparably to
> > regular Read, so somehow the overhead of doing SEEK_DATA/SEEK_HOLE
> > was a big performance hit. It was using LK_SHARED locks, so
> > it wasn't serializing the reads, but I don't really know why it
> > performed so poorly.)
> What filesystem did you use on the server?
The 60% slowdown was for tests like this with UFS:
- I created a file with a 1Gbyte hole, followed by 1Gbyte of data.
- Then I read the file with "time dd if=<file> of=/dev/null bs=10M"
  after remounting over NFS (to avoid NFS client caching).
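
For anyone who wants to reproduce a scaled-down version of this test file, here is a minimal Python sketch. It creates a file with a hole followed by data and then walks it with lseek(2) SEEK_DATA/SEEK_HOLE, producing the kind of <offset, length> segments that ReadPlus reports. The function names and the sizes are made up for illustration; this is not the NFS server code. On a filesystem that does not track holes, the whole file simply shows up as one data segment.

```python
import errno
import os


def make_hole_then_data(path, hole_bytes, data_bytes):
    """Create a file with a leading hole followed by real data."""
    with open(path, "wb") as f:
        f.truncate(hole_bytes)      # extend the file without writing anything
        f.seek(hole_bytes)
        f.write(b"x" * data_bytes)  # allocate blocks only for this part


def file_segments(fd):
    """Yield (kind, offset, length) tuples, where kind is "hole" or "data"."""
    size = os.fstat(fd).st_size
    pos = 0
    while pos < size:
        try:
            data_start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno != errno.ENXIO:
                raise
            data_start = size       # no data after pos: trailing hole
        if data_start > pos:
            yield ("hole", pos, data_start - pos)
        if data_start >= size:
            break
        # End of file counts as an implicit hole, so this always succeeds.
        hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
        yield ("data", data_start, hole_start - data_start)
        pos = hole_start
```

Calling make_hole_then_data("/tmp/sparse.bin", 1 << 30, 1 << 30) would give the 1Gbyte hole plus 1Gbyte of data layout used above (the path is only an example).
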
Here's the elapsed time (in seconds) for 4 runs for a UFS exported fs:
Read                     ReadPlus
20.4, 4.3, 4.6, 4.3      18.7, 7.6, 7.7, 7.3
(The first run was right after booting, so there was nothing
 cached within UFS.)
--> So, as you can see, it took about 60% longer via ReadPlus.

Now, what about the same test on an exported ZFS fs:
Read                     ReadPlus
6.4, 5.7, 5.6, 5.4       110.8, 113.3, 110.7, 110.9
--> Yep, about 20 times as long (roughly 2000%).
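
As a rough cross-check of the two sets of runs, the slowdown can be computed as the ratio of mean elapsed times. This is only a sketch: the helper name is made up, and dropping the first UFS run (cold cache) is an assumption of this calculation. Averaged this way, the ratio comes out at roughly 1.7 for UFS and roughly 19 for ZFS.

```python
from statistics import mean


def slowdown(read_secs, readplus_secs):
    """Ratio of mean ReadPlus elapsed time to mean Read elapsed time."""
    return mean(readplus_secs) / mean(read_secs)


# UFS: warm-cache runs only (the first run of each set is left out).
ufs_ratio = slowdown([4.3, 4.6, 4.3], [7.6, 7.7, 7.3])

# ZFS: all four runs.
zfs_ratio = slowdown([6.4, 5.7, 5.6, 5.4], [110.8, 113.3, 110.7, 110.9])

if __name__ == "__main__":
    print(f"UFS: ReadPlus/Read = {ufs_ratio:.2f}")
    print(f"ZFS: ReadPlus/Read = {zfs_ratio:.2f}")
```
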

For a kernel build over NFS, it took about 70% longer
when on a ZFS exported fs. (I can't remember the UFS
number, but it was significantly longer.)

So, yes, ZFS is a lot worse, but UFS is bad enough that
I can't imagine anyone using ReadPlus instead of ordinary
Read.

LANs have gobs of bandwidth these days. WANs might
benefit from the lack of long streams of 0 bytes, but some
links (like my little DSL modem for my internet connection) will
compress them out anyhow, I think.

> >
> > Anyhow, unless the performance issue gets resolved, there is
> > no reason to commit the code to FreeBSD's main.
> > (NFSv4.2 operations, like ReadPlus, are all optional and are not
> >  required for an RFC conformant implementation.)
>
> Why not commit?  It might make sense to add it, but guard under some
> knob.
Committing it with a "never use this, performance is terrible" note doesn't
make a lot of sense to me, unless the ZFS performance issue
were somehow resolved.

I am now actually concerned about copy_file_range(2), which uses
SEEK_HOLE/SEEK_DATA. There is a patch under review that at least
increases the blocksize for ZFS, but the effect of disabling the use of
SEEK_HOLE/SEEK_DATA in copy_file_range(2) also needs to be
explored.
--> Retaining holes as unallocated regions is nice, but at the very
      least, it could compare va_size with va_bytes to decide if there
      are holes worth looking for.
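
A userland analogue of that va_size versus va_bytes comparison can be sketched with stat(2), using st_size for the logical size and st_blocks (512-byte units) for the allocated bytes. This is only an illustration of the heuristic, not the kernel change, and the function names are made up. One caveat to keep in mind: on a filesystem with compression (ZFS, for example), allocated bytes can be smaller than the size even when there are no holes, so a "yes" only means holes are possible.

```python
import os


def worth_seeking_holes(size, allocated_bytes):
    """Heuristic: holes are only possible if less is allocated than the size."""
    return allocated_bytes < size


def path_may_have_holes(path):
    """Apply the heuristic to a file, using stat(2) fields."""
    st = os.stat(path)
    # st_blocks counts 512-byte units, regardless of the fs block size.
    return worth_seeking_holes(st.st_size, st.st_blocks * 512)
```

A copy loop could call path_may_have_holes() once up front and skip the SEEK_DATA/SEEK_HOLE calls entirely when it returns False.
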

rick


