Date: Sat, 30 Oct 2010 10:53:36 +0300
From: Andriy Gapon <avg@icyb.net.ua>
To: Artemiev Igor <ai@kliksys.ru>
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: 8.1-STABLE: zfs and sendfile: problem still exists
Message-ID: <4CCBCF00.2030904@icyb.net.ua>
In-Reply-To: <4CCADD37.7000306@icyb.net.ua>
References: <3D1C350B94A44E5D95BAA1596D1EBF13@vosz.local> <20101029090417.GA17537@two.kliksys.ru> <4CCABFC2.3040701@icyb.net.ua> <4CCADD37.7000306@icyb.net.ua>
on 29/10/2010 17:41 Andriy Gapon said the following:
> on 29/10/2010 15:36 Andriy Gapon said the following:
>> on 29/10/2010 12:04 Artemiev Igor said the following:
>>> Yep, this problem exists. You may work around it by bumping
>>> net.inet.tcp.sendspace up to 128k. zfs sendfile is very inefficient.
>>> I did a small investigation with DTrace: it reads MAXBSIZE chunks, but
>>> maps only one page (4K) into the VM. I.e. if you have a file of size
>>> 512K, sendfile calls freebsd_zfs_read 128 times.
>>
>> What svn revision of the FreeBSD source tree did you test?
>>
>
> Ah, I think I see what's going on.
> Either sendfile should (have an option to) use VOP_GETPAGES to request
> data, or ZFS mappedread should use vm_page_grab instead of vm_page_lookup
> for the UIO_NOCOPY case.
> Currently ZFS reads a whole FS block into ARC, but populates only one page
> with data; for the rest it just wastefully does uiomove(UIO_NOCOPY) from
> the ARC data.
> So, e.g., zpool iostat would show only a few actual reads from the pool.
> The rest of the time must be spent churning over the data already in ARC
> and doing page-per-VOP_READ copies from it.

Hmm, I investigated the issue some more and now I wouldn't put all the blame
on ZFS.  Indeed, perhaps ZFS is very inefficient here; perhaps it does extra
looping and extra copying.  However, those operations should not lead to such
a significant slowdown, but mostly to increased CPU usage.

So it looks like sendfile spends most of its time in sbwait().  Of course,
the "erratic" behavior of ZFS does contribute to that.

It's this code in kern_sendfile that gets triggered by ZFS:

	if (pg->valid && vm_page_is_valid(pg, pgoff, xfsize))
		VM_OBJECT_UNLOCK(obj);
	else if (m != NULL)
		error = EAGAIN;	/* send what we already got */
	else
	...

Essentially, the data is not only read from ZFS page by page, but it is also
mostly sent one page-sized chunk at a time.

P.S. just stating the obvious, kind of :-)
--
Andriy Gapon
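[Editor's note: the read-amplification arithmetic from the quoted DTrace
observation can be sketched with a toy model. This is plain Python, not
FreeBSD code; the 4K page size and the 512K example file come from the
report above, and the loop merely counts what the kernel would do.]

```python
# Toy model of the amplification described above: sendfile consumes data
# one VM page (4 KiB) at a time, and each page that is not yet resident
# triggers a full VOP_READ into ZFS, even though ZFS has already buffered
# the whole FS block in ARC.

PAGE_SIZE = 4 * 1024      # one page mapped per VOP_READ, per the DTrace trace
FILE_SIZE = 512 * 1024    # the 512K example file from the report

vop_read_calls = 0
offset = 0
while offset < FILE_SIZE:
    # Each page-sized step costs one call into freebsd_zfs_read,
    # which re-walks the (already cached) ARC data.
    vop_read_calls += 1
    offset += PAGE_SIZE

print(vop_read_calls)  # 128, matching the observation in the report
```

So the overhead scales with file size in pages, not in FS blocks, which is
why the workaround of enlarging the socket buffer only masks, rather than
removes, the extra copying.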