Date: Fri, 05 Nov 2010 18:16:57 -0400 From: jhell <jhell@DataIX.net> To: Andriy Gapon <avg@freebsd.org> Cc: Alan Cox <alc@freebsd.org>, Pawel Jakub Dawidek <pjd@freebsd.org>, freebsd-fs@freebsd.org Subject: Re: vop_getpages for zfs Message-ID: <4CD48259.6030402@DataIX.net> In-Reply-To: <4CD2E5AA.1090208@freebsd.org> References: <4C91F031.1010801@freebsd.org> <20101010134717.GA59922@deviant.kiev.zoral.com.ua> <4CD2E5AA.1090208@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On 11/04/2010 12:56, Andriy Gapon wrote: > on 10/10/2010 16:47 Kostik Belousov said the following: >> I think that readahead requests should be satisfied, if possible. Esp. >> in the case the readahead does not cause additional i/o. > > So, per Kostik's suggestion, here is patch for your review and testing: > http://people.freebsd.org/~avg/zfs_getpages.2.diff > > The patch tries to fill as many of the passed in pages as possible as long as that > can be done from a block (or even two blocks) that has to be read to fill the > requested page. > This means that the optimization kicks in only if a vnode has a block size greater > than the page size. > I mentioned "two blocks" above because I tried to optimize the case when, for > example, block size is 6KB and page size is 4KB and the requested page covers > parts of two adjacent blocks (e.g. range from 4KB to 8KB). Not sure if this was > worth the trouble. > > In other words, the code tries to avoid reading more blocks than is strictly > required, because additional blocks may be far apart on a disk and thus read > latency could get increased. > > You will also notice that I added vop_bmap method to ZFS vops. > This is needed to satisfy logic in vnode_pager_haspage(); otherwise, for example, > vm_fault() would never request any additional pages besides the required one. > zfs_bmap() implementation is really simple - it returns zeros for a_runp and > a_runb (and some reasonable-ish but unused values in other fields), and > vnode_pager_haspage() figures out number of eligible pages from the block size > (f_iosize actually[*]) and the page size. > > Now, that ZFS has vop_getpages method, it is safe to provide this dummy-ish > implementation of vop_bmap, because this operation would not be called by any > other place but vnode_pager_haspage(). Code from vfs_bio.c and vfs_cluster.c is > never called for ZFS, because it doesn't use the buffer cache layer. > > [*] There is a few interesting questions with respect to f_iosize usage in > vnode_pager_haspage() and ZFS. > For one, ZFS has a variable block size and f_iosize for ZFS has a currently > configured maximum block ('record' in ZFS terms) size. This should not result in > non-optimal behavior for files with smaller block sizes, because those files would > have smaller sizes too (within the block size). > Unless, of course, recordsize property is modified after some files are created on > a ZFS filesystem. In that case there could be large files with some smaller > (previously maximum) block size; or files with a block size larger than the > current recordsize == f_iosize. So, such situations may lead to sub-optimal > performance. > Another issue related to the above is that f_iosize for ZFS can change on the fly > when an administrators runs zfs set recordsize=NNN, and that may happen > concurrently with vnode_pager_haspage() using f_iosize for its calculations. > Not sure how to handle this correctly. > > I will appreciate your comment comments, test reports, reviews and suggestions. > Thanks a lot! Hey Andriy Thought I would give you a heads up on this after seeing the post about zfs_getpages.diff. I patched up to this before after seeing it posted to <you_know_where>@ and got reliable dumpage from it. Basically a vm_page_unwire fault or something like that. I believe the following is the backtrace from that. (kgdb) #0 doadump () at pcpu.h:231 #1 0x80675251 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419 #2 0x806754e5 in panic (fmt=Variable "fmt" is not available. ) at /usr/src/sys/kern/kern_shutdown.c:592 #3 0x808e27ce in vm_page_unwire (m=0x816b46e0, activate=1) at /usr/src/sys/vm/vm_page.c:1564 #4 0x808d123a in vm_fault_unwire (map=0x81690088, start=2568372224, end=2568433664, fictitious=0) at /usr/src/sys/vm/vm_fault.c:1123 #5 0x808d8a33 in vm_map_delete (map=0x81690088, start=2568372224, end=2568433664) at /usr/src/sys/vm/vm_map.c:2619 #6 0x808d8d0d in vm_map_remove (map=0x81690088, start=2568372224, end=Variable "end" is not available. ) at /usr/src/sys/vm/vm_map.c:2801 #7 0x808d6360 in kmem_free (map=0x81690088, addr=2568372224, size=61440) at /usr/src/sys/vm/vm_kern.c:211 #8 0x808cb601 in page_free (mem=0x99164000, size=61440, flags=34 '"') at /usr/src/sys/vm/uma_core.c:1069 #9 0x808ccf92 in uma_large_free (slab=0x85be7e6c) at /usr/src/sys/vm/uma_core.c:3021 #10 0x8065f7a5 in free (addr=0x99164000, mtp=0x80d8e114) at /usr/src/sys/kern/kern_malloc.c:506 #11 0x80cd49db in zil_lwb_write_done () from /boot/kernel/zfs.ko #12 0x80cd99b1 in zio_done () from /boot/kernel/zfs.ko #13 0x80cd7d3a in zio_execute () from /boot/kernel/zfs.ko #14 0x80cd7f2e in zio_notify_parent () from /boot/kernel/zfs.ko #15 0x80cd9a31 in zio_done () from /boot/kernel/zfs.ko #16 0x80cd7d3a in zio_execute () from /boot/kernel/zfs.ko #17 0x80c6656b in taskq_run_safe () from /boot/kernel/zfs.ko #18 0x806af812 in taskqueue_run (queue=0x85a0ac40) at /usr/src/sys/kern/subr_taskqueue.c:239 #19 0x806afa07 in taskqueue_thread_loop (arg=0x85a3f830) at /usr/src/sys/kern/subr_taskqueue.c:360 #20 0x80647377 in fork_exit (callout=0x806af94a <taskqueue_thread_loop>, arg=0x85a3f830, frame=0xb4439d38) at /usr/src/sys/kern/kern_fork.c:845 #21 0x809126a4 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:273 (kgdb) And that coincided with the dates that I added the patch once seeing it on the list. changeset: 351:f1ca4eb51520 user: J. Hellenthal <jhell@DataIX.net> date: Sun Oct 10 22:57:24 2010 -0400 summary: Remove the zfs_getpages patch from Andriy Gapon changeset: 350:bb885c047f0a user: J. Hellenthal <jhell@DataIX.net> date: Sun Oct 10 22:27:31 2010 -0400 summary: zfs_getpages improvement from Andriy Gapon If you would like I can patch back up to this patch to provide more information if its needed, but at the moment I do not have it available nor do I have the core that was generated. -- jhell,v
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4CD48259.6030402>