Date: Thu, 30 Apr 2015 17:24:08 +0300
From: Gleb Smirnoff <glebius@FreeBSD.org>
To: kib@FreeBSD.org, alc@FreeBSD.org
Cc: arch@FreeBSD.org
Subject: more strict KPI for vm_pager_get_pages()
Message-ID: <20150430142408.GS546@nginx.com>
--45Z9DzgjV8m4Oswq
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi!

The motivation for this patch comes from the projects/sendfile branch,
where vm_pager_get_pages() is used in the sendfile(2) system call.
Although the new sendfile works flawlessly, it makes some assumptions
about vnode_pager that in theory may not be valid, although they always
hold in our current code.

Digging deeper into the problem, I found more important points, which
resulted in the suggested patch. To start, let me lay out the current
KPI assumptions:

1) vm_pager_get_pages() works on an array of consecutive pages. The
   pindex of the (n+1)-th page must be the pindex of the n-th page + 1.
   One page is special; it is called reqpage.

2) vm_pager_get_pages() guarantees to swap in only the reqpage, and may
   skip or fail other pages for various reasons that differ from pager
   to pager.

3) There is also the function vm_pager_has_page(), which reports the
   availability of a page at a given index in the pager, and also
   provides hints on how many consecutive pages before and after this
   one can be swapped in with a single pager request. Most pagers
   return zeros in these hints. The vnode pager for UFS returns a
   strong promise that one can later rely on in vm_pager_get_pages().

4) All pages must be busied on entry. On exit only reqpage is left
   busied. The KPI doesn't guarantee that the rest of the pages are
   still in place. The pager usually calls vm_page_readahead_finish()
   on them, which can either free the page or put it on the
   active/inactive queue, using quite a strange approach to choose
   a queue.

5) The pages must not be wired, since vm_page_free() may be called on
   them. However, this is violated by several consumers of the KPI,
   which rely on the absence of errors in the pager. Moreover, the swap
   pager has a special function to skip wired pages while doing the
   sweep, to avoid this problem. So passing wired pages to the swap
   pager is OK, while passing them to the rest is not.

6) Pagers may replace a page in the object with a new one. The sg_pager
   actually does that. To protect against this, consumers of
   vm_pager_get_pages() are supposed to run vm_page_lookup() over the
   array of pages after the call to look the pages up again. However,
   not all consumers do this.

Speaking of pagers and their consumers:

- 11 consumers request an array of size 1, i.e. a single page
- 3 consumers actually request an array

My suggestion is to change the KPI assumptions to the following:

1) There is no reqpage. All pages are entered busied, and all pages are
   returned busied and validated. If the pager fails to validate all
   pages, it must return an error.

2) The consumer (not the pager!) decides what to do with the pages:
   vm_page_activate, vm_page_deactivate, vm_page_flash or just
   vm_page_free them. The consumer also unbusies the pages, if it wants
   to. The consumer is free to wire pages before the call.

3) Consumers must first query the pager via vm_pager_has_page(), and
   use the after/before hints to limit the size of the requested pages
   array.

4) If the pager replaces pages, it must also update the array, so that
   the consumer doesn't need to do a relookup.

Doing this sweep, I also noticed that all pagers have copy-pasted code
for zeroing invalid regions of partially valid pages. Also, many pagers
have a set of assertions copied and pasted from each other. So I
decided to un-inline vm_pager_get_pages(), move it to the vm_pager.c
file and gather all these copy-pastes in one place.

The suggested patch is attached. As expected, it simplifies and removes
quite a lot of code.

Right now it is tested on UFS only; testing NFS and ZFS is on my list.
There is one known panic, but it seems unrelated, and Peter (pho@) says
that it has been seen once before.

-- 
Totus tuus, Glebius.

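To make the proposed consumer contract concrete (query the pager first,
clamp the request to the "after" hint, pass in busied pages, get back
either all pages validated or an error), here is a minimal userland
sketch. It is only a model under stated assumptions: every name in it
(mock_page, mock_object, mock_has_page, mock_get_pages,
mock_consumer_read) is hypothetical and is not part of the kernel KPI
or of the attached patch, and the "runlen" field is just an invented
way to produce non-zero hints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model; none of these names exist in the kernel. */
#define MOCK_PAGER_OK		0
#define MOCK_PAGER_ERROR	1

struct mock_page {
	size_t	pindex;		/* index of the page within the object */
	bool	busy;		/* entered busied, returned busied */
	bool	valid;		/* set by the pager on success */
};

struct mock_object {
	size_t	npages;		/* number of pages the pager can provide */
	size_t	runlen;		/* assumed size of one contiguous pager run */
};

/* Models vm_pager_has_page(): availability plus before/after hints. */
static bool
mock_has_page(const struct mock_object *obj, size_t pindex, int *before,
    int *after)
{
	size_t run_start, run_end;

	if (pindex >= obj->npages)
		return (false);
	run_start = pindex - pindex % obj->runlen;
	run_end = run_start + obj->runlen;
	if (run_end > obj->npages)
		run_end = obj->npages;
	if (before != NULL)
		*before = (int)(pindex - run_start);
	if (after != NULL)
		*after = (int)(run_end - 1 - pindex);
	return (true);
}

/* Models the proposed contract: validate all pages or return an error. */
static int
mock_get_pages(const struct mock_object *obj, struct mock_page *m, int count)
{
	assert(count >= 1);
	/* Check everything first, so a failure leaves no page validated. */
	for (int i = 0; i < count; i++) {
		if (!m[i].busy || m[i].pindex != m[0].pindex + (size_t)i ||
		    m[i].pindex >= obj->npages)
			return (MOCK_PAGER_ERROR);
	}
	for (int i = 0; i < count; i++)
		m[i].valid = true;	/* pages stay busied for the caller */
	return (MOCK_PAGER_OK);
}

/*
 * Consumer side: returns the number of pages read, or -1 on failure.
 * The array m must have room for at least "wanted" pages.
 */
static int
mock_consumer_read(const struct mock_object *obj, struct mock_page *m,
    size_t first, int wanted)
{
	int after, count, rv;

	if (wanted < 1 || !mock_has_page(obj, first, NULL, &after))
		return (-1);
	/* Limit the request to what the pager promised. */
	count = (after + 1 < wanted) ? after + 1 : wanted;
	for (int i = 0; i < count; i++) {
		m[i].pindex = first + (size_t)i;
		m[i].busy = true;
		m[i].valid = false;
	}
	rv = mock_get_pages(obj, m, count);
	/* The consumer, not the pager, decides the fate of every page. */
	for (int i = 0; i < count; i++)
		m[i].busy = false;
	return (rv == MOCK_PAGER_OK ? count : -1);
}
```

In this model a consumer that wants 8 pages starting at index 5 of a
10-page object with 4-page runs ends up requesting only 3, because that
is all the "after" hint promises.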
--45Z9DzgjV8m4Oswq Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="vm_pager_get_pages-new-KPI.diff" Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c =================================================================== --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c (revision 282213) +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c (working copy) @@ -5712,12 +5712,12 @@ ioflags(int ioflags) } static int -zfs_getpages(struct vnode *vp, vm_page_t *m, int count, int reqpage) +zfs_getpages(struct vnode *vp, vm_page_t *m, int count) { znode_t *zp = VTOZ(vp); zfsvfs_t *zfsvfs = zp->z_zfsvfs; objset_t *os = zp->z_zfsvfs->z_os; - vm_page_t mfirst, mlast, mreq; + vm_page_t mlast; vm_object_t object; caddr_t va; struct sf_buf *sf; @@ -5730,80 +5730,27 @@ static int ZFS_VERIFY_ZP(zp); pcount = OFF_TO_IDX(round_page(count)); - mreq = m[reqpage]; - object = mreq->object; + object = m[0]->object; + mlast = m[pcount - 1]; error = 0; - KASSERT(vp->v_object == object, ("mismatching object")); - - if (pcount > 1 && zp->z_blksz > PAGESIZE) { - startoff = rounddown(IDX_TO_OFF(mreq->pindex), zp->z_blksz); - reqstart = OFF_TO_IDX(round_page(startoff)); - if (reqstart < m[0]->pindex) - reqstart = 0; - else - reqstart = reqstart - m[0]->pindex; - endoff = roundup(IDX_TO_OFF(mreq->pindex) + PAGE_SIZE, - zp->z_blksz); - reqend = OFF_TO_IDX(trunc_page(endoff)) - 1; - if (reqend > m[pcount - 1]->pindex) - reqend = m[pcount - 1]->pindex; - reqsize = reqend - m[reqstart]->pindex + 1; - KASSERT(reqstart <= reqpage && reqpage < reqstart + reqsize, - ("reqpage beyond [reqstart, reqstart + reqsize[ bounds")); - } else { - reqstart = reqpage; - reqsize = 1; - } - mfirst = m[reqstart]; - mlast = m[reqstart + reqsize - 1]; - - zfs_vmobject_wlock(object); - - for (i = 0; i < reqstart; i++) { - vm_page_lock(m[i]); - vm_page_free(m[i]); - vm_page_unlock(m[i]); - } - for (i = reqstart + reqsize; i < pcount; i++) { - 
vm_page_lock(m[i]); - vm_page_free(m[i]); - vm_page_unlock(m[i]); - } - - if (mreq->valid && reqsize == 1) { - if (mreq->valid != VM_PAGE_BITS_ALL) - vm_page_zero_invalid(mreq, TRUE); - zfs_vmobject_wunlock(object); + if (IDX_TO_OFF(mlast->pindex) >= + object->un_pager.vnp.vnp_size) { ZFS_EXIT(zfsvfs); - return (zfs_vm_pagerret_ok); + return (zfs_vm_pagerret_bad); } PCPU_INC(cnt.v_vnodein); PCPU_ADD(cnt.v_vnodepgsin, reqsize); - if (IDX_TO_OFF(mreq->pindex) >= object->un_pager.vnp.vnp_size) { - for (i = reqstart; i < reqstart + reqsize; i++) { - if (i != reqpage) { - vm_page_lock(m[i]); - vm_page_free(m[i]); - vm_page_unlock(m[i]); - } - } - zfs_vmobject_wunlock(object); - ZFS_EXIT(zfsvfs); - return (zfs_vm_pagerret_bad); - } - lsize = PAGE_SIZE; if (IDX_TO_OFF(mlast->pindex) + lsize > object->un_pager.vnp.vnp_size) - lsize = object->un_pager.vnp.vnp_size - IDX_TO_OFF(mlast->pindex); + lsize = object->un_pager.vnp.vnp_size - + IDX_TO_OFF(mlast->pindex); - zfs_vmobject_wunlock(object); - - for (i = reqstart; i < reqstart + reqsize; i++) { + for (i = 0; i < pcount; i++) { size = PAGE_SIZE; - if (i == (reqstart + reqsize - 1)) + if (i == pcount - 1) size = lsize; va = zfs_map_page(m[i], &sf); error = dmu_read(os, zp->z_id, IDX_TO_OFF(m[i]->pindex), @@ -5812,21 +5759,15 @@ static int bzero(va + size, PAGE_SIZE - size); zfs_unmap_page(sf); if (error != 0) - break; + goto out; } zfs_vmobject_wlock(object); - - for (i = reqstart; i < reqstart + reqsize; i++) { - if (!error) - m[i]->valid = VM_PAGE_BITS_ALL; - KASSERT(m[i]->dirty == 0, ("zfs_getpages: page %p is dirty", m[i])); - if (i != reqpage) - vm_page_readahead_finish(m[i]); - } - + for (i = 0; i < pcount; i++) + m[i]->valid = VM_PAGE_BITS_ALL; zfs_vmobject_wunlock(object); +out: ZFS_ACCESSTIME_STAMP(zfsvfs, zp); ZFS_EXIT(zfsvfs); return (error ? 
zfs_vm_pagerret_error : zfs_vm_pagerret_ok); @@ -5842,7 +5783,7 @@ zfs_freebsd_getpages(ap) } */ *ap; { - return (zfs_getpages(ap->a_vp, ap->a_m, ap->a_count, ap->a_reqpage)); + return (zfs_getpages(ap->a_vp, ap->a_m, ap->a_count)); } static int Index: sys/dev/drm2/i915/i915_gem.c =================================================================== --- sys/dev/drm2/i915/i915_gem.c (revision 282213) +++ sys/dev/drm2/i915/i915_gem.c (working copy) @@ -3174,8 +3174,7 @@ i915_gem_wire_page(vm_object_t object, vm_pindex_t m = vm_page_grab(object, pindex, VM_ALLOC_NORMAL); if (m->valid != VM_PAGE_BITS_ALL) { if (vm_pager_has_page(object, pindex, NULL, NULL)) { - rv = vm_pager_get_pages(object, &m, 1, 0); - m = vm_page_lookup(object, pindex); + rv = vm_pager_get_pages(object, &m, 1); if (m == NULL) return (NULL); if (rv != VM_PAGER_OK) { Index: sys/dev/drm2/ttm/ttm_tt.c =================================================================== --- sys/dev/drm2/ttm/ttm_tt.c (revision 282213) +++ sys/dev/drm2/ttm/ttm_tt.c (working copy) @@ -291,7 +291,7 @@ int ttm_tt_swapin(struct ttm_tt *ttm) from_page = vm_page_grab(obj, i, VM_ALLOC_NORMAL); if (from_page->valid != VM_PAGE_BITS_ALL) { if (vm_pager_has_page(obj, i, NULL, NULL)) { - rv = vm_pager_get_pages(obj, &from_page, 1, 0); + rv = vm_pager_get_pages(obj, &from_page, 1); if (rv != VM_PAGER_OK) { vm_page_lock(from_page); vm_page_free(from_page); Index: sys/dev/md/md.c =================================================================== --- sys/dev/md/md.c (revision 282213) +++ sys/dev/md/md.c (working copy) @@ -835,7 +835,7 @@ mdstart_swap(struct md_s *sc, struct bio *bp) if (m->valid == VM_PAGE_BITS_ALL) rv = VM_PAGER_OK; else - rv = vm_pager_get_pages(sc->object, &m, 1, 0); + rv = vm_pager_get_pages(sc->object, &m, 1); if (rv == VM_PAGER_ERROR) { vm_page_xunbusy(m); break; @@ -858,7 +858,7 @@ mdstart_swap(struct md_s *sc, struct bio *bp) } } else if (bp->bio_cmd == BIO_WRITE) { if (len != PAGE_SIZE && m->valid != 
VM_PAGE_BITS_ALL) - rv = vm_pager_get_pages(sc->object, &m, 1, 0); + rv = vm_pager_get_pages(sc->object, &m, 1); else rv = VM_PAGER_OK; if (rv == VM_PAGER_ERROR) { @@ -874,7 +874,7 @@ mdstart_swap(struct md_s *sc, struct bio *bp) m->valid = VM_PAGE_BITS_ALL; } else if (bp->bio_cmd == BIO_DELETE) { if (len != PAGE_SIZE && m->valid != VM_PAGE_BITS_ALL) - rv = vm_pager_get_pages(sc->object, &m, 1, 0); + rv = vm_pager_get_pages(sc->object, &m, 1); else rv = VM_PAGER_OK; if (rv == VM_PAGER_ERROR) { Index: sys/fs/fuse/fuse_vnops.c =================================================================== --- sys/fs/fuse/fuse_vnops.c (revision 282213) +++ sys/fs/fuse/fuse_vnops.c (working copy) @@ -1761,29 +1761,6 @@ fuse_vnop_getpages(struct vop_getpages_args *ap) npages = btoc(count); /* - * If the requested page is partially valid, just return it and - * allow the pager to zero-out the blanks. Partially valid pages - * can only occur at the file EOF. - */ - - VM_OBJECT_WLOCK(vp->v_object); - fuse_vm_page_lock_queues(); - if (pages[ap->a_reqpage]->valid != 0) { - for (i = 0; i < npages; ++i) { - if (i != ap->a_reqpage) { - fuse_vm_page_lock(pages[i]); - vm_page_free(pages[i]); - fuse_vm_page_unlock(pages[i]); - } - } - fuse_vm_page_unlock_queues(); - VM_OBJECT_WUNLOCK(vp->v_object); - return 0; - } - fuse_vm_page_unlock_queues(); - VM_OBJECT_WUNLOCK(vp->v_object); - - /* * We use only the kva address for the buffer, but this is extremely * convienient and fast. 
*/ @@ -1811,17 +1788,6 @@ fuse_vnop_getpages(struct vop_getpages_args *ap) if (error && (uio.uio_resid == count)) { FS_DEBUG("error %d\n", error); - VM_OBJECT_WLOCK(vp->v_object); - fuse_vm_page_lock_queues(); - for (i = 0; i < npages; ++i) { - if (i != ap->a_reqpage) { - fuse_vm_page_lock(pages[i]); - vm_page_free(pages[i]); - fuse_vm_page_unlock(pages[i]); - } - } - fuse_vm_page_unlock_queues(); - VM_OBJECT_WUNLOCK(vp->v_object); return VM_PAGER_ERROR; } /* @@ -1862,8 +1828,6 @@ fuse_vnop_getpages(struct vop_getpages_args *ap) */ ; } - if (i != ap->a_reqpage) - vm_page_readahead_finish(m); } fuse_vm_page_unlock_queues(); VM_OBJECT_WUNLOCK(vp->v_object); Index: sys/fs/nfsclient/nfs_clbio.c =================================================================== --- sys/fs/nfsclient/nfs_clbio.c (revision 282213) +++ sys/fs/nfsclient/nfs_clbio.c (working copy) @@ -129,23 +129,6 @@ ncl_getpages(struct vop_getpages_args *ap) npages = btoc(count); /* - * Since the caller has busied the requested page, that page's valid - * field will not be changed by other threads. - */ - vm_page_assert_xbusied(pages[ap->a_reqpage]); - - /* - * If the requested page is partially valid, just return it and - * allow the pager to zero-out the blanks. Partially valid pages - * can only occur at the file EOF. - */ - if (pages[ap->a_reqpage]->valid != 0) { - vm_pager_free_nonreq(object, pages, ap->a_reqpage, npages, - FALSE); - return (VM_PAGER_OK); - } - - /* * We use only the kva address for the buffer, but this is extremely * convienient and fast. 
*/ @@ -173,8 +156,6 @@ ncl_getpages(struct vop_getpages_args *ap) if (error && (uio.uio_resid == count)) { ncl_printf("nfs_getpages: error %d\n", error); - vm_pager_free_nonreq(object, pages, ap->a_reqpage, npages, - FALSE); return (VM_PAGER_ERROR); } @@ -218,8 +199,6 @@ ncl_getpages(struct vop_getpages_args *ap) */ ; } - if (i != ap->a_reqpage) - vm_page_readahead_finish(m); } VM_OBJECT_WUNLOCK(object); return (0); Index: sys/fs/smbfs/smbfs_io.c =================================================================== --- sys/fs/smbfs/smbfs_io.c (revision 282213) +++ sys/fs/smbfs/smbfs_io.c (working copy) @@ -424,7 +424,7 @@ smbfs_getpages(ap) #ifdef SMBFS_RWGENERIC return vop_stdgetpages(ap); #else - int i, error, nextoff, size, toff, npages, count, reqpage; + int i, error, nextoff, size, toff, npages, count; struct uio uio; struct iovec iov; vm_offset_t kva; @@ -436,7 +436,7 @@ smbfs_getpages(ap) struct smbnode *np; struct smb_cred *scred; vm_object_t object; - vm_page_t *pages, m; + vm_page_t *pages; vp = ap->a_vp; if ((object = vp->v_object) == NULL) { @@ -451,29 +451,7 @@ smbfs_getpages(ap) pages = ap->a_m; count = ap->a_count; npages = btoc(count); - reqpage = ap->a_reqpage; - /* - * If the requested page is partially valid, just return it and - * allow the pager to zero-out the blanks. Partially valid pages - * can only occur at the file EOF. 
- */ - m = pages[reqpage]; - - VM_OBJECT_WLOCK(object); - if (m->valid != 0) { - for (i = 0; i < npages; ++i) { - if (i != reqpage) { - vm_page_lock(pages[i]); - vm_page_free(pages[i]); - vm_page_unlock(pages[i]); - } - } - VM_OBJECT_WUNLOCK(object); - return 0; - } - VM_OBJECT_WUNLOCK(object); - scred = smbfs_malloc_scred(); smb_makescred(scred, td, cred); @@ -500,17 +478,8 @@ smbfs_getpages(ap) relpbuf(bp, &smbfs_pbuf_freecnt); - VM_OBJECT_WLOCK(object); if (error && (uio.uio_resid == count)) { printf("smbfs_getpages: error %d\n",error); - for (i = 0; i < npages; i++) { - if (reqpage != i) { - vm_page_lock(pages[i]); - vm_page_free(pages[i]); - vm_page_unlock(pages[i]); - } - } - VM_OBJECT_WUNLOCK(object); return VM_PAGER_ERROR; } @@ -544,9 +513,6 @@ smbfs_getpages(ap) */ ; } - - if (i != reqpage) - vm_page_readahead_finish(m); } VM_OBJECT_WUNLOCK(object); return 0; Index: sys/fs/tmpfs/tmpfs_subr.c =================================================================== --- sys/fs/tmpfs/tmpfs_subr.c (revision 282213) +++ sys/fs/tmpfs/tmpfs_subr.c (working copy) @@ -1320,7 +1320,7 @@ tmpfs_reg_resize(struct vnode *vp, off_t newsize, struct tmpfs_mount *tmp; struct tmpfs_node *node; vm_object_t uobj; - vm_page_t m, ma[1]; + vm_page_t m; vm_pindex_t idx, newpages, oldpages; off_t oldsize; int base, rv; @@ -1368,9 +1368,7 @@ retry: VM_OBJECT_WLOCK(uobj); goto retry; } else if (m->valid != VM_PAGE_BITS_ALL) { - ma[0] = m; - rv = vm_pager_get_pages(uobj, ma, 1, 0); - m = vm_page_lookup(uobj, idx); + rv = vm_pager_get_pages(uobj, &m, 1); } else /* A cached page was reactivated. 
*/ rv = VM_PAGER_OK; Index: sys/kern/kern_exec.c =================================================================== --- sys/kern/kern_exec.c (revision 282213) +++ sys/kern/kern_exec.c (working copy) @@ -920,8 +920,7 @@ int exec_map_first_page(imgp) struct image_params *imgp; { - int rv, i; - int initial_pagein; + int rv, i, after, initial_pagein; vm_page_t ma[VM_INITIAL_PAGEIN]; vm_object_t object; @@ -937,9 +936,18 @@ exec_map_first_page(imgp) #endif ma[0] = vm_page_grab(object, 0, VM_ALLOC_NORMAL); if (ma[0]->valid != VM_PAGE_BITS_ALL) { - initial_pagein = VM_INITIAL_PAGEIN; - if (initial_pagein > object->size) - initial_pagein = object->size; + if (!vm_pager_has_page(object, 0, NULL, &after)) { + vm_page_xunbusy(ma[0]); + vm_page_lock(ma[0]); + vm_page_free(ma[0]); + vm_page_unlock(ma[0]); + VM_OBJECT_WUNLOCK(object); + return (EIO); + } + initial_pagein = min(after, VM_INITIAL_PAGEIN); + KASSERT(initial_pagein <= object->size, + ("%s: initial_pagein %d object->size %ju", + __func__, initial_pagein, (uintmax_t )object->size)); for (i = 1; i < initial_pagein; i++) { if ((ma[i] = vm_page_next(ma[i - 1])) != NULL) { if (ma[i]->valid) @@ -954,19 +962,21 @@ exec_map_first_page(imgp) } } initial_pagein = i; - rv = vm_pager_get_pages(object, ma, initial_pagein, 0); - ma[0] = vm_page_lookup(object, 0); - if ((rv != VM_PAGER_OK) || (ma[0] == NULL)) { - if (ma[0] != NULL) { - vm_page_lock(ma[0]); - vm_page_free(ma[0]); - vm_page_unlock(ma[0]); + rv = vm_pager_get_pages(object, ma, initial_pagein); + if (rv != VM_PAGER_OK) { + for (i = 0; i < initial_pagein; i++) { + vm_page_xunbusy(ma[i]); + vm_page_lock(ma[i]); + vm_page_free(ma[i]); + vm_page_unlock(ma[i]); } VM_OBJECT_WUNLOCK(object); return (EIO); } - } - vm_page_xunbusy(ma[0]); + } else + initial_pagein = 1; + for (i = 0; i < initial_pagein; i++) + vm_page_xunbusy(ma[i]); vm_page_lock(ma[0]); vm_page_hold(ma[0]); vm_page_activate(ma[0]); Index: sys/kern/uipc_shm.c 
=================================================================== --- sys/kern/uipc_shm.c (revision 282213) +++ sys/kern/uipc_shm.c (working copy) @@ -186,15 +186,7 @@ uiomove_object_page(vm_object_t obj, size_t len, s m = vm_page_grab(obj, idx, VM_ALLOC_NORMAL); if (m->valid != VM_PAGE_BITS_ALL) { if (vm_pager_has_page(obj, idx, NULL, NULL)) { - rv = vm_pager_get_pages(obj, &m, 1, 0); - m = vm_page_lookup(obj, idx); - if (m == NULL) { - printf( - "uiomove_object: vm_obj %p idx %jd null lookup rv %d\n", - obj, idx, rv); - VM_OBJECT_WUNLOCK(obj); - return (EIO); - } + rv = vm_pager_get_pages(obj, &m, 1); if (rv != VM_PAGER_OK) { printf( "uiomove_object: vm_obj %p idx %jd valid %x pager error %d\n", @@ -421,7 +413,7 @@ static int shm_dotruncate(struct shmfd *shmfd, off_t length) { vm_object_t object; - vm_page_t m, ma[1]; + vm_page_t m; vm_pindex_t idx, nobjsize; vm_ooffset_t delta; int base, rv; @@ -463,12 +455,9 @@ retry: VM_WAIT; VM_OBJECT_WLOCK(object); goto retry; - } else if (m->valid != VM_PAGE_BITS_ALL) { - ma[0] = m; - rv = vm_pager_get_pages(object, ma, 1, - 0); - m = vm_page_lookup(object, idx); - } else + } else if (m->valid != VM_PAGE_BITS_ALL) + rv = vm_pager_get_pages(object, &m, 1); + else /* A cached page was reactivated. 
*/ rv = VM_PAGER_OK; vm_page_lock(m); Index: sys/kern/uipc_syscalls.c =================================================================== --- sys/kern/uipc_syscalls.c (revision 282213) +++ sys/kern/uipc_syscalls.c (working copy) @@ -2024,12 +2024,9 @@ sendfile_readpage(vm_object_t obj, struct vnode *v VM_OBJECT_WLOCK(obj); } else { if (vm_pager_has_page(obj, pindex, NULL, NULL)) { - rv = vm_pager_get_pages(obj, &m, 1, 0); + rv = vm_pager_get_pages(obj, &m, 1); SFSTAT_INC(sf_iocnt); - m = vm_page_lookup(obj, pindex); - if (m == NULL) - error = EIO; - else if (rv != VM_PAGER_OK) { + if (rv != VM_PAGER_OK) { vm_page_lock(m); vm_page_free(m); vm_page_unlock(m); Index: sys/kern/vfs_default.c =================================================================== --- sys/kern/vfs_default.c (revision 282213) +++ sys/kern/vfs_default.c (working copy) @@ -731,12 +731,11 @@ vop_stdgetpages(ap) struct vnode *a_vp; vm_page_t *a_m; int a_count; - int a_reqpage; } */ *ap; { return vnode_pager_generic_getpages(ap->a_vp, ap->a_m, - ap->a_count, ap->a_reqpage, NULL, NULL); + ap->a_count, NULL, NULL); } static int @@ -744,8 +743,8 @@ vop_stdgetpages_async(struct vop_getpages_async_ar { int error; - error = VOP_GETPAGES(ap->a_vp, ap->a_m, ap->a_count, ap->a_reqpage); - ap->a_iodone(ap->a_arg, ap->a_m, ap->a_reqpage, error); + error = VOP_GETPAGES(ap->a_vp, ap->a_m, ap->a_count); + ap->a_iodone(ap->a_arg, ap->a_m, ap->a_count, error); return (error); } Index: sys/kern/vnode_if.src =================================================================== --- sys/kern/vnode_if.src (revision 282213) +++ sys/kern/vnode_if.src (working copy) @@ -472,7 +472,6 @@ vop_getpages { IN struct vnode *vp; IN vm_page_t *m; IN int count; - IN int reqpage; }; @@ -482,7 +481,6 @@ vop_getpages_async { IN struct vnode *vp; IN vm_page_t *m; IN int count; - IN int reqpage; IN vop_getpages_iodone_t *iodone; IN void *arg; }; Index: sys/sys/buf.h =================================================================== --- 
sys/sys/buf.h (revision 282213) +++ sys/sys/buf.h (working copy) @@ -124,14 +124,9 @@ struct buf { struct ucred *b_wcred; /* Write credentials reference. */ void *b_saveaddr; /* Original b_addr for physio. */ union { - TAILQ_ENTRY(buf) bu_freelist; /* (Q) */ - struct { - void (*pg_iodone)(void *, vm_page_t *, int, int); - int pg_reqpage; - } bu_pager; - } b_union; -#define b_freelist b_union.bu_freelist -#define b_pager b_union.bu_pager + TAILQ_ENTRY(buf) b_freelist; /* (Q) */ + void (*b_pgiodone)(void *, vm_page_t *, int, int); + }; union cluster_info { TAILQ_HEAD(cluster_list_head, buf) cluster_head; TAILQ_ENTRY(buf) cluster_entry; Index: sys/vm/default_pager.c =================================================================== --- sys/vm/default_pager.c (revision 282213) +++ sys/vm/default_pager.c (working copy) @@ -56,7 +56,7 @@ __FBSDID("$FreeBSD$"); static vm_object_t default_pager_alloc(void *, vm_ooffset_t, vm_prot_t, vm_ooffset_t, struct ucred *); static void default_pager_dealloc(vm_object_t); -static int default_pager_getpages(vm_object_t, vm_page_t *, int, int); +static int default_pager_getpages(vm_object_t, vm_page_t *, int); static void default_pager_putpages(vm_object_t, vm_page_t *, int, boolean_t, int *); static boolean_t default_pager_haspage(vm_object_t, vm_pindex_t, int *, @@ -121,11 +121,10 @@ default_pager_dealloc(object) * see a vm_page with assigned swap here. 
*/ static int -default_pager_getpages(object, m, count, reqpage) +default_pager_getpages(object, m, count) vm_object_t object; vm_page_t *m; int count; - int reqpage; { return VM_PAGER_FAIL; } Index: sys/vm/device_pager.c =================================================================== --- sys/vm/device_pager.c (revision 282213) +++ sys/vm/device_pager.c (working copy) @@ -59,7 +59,7 @@ static void dev_pager_init(void); static vm_object_t dev_pager_alloc(void *, vm_ooffset_t, vm_prot_t, vm_ooffset_t, struct ucred *); static void dev_pager_dealloc(vm_object_t); -static int dev_pager_getpages(vm_object_t, vm_page_t *, int, int); +static int dev_pager_getpages(vm_object_t, vm_page_t *, int); static void dev_pager_putpages(vm_object_t, vm_page_t *, int, boolean_t, int *); static boolean_t dev_pager_haspage(vm_object_t, vm_pindex_t, int *, @@ -257,33 +257,27 @@ dev_pager_dealloc(object) } static int -dev_pager_getpages(vm_object_t object, vm_page_t *ma, int count, int reqpage) +dev_pager_getpages(vm_object_t object, vm_page_t *ma, int count) { - int error, i; + int error; + /* Since our putpages reports zero after/before, the count is 1. 
*/ + KASSERT(count == 1, ("%s: count %d", __func__, count)); VM_OBJECT_ASSERT_WLOCKED(object); error = object->un_pager.devp.ops->cdev_pg_fault(object, - IDX_TO_OFF(ma[reqpage]->pindex), PROT_READ, &ma[reqpage]); + IDX_TO_OFF(ma[0]->pindex), PROT_READ, &ma[0]); VM_OBJECT_ASSERT_WLOCKED(object); - for (i = 0; i < count; i++) { - if (i != reqpage) { - vm_page_lock(ma[i]); - vm_page_free(ma[i]); - vm_page_unlock(ma[i]); - } - } - if (error == VM_PAGER_OK) { KASSERT((object->type == OBJT_DEVICE && - (ma[reqpage]->oflags & VPO_UNMANAGED) != 0) || + (ma[0]->oflags & VPO_UNMANAGED) != 0) || (object->type == OBJT_MGTDEVICE && - (ma[reqpage]->oflags & VPO_UNMANAGED) == 0), - ("Wrong page type %p %p", ma[reqpage], object)); + (ma[0]->oflags & VPO_UNMANAGED) == 0), + ("Wrong page type %p %p", ma[0], object)); if (object->type == OBJT_DEVICE) { TAILQ_INSERT_TAIL(&object->un_pager.devp.devp_pglist, - ma[reqpage], plinks.q); + ma[0], plinks.q); } } Index: sys/vm/phys_pager.c =================================================================== --- sys/vm/phys_pager.c (revision 282213) +++ sys/vm/phys_pager.c (working copy) @@ -137,7 +137,7 @@ phys_pager_dealloc(vm_object_t object) * Fill as many pages as vm_fault has allocated for us. */ static int -phys_pager_getpages(vm_object_t object, vm_page_t *m, int count, int reqpage) +phys_pager_getpages(vm_object_t object, vm_page_t *m, int count) { int i; @@ -152,13 +152,6 @@ static int ("phys_pager_getpages: partially valid page %p", m[i])); KASSERT(m[i]->dirty == 0, ("phys_pager_getpages: dirty page %p", m[i])); - /* The requested page must remain busy, the others not. 
*/ - if (i == reqpage) { - vm_page_lock(m[i]); - vm_page_flash(m[i]); - vm_page_unlock(m[i]); - } else - vm_page_xunbusy(m[i]); } return (VM_PAGER_OK); } Index: sys/vm/sg_pager.c =================================================================== --- sys/vm/sg_pager.c (revision 282213) +++ sys/vm/sg_pager.c (working copy) @@ -49,7 +49,7 @@ __FBSDID("$FreeBSD$"); static vm_object_t sg_pager_alloc(void *, vm_ooffset_t, vm_prot_t, vm_ooffset_t, struct ucred *); static void sg_pager_dealloc(vm_object_t); -static int sg_pager_getpages(vm_object_t, vm_page_t *, int, int); +static int sg_pager_getpages(vm_object_t, vm_page_t *, int); static void sg_pager_putpages(vm_object_t, vm_page_t *, int, boolean_t, int *); static boolean_t sg_pager_haspage(vm_object_t, vm_pindex_t, int *, @@ -133,7 +133,7 @@ sg_pager_dealloc(vm_object_t object) } static int -sg_pager_getpages(vm_object_t object, vm_page_t *m, int count, int reqpage) +sg_pager_getpages(vm_object_t object, vm_page_t *m, int count) { struct sglist *sg; vm_page_t m_paddr, page; @@ -143,11 +143,13 @@ static int size_t space; int i; + /* Since our putpages reports zero after/before, the count is 1. */ + KASSERT(count == 1, ("%s: count %d", __func__, count)); VM_OBJECT_ASSERT_WLOCKED(object); sg = object->handle; memattr = object->memattr; VM_OBJECT_WUNLOCK(object); - offset = m[reqpage]->pindex; + offset = m[0]->pindex; /* * Lookup the physical address of the requested page. An initial @@ -176,7 +178,7 @@ static int } /* Return a fake page for the requested page. */ - KASSERT(!(m[reqpage]->flags & PG_FICTITIOUS), + KASSERT(!(m[0]->flags & PG_FICTITIOUS), ("backing page for SG is fake")); /* Construct a new fake page. */ @@ -183,17 +185,9 @@ static int page = vm_page_getfake(paddr, memattr); VM_OBJECT_WLOCK(object); TAILQ_INSERT_TAIL(&object->un_pager.sgp.sgp_pglist, page, plinks.q); - - /* Free the original pages and insert this fake page into the object. 
*/ - for (i = 0; i < count; i++) { - if (i == reqpage && - vm_page_replace(page, object, offset) != m[i]) - panic("sg_pager_getpages: invalid place replacement"); - vm_page_lock(m[i]); - vm_page_free(m[i]); - vm_page_unlock(m[i]); - } - m[reqpage] = page; + if (vm_page_replace(page, object, offset) != m[0]) + panic("sg_pager_getpages: invalid place replacement"); + m[0] = page; page->valid = VM_PAGE_BITS_ALL; return (VM_PAGER_OK); Index: sys/vm/swap_pager.c =================================================================== --- sys/vm/swap_pager.c (revision 282213) +++ sys/vm/swap_pager.c (working copy) @@ -362,8 +362,8 @@ static vm_object_t swap_pager_alloc(void *handle, vm_ooffset_t size, vm_prot_t prot, vm_ooffset_t offset, struct ucred *); static void swap_pager_dealloc(vm_object_t object); -static int swap_pager_getpages(vm_object_t, vm_page_t *, int, int); -static int swap_pager_getpages_async(vm_object_t, vm_page_t *, int, int, +static int swap_pager_getpages(vm_object_t, vm_page_t *, int); +static int swap_pager_getpages_async(vm_object_t, vm_page_t *, int, pgo_getpages_iodone_t, void *); static void swap_pager_putpages(vm_object_t, vm_page_t *, int, boolean_t, int *); static boolean_t @@ -418,16 +418,6 @@ static void swp_pager_meta_free(vm_object_t, vm_pi static void swp_pager_meta_free_all(vm_object_t); static daddr_t swp_pager_meta_ctl(vm_object_t, vm_pindex_t, int); -static void -swp_pager_free_nrpage(vm_page_t m) -{ - - vm_page_lock(m); - if (m->wire_count == 0) - vm_page_free(m); - vm_page_unlock(m); -} - /* * SWP_SIZECHECK() - update swap_pager_full indication * @@ -1109,20 +1099,11 @@ swap_pager_unswapped(vm_page_t m) * left busy, but the others adjusted. 
*/ static int -swap_pager_getpages(vm_object_t object, vm_page_t *m, int count, int reqpage) +swap_pager_getpages(vm_object_t object, vm_page_t *m, int count) { struct buf *bp; - vm_page_t mreq; - int i; - int j; daddr_t blk; - mreq = m[reqpage]; - - KASSERT(mreq->object == object, - ("swap_pager_getpages: object mismatch %p/%p", - object, mreq->object)); - /* * Calculate range to retrieve. The pages have already been assigned * their swapblks. We require a *contiguous* range but we know it to @@ -1132,45 +1113,18 @@ static int * * The swp_*() calls must be made with the object locked. */ - blk = swp_pager_meta_ctl(mreq->object, mreq->pindex, 0); + blk = swp_pager_meta_ctl(m[0]->object, m[0]->pindex, 0); - for (i = reqpage - 1; i >= 0; --i) { - daddr_t iblk; - - iblk = swp_pager_meta_ctl(m[i]->object, m[i]->pindex, 0); - if (blk != iblk + (reqpage - i)) - break; - } - ++i; - - for (j = reqpage + 1; j < count; ++j) { - daddr_t jblk; - - jblk = swp_pager_meta_ctl(m[j]->object, m[j]->pindex, 0); - if (blk != jblk - (j - reqpage)) - break; - } - - /* - * free pages outside our collection range. Note: we never free - * mreq, it must remain busy throughout. - */ - if (0 < i || j < count) { - int k; - - for (k = 0; k < i; ++k) - swp_pager_free_nrpage(m[k]); - for (k = j; k < count; ++k) - swp_pager_free_nrpage(m[k]); - } - - /* - * Return VM_PAGER_FAIL if we have nothing to do. Return mreq - * still busy, but the others unbusied. - */ if (blk == SWAPBLK_NONE) return (VM_PAGER_FAIL); +#ifdef INVARIANTS + for (int i = 0; i < count; i++) + KASSERT(blk + i == + swp_pager_meta_ctl(m[i]->object, m[i]->pindex, 0), + ("%s: range is not contiguous", __func__)); +#endif + /* * Getpbuf() can sleep. 
*/ @@ -1185,21 +1139,16 @@ static int bp->b_iodone = swp_pager_async_iodone; bp->b_rcred = crhold(thread0.td_ucred); bp->b_wcred = crhold(thread0.td_ucred); - bp->b_blkno = blk - (reqpage - i); - bp->b_bcount = PAGE_SIZE * (j - i); - bp->b_bufsize = PAGE_SIZE * (j - i); - bp->b_pager.pg_reqpage = reqpage - i; + bp->b_blkno = blk; + bp->b_bcount = PAGE_SIZE * count; + bp->b_bufsize = PAGE_SIZE * count; + bp->b_npages = count; VM_OBJECT_WLOCK(object); - { - int k; - - for (k = i; k < j; ++k) { - bp->b_pages[k - i] = m[k]; - m[k]->oflags |= VPO_SWAPINPROG; - } + for (int i = 0; i < count; i++) { + bp->b_pages[i] = m[i]; + m[i]->oflags |= VPO_SWAPINPROG; } - bp->b_npages = j - i; PCPU_INC(cnt.v_swapin); PCPU_ADD(cnt.v_swappgsin, bp->b_npages); @@ -1231,8 +1180,8 @@ static int * is set in the meta-data. */ VM_OBJECT_WLOCK(object); - while ((mreq->oflags & VPO_SWAPINPROG) != 0) { - mreq->oflags |= VPO_SWAPSLEEP; + while ((m[0]->oflags & VPO_SWAPINPROG) != 0) { + m[0]->oflags |= VPO_SWAPSLEEP; PCPU_INC(cnt.v_intrans); if (VM_OBJECT_SLEEP(object, &object->paging_in_progress, PSWP, "swread", hz * 20)) { @@ -1243,16 +1192,14 @@ static int } /* - * mreq is left busied after completion, but all the other pages - * are freed. If we had an unrecoverable read error the page will - * not be valid. + * If we had an unrecoverable read error pages will not be valid. 
*/ - if (mreq->valid != VM_PAGE_BITS_ALL) { - return (VM_PAGER_ERROR); - } else { - return (VM_PAGER_OK); - } + for (int i = 0; i < count; i++) + if (m[i]->valid != VM_PAGE_BITS_ALL) + return (VM_PAGER_ERROR); + return (VM_PAGER_OK); + /* * A final note: in a low swap situation, we cannot deallocate swap * and mark a page dirty here because the caller is likely to mark @@ -1269,11 +1216,11 @@ static int */ static int swap_pager_getpages_async(vm_object_t object, vm_page_t *m, int count, - int reqpage, pgo_getpages_iodone_t iodone, void *arg) + pgo_getpages_iodone_t iodone, void *arg) { int r, error; - r = swap_pager_getpages(object, m, count, reqpage); + r = swap_pager_getpages(object, m, count); VM_OBJECT_WUNLOCK(object); switch (r) { case VM_PAGER_OK: @@ -1572,33 +1519,11 @@ swp_pager_async_iodone(struct buf *bp) */ if (bp->b_iocmd == BIO_READ) { /* - * When reading, reqpage needs to stay - * locked for the parent, but all other - * pages can be freed. We still want to - * wakeup the parent waiting on the page, - * though. ( also: pg_reqpage can be -1 and - * not match anything ). - * - * We have to wake specifically requested pages - * up too because we cleared VPO_SWAPINPROG and - * someone may be waiting for that. - * * NOTE: for reads, m->dirty will probably * be overridden by the original caller of * getpages so don't play cute tricks here. */ m->valid = 0; - if (i != bp->b_pager.pg_reqpage) - swp_pager_free_nrpage(m); - else { - vm_page_lock(m); - vm_page_flash(m); - vm_page_unlock(m); - } - /* - * If i == bp->b_pager.pg_reqpage, do not wake - * the page up. The caller needs to. - */ } else { /* * If a write error occurs, reactivate page @@ -1620,38 +1545,12 @@ swp_pager_async_iodone(struct buf *bp) * want to do that anyway, but it was an optimization * that existed in the old swapper for a time before * it got ripped out due to precisely this problem. - * - * If not the requested page then deactivate it. 
- * - * Note that the requested page, reqpage, is left - * busied, but we still have to wake it up. The - * other pages are released (unbusied) by - * vm_page_xunbusy(). */ KASSERT(!pmap_page_is_mapped(m), ("swp_pager_async_iodone: page %p is mapped", m)); - m->valid = VM_PAGE_BITS_ALL; KASSERT(m->dirty == 0, ("swp_pager_async_iodone: page %p is dirty", m)); - - /* - * We have to wake specifically requested pages - * up too because we cleared VPO_SWAPINPROG and - * could be waiting for it in getpages. However, - * be sure to not unbusy getpages specifically - * requested page - getpages expects it to be - * left busy. - */ - if (i != bp->b_pager.pg_reqpage) { - vm_page_lock(m); - vm_page_deactivate(m); - vm_page_unlock(m); - vm_page_xunbusy(m); - } else { - vm_page_lock(m); - vm_page_flash(m); - vm_page_unlock(m); - } + m->valid = VM_PAGE_BITS_ALL; } else { /* * For write success, clear the dirty @@ -1772,7 +1671,7 @@ swp_pager_force_pagein(vm_object_t object, vm_pind return; } - if (swap_pager_getpages(object, &m, 1, 0) != VM_PAGER_OK) + if (swap_pager_getpages(object, &m, 1) != VM_PAGER_OK) panic("swap_pager_force_pagein: read from swap failed");/*XXX*/ vm_object_pip_wakeup(object); vm_page_dirty(m); Index: sys/vm/vm_fault.c =================================================================== --- sys/vm/vm_fault.c (revision 282213) +++ sys/vm/vm_fault.c (working copy) @@ -672,26 +672,21 @@ vnode_locked: fs.m, behind, ahead, marray, &reqpage); rv = faultcount ? - vm_pager_get_pages(fs.object, marray, faultcount, - reqpage) : VM_PAGER_FAIL; + vm_pager_get_pages(fs.object, marray, faultcount) : + VM_PAGER_FAIL; if (rv == VM_PAGER_OK) { /* * Found the page. Leave it busy while we play - * with it. + * with it. Unbusy companion pages. */ - - /* - * Relookup in case pager changed page. Pager - * is responsible for disposition of old page - * if moved. 
- */ - fs.m = vm_page_lookup(fs.object, fs.pindex); - if (!fs.m) { - unlock_and_deallocate(&fs); - goto RetryFault; + for (int i = 0; i < faultcount; i++) { + if (i == reqpage) + continue; + vm_page_readahead_finish(marray[i]); } - + /* Pager could have changed the page. */ + fs.m = marray[reqpage]; hardfault++; break; /* break to PAGE HAS BEEN FOUND */ } Index: sys/vm/vm_glue.c =================================================================== --- sys/vm/vm_glue.c (revision 282213) +++ sys/vm/vm_glue.c (working copy) @@ -230,7 +230,7 @@ vsunlock(void *addr, size_t len) static vm_page_t vm_imgact_hold_page(vm_object_t object, vm_ooffset_t offset) { - vm_page_t m, ma[1]; + vm_page_t m; vm_pindex_t pindex; int rv; @@ -238,11 +238,7 @@ vm_imgact_hold_page(vm_object_t object, vm_ooffset pindex = OFF_TO_IDX(offset); m = vm_page_grab(object, pindex, VM_ALLOC_NORMAL); if (m->valid != VM_PAGE_BITS_ALL) { - ma[0] = m; - rv = vm_pager_get_pages(object, ma, 1, 0); - m = vm_page_lookup(object, pindex); - if (m == NULL) - goto out; + rv = vm_pager_get_pages(object, &m, 1); if (rv != VM_PAGER_OK) { vm_page_lock(m); vm_page_free(m); @@ -571,34 +567,37 @@ vm_thread_swapin(struct thread *td) { vm_object_t ksobj; vm_page_t ma[KSTACK_MAX_PAGES]; - int i, j, k, pages, rv; + int pages; pages = td->td_kstack_pages; ksobj = td->td_kstack_obj; VM_OBJECT_WLOCK(ksobj); - for (i = 0; i < pages; i++) + for (int i = 0; i < pages; i++) ma[i] = vm_page_grab(ksobj, i, VM_ALLOC_NORMAL | VM_ALLOC_WIRED); - for (i = 0; i < pages; i++) { - if (ma[i]->valid != VM_PAGE_BITS_ALL) { - vm_page_assert_xbusied(ma[i]); - vm_object_pip_add(ksobj, 1); - for (j = i + 1; j < pages; j++) { - if (ma[j]->valid != VM_PAGE_BITS_ALL) - vm_page_assert_xbusied(ma[j]); - if (ma[j]->valid == VM_PAGE_BITS_ALL) - break; - } - rv = vm_pager_get_pages(ksobj, ma + i, j - i, 0); - if (rv != VM_PAGER_OK) - panic("vm_thread_swapin: cannot get kstack for proc: %d", - td->td_proc->p_pid); - vm_object_pip_wakeup(ksobj); - for (k = 
i; k < j; k++) - ma[k] = vm_page_lookup(ksobj, k); + for (int i = 0; i < pages;) { + int j, a, count, rv; + + vm_page_assert_xbusied(ma[i]); + if (ma[i]->valid == VM_PAGE_BITS_ALL) { vm_page_xunbusy(ma[i]); - } else if (vm_page_xbusied(ma[i])) - vm_page_xunbusy(ma[i]); + i++; + continue; + } + vm_object_pip_add(ksobj, 1); + for (j = i + 1; j < pages; j++) + if (ma[j]->valid == VM_PAGE_BITS_ALL) + break; + rv = vm_pager_has_page(ksobj, ma[i]->pindex, NULL, &a); + KASSERT(rv == 1, ("%s: missing page %p", __func__, ma[i])); + count = min(a + 1, j - i); + rv = vm_pager_get_pages(ksobj, ma + i, count); + KASSERT(rv == VM_PAGER_OK, ("%s: cannot get kstack for proc %d", + __func__, td->td_proc->p_pid)); + vm_object_pip_wakeup(ksobj); + for (j = i; j < i + count; j++) + vm_page_xunbusy(ma[j]); + i += count; } VM_OBJECT_WUNLOCK(ksobj); pmap_qenter(td->td_kstack, ma, pages); Index: sys/vm/vm_object.c =================================================================== --- sys/vm/vm_object.c (revision 282213) +++ sys/vm/vm_object.c (working copy) @@ -2042,7 +2042,7 @@ vm_object_page_cache(vm_object_t object, vm_pindex boolean_t vm_object_populate(vm_object_t object, vm_pindex_t start, vm_pindex_t end) { - vm_page_t m, ma[1]; + vm_page_t m; vm_pindex_t pindex; int rv; @@ -2050,11 +2050,7 @@ vm_object_populate(vm_object_t object, vm_pindex_t for (pindex = start; pindex < end; pindex++) { m = vm_page_grab(object, pindex, VM_ALLOC_NORMAL); if (m->valid != VM_PAGE_BITS_ALL) { - ma[0] = m; - rv = vm_pager_get_pages(object, ma, 1, 0); - m = vm_page_lookup(object, pindex); - if (m == NULL) - break; + rv = vm_pager_get_pages(object, &m, 1); if (rv != VM_PAGER_OK) { vm_page_lock(m); vm_page_free(m); Index: sys/vm/vm_page.c =================================================================== --- sys/vm/vm_page.c (revision 282213) +++ sys/vm/vm_page.c (working copy) @@ -863,32 +863,19 @@ void vm_page_readahead_finish(vm_page_t m) { - if (m->valid != 0) { - /* - * Since the page is not the 
requested page, whether - * it should be activated or deactivated is not - * obvious. Empirical results have shown that - * deactivating the page is usually the best choice, - * unless the page is wanted by another thread. - */ - vm_page_lock(m); - if ((m->busy_lock & VPB_BIT_WAITERS) != 0) - vm_page_activate(m); - else - vm_page_deactivate(m); - vm_page_unlock(m); - vm_page_xunbusy(m); - } else { - /* - * Free the completely invalid page. Such page state - * occurs due to the short read operation which did - * not covered our page at all, or in case when a read - * error happens. - */ - vm_page_lock(m); - vm_page_free(m); - vm_page_unlock(m); - } + /* + * Since the page is not the requested page, whether it should be + * activated or deactivated is not obvious. Empirical results have + * shown that deactivating the page is usually the best choice, + * unless the page is wanted by another thread. + */ + vm_page_lock(m); + if ((m->busy_lock & VPB_BIT_WAITERS) != 0) + vm_page_activate(m); + else + vm_page_deactivate(m); + vm_page_unlock(m); + vm_page_xunbusy(m); } /* Index: sys/vm/vm_pager.c =================================================================== --- sys/vm/vm_pager.c (revision 282213) +++ sys/vm/vm_pager.c (working copy) @@ -251,7 +251,95 @@ vm_pager_deallocate(object) } /* - * vm_pager_get_pages() - inline, see vm/vm_pager.h + * Retrieve pages from the VM system in order to map them into an object + * ( or into VM space somewhere ). If the pagein was successful, we + * must fully validate it. + */ +int +vm_pager_get_pages(vm_object_t object, vm_page_t *m, int count) +{ +#ifdef INVARIANTS + vm_pindex_t pindex = m[0]->pindex; +#endif + int r; + + VM_OBJECT_ASSERT_WLOCKED(object); + KASSERT(count > 0, ("%s: 0 count", __func__)); + + /* + * If the last page is partially valid, just return it and zero-out + * the blanks. Partially valid pages can only occur at the file EOF. 
+ */ + if (m[count - 1]->valid != 0) { + vm_page_zero_invalid(m[count - 1], TRUE); + if (--count == 0) + return (VM_PAGER_OK); + } + +#ifdef INVARIANTS + /* + * All pages must be busied, not mapped, not valid, not dirty + * and belong to the proper object. + */ + for (int i = 0 ; i < count; i++) { + vm_page_assert_xbusied(m[i]); + KASSERT(!pmap_page_is_mapped(m[i]), + ("%s: page %p is mapped", __func__, m[i])); + KASSERT(m[i]->valid == 0, + ("%s: request for a valid page %p", __func__, m[i])); + KASSERT(m[i]->dirty == 0, + ("%s: page %p is dirty", __func__, m[i])); + KASSERT(m[i]->object == object, + ("%s: wrong object %p/%p", __func__, object, m[i]->object)); + } +#endif + + r = (*pagertab[object->type]->pgo_getpages)(object, m, count); + if (r != VM_PAGER_OK) + return (r); + + for (int i = 0; i < count; i++) { + /* + * If pager has replaced a page, assert that it had + * updated the array. + */ + KASSERT(m[i] == vm_page_lookup(object, pindex++), + ("%s: mismatch page %p pindex %ju", __func__, + m[i], (uintmax_t )pindex - 1)); + /* + * Zero out partially filled data. + */ + if (m[i]->valid != VM_PAGE_BITS_ALL) + vm_page_zero_invalid(m[i], TRUE); + } + return (VM_PAGER_OK); +} + +int +vm_pager_get_pages_async(vm_object_t object, vm_page_t *m, int count, + pgo_getpages_iodone_t iodone, void *arg) +{ + + VM_OBJECT_ASSERT_WLOCKED(object); + KASSERT(count > 0, ("%s: 0 count", __func__)); + + /* + * If the last page is partially valid, just return it and zero-out + * the blanks. Partially valid pages can only occur at the file EOF. 
+ */ + if (m[count - 1]->valid != 0) { + vm_page_zero_invalid(m[count - 1], TRUE); + if (--count == 0) { + iodone(arg, m, 1, 0); + return (VM_PAGER_OK); + } + } + + return ((*pagertab[object->type]->pgo_getpages_async)(object, m, + count, iodone, arg)); +} + +/* * vm_pager_put_pages() - inline, see vm/vm_pager.h * vm_pager_has_page() - inline, see vm/vm_pager.h */ @@ -283,39 +371,6 @@ vm_pager_object_lookup(struct pagerlst *pg_list, v } /* - * Free the non-requested pages from the given array. To remove all pages, - * caller should provide out of range reqpage number. - */ -void -vm_pager_free_nonreq(vm_object_t object, vm_page_t ma[], int reqpage, - int npages, boolean_t object_locked) -{ - enum { UNLOCKED, CALLER_LOCKED, INTERNALLY_LOCKED } locked; - int i; - - if (object_locked) { - VM_OBJECT_ASSERT_WLOCKED(object); - locked = CALLER_LOCKED; - } else { - VM_OBJECT_ASSERT_UNLOCKED(object); - locked = UNLOCKED; - } - for (i = 0; i < npages; ++i) { - if (i != reqpage) { - if (locked == UNLOCKED) { - VM_OBJECT_WLOCK(object); - locked = INTERNALLY_LOCKED; - } - vm_page_lock(ma[i]); - vm_page_free(ma[i]); - vm_page_unlock(ma[i]); - } - } - if (locked == INTERNALLY_LOCKED) - VM_OBJECT_WUNLOCK(object); -} - -/* * initialize a physical buffer */ Index: sys/vm/vm_pager.h =================================================================== --- sys/vm/vm_pager.h (revision 282213) +++ sys/vm/vm_pager.h (working copy) @@ -50,9 +50,9 @@ typedef void pgo_init_t(void); typedef vm_object_t pgo_alloc_t(void *, vm_ooffset_t, vm_prot_t, vm_ooffset_t, struct ucred *); typedef void pgo_dealloc_t(vm_object_t); -typedef int pgo_getpages_t(vm_object_t, vm_page_t *, int, int); +typedef int pgo_getpages_t(vm_object_t, vm_page_t *, int); typedef void pgo_getpages_iodone_t(void *, vm_page_t *, int, int); -typedef int pgo_getpages_async_t(vm_object_t, vm_page_t *, int, int, +typedef int pgo_getpages_async_t(vm_object_t, vm_page_t *, int, pgo_getpages_iodone_t, void *); typedef void 
pgo_putpages_t(vm_object_t, vm_page_t *, int, int, int *); typedef boolean_t pgo_haspage_t(vm_object_t, vm_pindex_t, int *, int *); @@ -106,49 +106,13 @@ vm_object_t vm_pager_allocate(objtype_t, void *, v vm_ooffset_t, struct ucred *); void vm_pager_bufferinit(void); void vm_pager_deallocate(vm_object_t); -static __inline int vm_pager_get_pages(vm_object_t, vm_page_t *, int, int); -static inline int vm_pager_get_pages_async(vm_object_t, vm_page_t *, int, - int, pgo_getpages_iodone_t, void *); +int vm_pager_get_pages(vm_object_t, vm_page_t *, int); +int vm_pager_get_pages_async(vm_object_t, vm_page_t *, int, + pgo_getpages_iodone_t, void *); static __inline boolean_t vm_pager_has_page(vm_object_t, vm_pindex_t, int *, int *); void vm_pager_init(void); vm_object_t vm_pager_object_lookup(struct pagerlst *, void *); -void vm_pager_free_nonreq(vm_object_t object, vm_page_t ma[], int reqpage, - int npages, boolean_t object_locked); -/* - * vm_page_get_pages: - * - * Retrieve pages from the VM system in order to map them into an object - * ( or into VM space somewhere ). If the pagein was successful, we - * must fully validate it. 
- */ -static __inline int -vm_pager_get_pages( - vm_object_t object, - vm_page_t *m, - int count, - int reqpage -) { - int r; - - VM_OBJECT_ASSERT_WLOCKED(object); - r = (*pagertab[object->type]->pgo_getpages)(object, m, count, reqpage); - if (r == VM_PAGER_OK && m[reqpage]->valid != VM_PAGE_BITS_ALL) { - vm_page_zero_invalid(m[reqpage], TRUE); - } - return (r); -} - -static inline int -vm_pager_get_pages_async(vm_object_t object, vm_page_t *m, int count, - int reqpage, pgo_getpages_iodone_t iodone, void *arg) -{ - - VM_OBJECT_ASSERT_WLOCKED(object); - return ((*pagertab[object->type]->pgo_getpages_async)(object, m, - count, reqpage, iodone, arg)); -} - static __inline void vm_pager_put_pages( vm_object_t object, Index: sys/vm/vnode_pager.c =================================================================== --- sys/vm/vnode_pager.c (revision 282213) +++ sys/vm/vnode_pager.c (working copy) @@ -84,11 +84,9 @@ static int vnode_pager_addr(struct vnode *vp, vm_o static int vnode_pager_input_smlfs(vm_object_t object, vm_page_t m); static int vnode_pager_input_old(vm_object_t object, vm_page_t m); static void vnode_pager_dealloc(vm_object_t); -static int vnode_pager_local_getpages0(struct vnode *, vm_page_t *, int, int, +static int vnode_pager_getpages(vm_object_t, vm_page_t *, int); +static int vnode_pager_getpages_async(vm_object_t, vm_page_t *, int, vop_getpages_iodone_t, void *); -static int vnode_pager_getpages(vm_object_t, vm_page_t *, int, int); -static int vnode_pager_getpages_async(vm_object_t, vm_page_t *, int, int, - vop_getpages_iodone_t, void *); static void vnode_pager_putpages(vm_object_t, vm_page_t *, int, int, int *); static boolean_t vnode_pager_haspage(vm_object_t, vm_pindex_t, int *, int *); static vm_object_t vnode_pager_alloc(void *, vm_ooffset_t, vm_prot_t, @@ -662,7 +660,7 @@ vnode_pager_input_old(vm_object_t object, vm_page_ * backing vp's VOP_GETPAGES. 
*/ static int -vnode_pager_getpages(vm_object_t object, vm_page_t *m, int count, int reqpage) +vnode_pager_getpages(vm_object_t object, vm_page_t *m, int count) { int rtval; struct vnode *vp; @@ -670,7 +668,7 @@ static int vp = object->handle; VM_OBJECT_WUNLOCK(object); - rtval = VOP_GETPAGES(vp, m, bytes, reqpage); + rtval = VOP_GETPAGES(vp, m, bytes); KASSERT(rtval != EOPNOTSUPP, ("vnode_pager: FS getpages not implemented\n")); VM_OBJECT_WLOCK(object); @@ -679,7 +677,7 @@ static int static int vnode_pager_getpages_async(vm_object_t object, vm_page_t *m, int count, - int reqpage, vop_getpages_iodone_t iodone, void *arg) + vop_getpages_iodone_t iodone, void *arg) { struct vnode *vp; int rtval; @@ -686,8 +684,7 @@ vnode_pager_getpages_async(vm_object_t object, vm_ vp = object->handle; VM_OBJECT_WUNLOCK(object); - rtval = VOP_GETPAGES_ASYNC(vp, m, count * PAGE_SIZE, reqpage, - iodone, arg); + rtval = VOP_GETPAGES_ASYNC(vp, m, count * PAGE_SIZE, iodone, arg); KASSERT(rtval != EOPNOTSUPP, ("vnode_pager: FS getpages_async not implemented\n")); VM_OBJECT_WLOCK(object); @@ -703,8 +700,8 @@ int vnode_pager_local_getpages(struct vop_getpages_args *ap) { - return (vnode_pager_local_getpages0(ap->a_vp, ap->a_m, ap->a_count, - ap->a_reqpage, NULL, NULL)); + return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count, + NULL, NULL)); } int @@ -711,42 +708,10 @@ int vnode_pager_local_getpages_async(struct vop_getpages_async_args *ap) { - return (vnode_pager_local_getpages0(ap->a_vp, ap->a_m, ap->a_count, - ap->a_reqpage, ap->a_iodone, ap->a_arg)); + return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count, + ap->a_iodone, ap->a_arg)); } -static int -vnode_pager_local_getpages0(struct vnode *vp, vm_page_t *m, int bytecount, - int reqpage, vop_getpages_iodone_t iodone, void *arg) -{ - vm_page_t mreq; - - mreq = m[reqpage]; - - /* - * Since the caller has busied the requested page, that page's valid - * field will not be changed by other threads. 
- */ - vm_page_assert_xbusied(mreq); - - /* - * The requested page has valid blocks. Invalid part can only - * exist at the end of file, and the page is made fully valid - * by zeroing in vm_pager_get_pages(). Free non-requested - * pages, since no i/o is done to read its content. - */ - if (mreq->valid != 0) { - vm_pager_free_nonreq(mreq->object, m, reqpage, - round_page(bytecount) / PAGE_SIZE, FALSE); - if (iodone != NULL) - iodone(arg, m, reqpage, 0); - return (VM_PAGER_OK); - } - - return (vnode_pager_generic_getpages(vp, m, bytecount, reqpage, - iodone, arg)); -} - /* * This is now called from local media FS's to operate against their * own vnodes if they fail to implement VOP_GETPAGES. @@ -753,29 +718,31 @@ vnode_pager_local_getpages_async(struct vop_getpag */ int vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, int bytecount, - int reqpage, vop_getpages_iodone_t iodone, void *arg) + vop_getpages_iodone_t iodone, void *arg) { vm_object_t object; off_t foff; - int i, j, size, bsize, first, *freecnt; - daddr_t firstaddr, reqblock; + int error, count, bsize, i, after, secmask, *freecnt; + daddr_t reqblock; struct bufobj *bo; - int runpg; - int runend; struct buf *bp; - int count; - int error; - object = vp->v_object; - count = bytecount / PAGE_SIZE; + KASSERT(vp->v_type != VCHR && vp->v_type != VBLK, + ("%s does not support devices", __func__)); + KASSERT(bytecount > 0 && (bytecount & ~PAGE_MASK) == bytecount, + ("%s: bytecount %d", __func__, bytecount)); - KASSERT(vp->v_type != VCHR && vp->v_type != VBLK, - ("vnode_pager_generic_getpages does not support devices")); if (vp->v_iflag & VI_DOOMED) return VM_PAGER_BAD; + object = vp->v_object; + foff = IDX_TO_OFF(m[0]->pindex); + + KASSERT(foff < object->un_pager.vnp.vnp_size, + ("%s: page %p offset beyond vp %p size", __func__, m[0], vp)); + + count = bytecount >> PAGE_SHIFT; bsize = vp->v_mount->mnt_stat.f_iosize; - foff = IDX_TO_OFF(m[reqpage]->pindex); /* * Synchronous and asynchronous paging 
operations use different @@ -794,172 +761,58 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ * If the file system doesn't support VOP_BMAP, use old way of * getting pages via VOP_READ. */ - error = VOP_BMAP(vp, foff / bsize, &bo, &reqblock, NULL, NULL); + error = VOP_BMAP(vp, foff / bsize, &bo, &reqblock, &after, NULL); if (error == EOPNOTSUPP) { relpbuf(bp, freecnt); VM_OBJECT_WLOCK(object); - for (i = 0; i < count; i++) - if (i != reqpage) { - vm_page_lock(m[i]); - vm_page_free(m[i]); - vm_page_unlock(m[i]); - } - PCPU_INC(cnt.v_vnodein); - PCPU_INC(cnt.v_vnodepgsin); - error = vnode_pager_input_old(object, m[reqpage]); + for (i = 0; i < count; i++) { + PCPU_INC(cnt.v_vnodein); + PCPU_INC(cnt.v_vnodepgsin); + error = vnode_pager_input_old(object, m[i]); + if (error) + break; + } VM_OBJECT_WUNLOCK(object); return (error); } else if (error != 0) { relpbuf(bp, freecnt); - vm_pager_free_nonreq(object, m, reqpage, count, FALSE); return (VM_PAGER_ERROR); - - /* - * if the blocksize is smaller than a page size, then use - * special small filesystem code. NFS sometimes has a small - * blocksize, but it can handle large reads itself. - */ - } else if ((PAGE_SIZE / bsize) > 1 && - (vp->v_mount->mnt_stat.f_type != nfs_mount_type)) { - relpbuf(bp, freecnt); - vm_pager_free_nonreq(object, m, reqpage, count, FALSE); - PCPU_INC(cnt.v_vnodein); - PCPU_INC(cnt.v_vnodepgsin); - return vnode_pager_input_smlfs(object, m[reqpage]); } /* - * Since the caller has busied the requested page, that page's valid - * field will not be changed by other threads. + * If the blocksize is smaller than a page size, then use + * special small filesystem code. NFS sometimes has a small + * blocksize, but it can handle large reads itself. */ - vm_page_assert_xbusied(m[reqpage]); - - /* - * If we have a completely valid page available to us, we can - * clean up and return. Otherwise we have to re-read the - * media. 
- */ - if (m[reqpage]->valid == VM_PAGE_BITS_ALL) { + if ((PAGE_SIZE / bsize) > 1 && + (vp->v_mount->mnt_stat.f_type != nfs_mount_type)) { relpbuf(bp, freecnt); - vm_pager_free_nonreq(object, m, reqpage, count, FALSE); - return (VM_PAGER_OK); - } else if (reqblock == -1) { - relpbuf(bp, freecnt); - pmap_zero_page(m[reqpage]); - KASSERT(m[reqpage]->dirty == 0, - ("vnode_pager_generic_getpages: page %p is dirty", m)); - VM_OBJECT_WLOCK(object); - m[reqpage]->valid = VM_PAGE_BITS_ALL; - vm_pager_free_nonreq(object, m, reqpage, count, TRUE); - VM_OBJECT_WUNLOCK(object); - return (VM_PAGER_OK); - } else if (m[reqpage]->valid != 0) { - VM_OBJECT_WLOCK(object); - m[reqpage]->valid = 0; - VM_OBJECT_WUNLOCK(object); - } - - /* - * here on direct device I/O - */ - firstaddr = -1; - - /* - * calculate the run that includes the required page - */ - for (first = 0, i = 0; i < count; i = runend) { - if (vnode_pager_addr(vp, IDX_TO_OFF(m[i]->pindex), &firstaddr, - &runpg) != 0) { - relpbuf(bp, freecnt); - /* The requested page may be out of range. 
*/ - vm_pager_free_nonreq(object, m + i, reqpage - i, - count - i, FALSE); - return (VM_PAGER_ERROR); + for (i = 0; i < count; i++) { + PCPU_INC(cnt.v_vnodein); + PCPU_INC(cnt.v_vnodepgsin); + error = vnode_pager_input_smlfs(object, m[i]); + if (error) + break; } - if (firstaddr == -1) { - VM_OBJECT_WLOCK(object); - if (i == reqpage && foff < object->un_pager.vnp.vnp_size) { - panic("vnode_pager_getpages: unexpected missing page: firstaddr: %jd, foff: 0x%jx%08jx, vnp_size: 0x%jx%08jx", - (intmax_t)firstaddr, (uintmax_t)(foff >> 32), - (uintmax_t)foff, - (uintmax_t) - (object->un_pager.vnp.vnp_size >> 32), - (uintmax_t)object->un_pager.vnp.vnp_size); - } - vm_page_lock(m[i]); - vm_page_free(m[i]); - vm_page_unlock(m[i]); - VM_OBJECT_WUNLOCK(object); - runend = i + 1; - first = runend; - continue; - } - runend = i + runpg; - if (runend <= reqpage) { - VM_OBJECT_WLOCK(object); - for (j = i; j < runend; j++) { - vm_page_lock(m[j]); - vm_page_free(m[j]); - vm_page_unlock(m[j]); - } - VM_OBJECT_WUNLOCK(object); - } else { - if (runpg < (count - first)) { - VM_OBJECT_WLOCK(object); - for (i = first + runpg; i < count; i++) { - vm_page_lock(m[i]); - vm_page_free(m[i]); - vm_page_unlock(m[i]); - } - VM_OBJECT_WUNLOCK(object); - count = first + runpg; - } - break; - } - first = runend; + return (error); } /* - * the first and last page have been calculated now, move input pages - * to be zero based... + * Truncate bytecount to vnode real size and round up physical size + * for real devices. 
*/ - if (first != 0) { - m += first; - count -= first; - reqpage -= first; - } + if ((foff + bytecount) > object->un_pager.vnp.vnp_size) + bytecount = object->un_pager.vnp.vnp_size - foff; + secmask = bo->bo_bsize - 1; + KASSERT(secmask < PAGE_SIZE && secmask > 0, + ("%s: sector size %d too large", __func__, secmask + 1)); + bytecount = (bytecount + secmask) & ~secmask; /* - * calculate the file virtual address for the transfer + * And map the pages to be read into the kva, if the filesystem + * requires mapped buffers. */ - foff = IDX_TO_OFF(m[0]->pindex); - - /* - * calculate the size of the transfer - */ - size = count * PAGE_SIZE; - KASSERT(count > 0, ("zero count")); - if ((foff + size) > object->un_pager.vnp.vnp_size) - size = object->un_pager.vnp.vnp_size - foff; - KASSERT(size > 0, ("zero size")); - - /* - * round up physical size for real devices. - */ - if (1) { - int secmask = bo->bo_bsize - 1; - KASSERT(secmask < PAGE_SIZE && secmask > 0, - ("vnode_pager_generic_getpages: sector size %d too large", - secmask + 1)); - size = (size + secmask) & ~secmask; - } - bp->b_kvaalloc = bp->b_data; - - /* - * and map the pages to be read into the kva, if the filesystem - * requires mapped buffers. - */ if ((vp->v_mount->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0 && unmapped_buf_allowed) { bp->b_data = unmapped_buf; @@ -969,38 +822,33 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ } else pmap_qenter((vm_offset_t)bp->b_kvaalloc, m, count); - /* build a minimal buffer header */ + /* Build a minimal buffer header. 
*/ bp->b_iocmd = BIO_READ; KASSERT(bp->b_rcred == NOCRED, ("leaking read ucred")); KASSERT(bp->b_wcred == NOCRED, ("leaking write ucred")); bp->b_rcred = crhold(curthread->td_ucred); bp->b_wcred = crhold(curthread->td_ucred); - bp->b_blkno = firstaddr; + bp->b_blkno = reqblock + ((foff % bsize) / DEV_BSIZE); pbgetbo(bo, bp); bp->b_vp = vp; - bp->b_bcount = size; - bp->b_bufsize = size; - bp->b_runningbufspace = bp->b_bufsize; + bp->b_bcount = bp->b_bufsize = bp->b_runningbufspace = bytecount; for (i = 0; i < count; i++) bp->b_pages[i] = m[i]; bp->b_npages = count; - bp->b_pager.pg_reqpage = reqpage; + bp->b_iooffset = dbtob(bp->b_blkno); + atomic_add_long(&runningbufspace, bp->b_runningbufspace); - PCPU_INC(cnt.v_vnodein); PCPU_ADD(cnt.v_vnodepgsin, count); - /* do the input */ - bp->b_iooffset = dbtob(bp->b_blkno); - if (iodone != NULL) { /* async */ - bp->b_pager.pg_iodone = iodone; + bp->b_pgiodone = iodone; bp->b_caller1 = arg; bp->b_iodone = vnode_pager_generic_getpages_done_async; bp->b_flags |= B_ASYNC; BUF_KERNPROC(bp); bstrategy(bp); - /* Good bye! */ + return (0); } else { bp->b_iodone = bdone; bstrategy(bp); @@ -1011,9 +859,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ bp->b_vp = NULL; pbrelbo(bp); relpbuf(bp, &vnode_pbuf_freecnt); + return (error != 0 ? VM_PAGER_ERROR : VM_PAGER_OK); } - - return (error != 0 ? 
VM_PAGER_ERROR : VM_PAGER_OK); } static void @@ -1022,8 +869,7 @@ vnode_pager_generic_getpages_done_async(struct buf int error; error = vnode_pager_generic_getpages_done(bp); - bp->b_pager.pg_iodone(bp->b_caller1, bp->b_pages, - bp->b_pager.pg_reqpage, error); + bp->b_pgiodone(bp->b_caller1, bp->b_pages, bp->b_npages, error); for (int i = 0; i < bp->b_npages; i++) bp->b_pages[i] = NULL; bp->b_vp = NULL; @@ -1089,9 +935,6 @@ vnode_pager_generic_getpages_done(struct buf *bp) object->un_pager.vnp.vnp_size - tfoff)) == 0, ("%s: page %p is dirty", __func__, mt)); } - - if (i != bp->b_pager.pg_reqpage) - vm_page_readahead_finish(mt); } VM_OBJECT_WUNLOCK(object); if (error != 0) Index: sys/vm/vnode_pager.h =================================================================== --- sys/vm/vnode_pager.h (revision 282213) +++ sys/vm/vnode_pager.h (working copy) @@ -41,7 +41,7 @@ #ifdef _KERNEL int vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, - int count, int reqpage, vop_getpages_iodone_t iodone, void *arg); + int count, vop_getpages_iodone_t iodone, void *arg); int vnode_pager_generic_putpages(struct vnode *vp, vm_page_t *m, int count, boolean_t sync, int *rtvals); --45Z9DzgjV8m4Oswq--
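For readers who want to see the caller-side pattern in isolation: the reworked vm_thread_swapin() in the patch walks the page array, skips pages that are already valid, and requests each run of invalid pages in one call, capped by the "after" hint from vm_pager_has_page(). Below is a minimal user-space sketch of just that run-splitting arithmetic. It is not kernel code: struct page, run_length() and fill_all() are hypothetical names invented for illustration, and marking a page valid stands in for the actual pager read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for vm_page_t: only validity matters here. */
struct page {
	bool valid;
};

/*
 * Hypothetical helper. Starting at index i, which must be an invalid
 * page, return how many consecutive pages to ask for in one request.
 * The run stops at the first valid page or at the end of the array,
 * and is capped at after + 1 pages, where "after" plays the role of
 * the read-ahead hint a pager would report.
 */
static int
run_length(const struct page *ma, int pages, int i, int after)
{
	int j, count;

	assert(i >= 0 && i < pages && !ma[i].valid);
	for (j = i + 1; j < pages; j++)
		if (ma[j].valid)
			break;
	count = j - i;
	if (after + 1 < count)
		count = after + 1;
	return (count);
}

/*
 * Hypothetical driver. Make every page valid and return the number of
 * simulated pager requests that were needed.
 */
static int
fill_all(struct page *ma, int pages, int after)
{
	int requests = 0;

	for (int i = 0; i < pages;) {
		int count;

		if (ma[i].valid) {
			i++;
			continue;
		}
		count = run_length(ma, pages, i, after);
		/* Stand-in for the pager read: the whole run becomes valid. */
		for (int j = i; j < i + count; j++)
			ma[j].valid = true;
		requests++;
		i += count;
	}
	return (requests);
}
```

The point of the sketch is that the caller advances by the number of pages it actually requested, rather than looking pages up again afterwards. With six pages where only index 2 starts out valid and a hint of 1, the runs are [0,1], [3,4] and [5], so three requests are issued.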