From owner-svn-src-user@FreeBSD.ORG Tue Dec 1 00:42:18 2009 Return-Path: Delivered-To: svn-src-user@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 25C60106568D; Tue, 1 Dec 2009 00:42:18 +0000 (UTC) (envelope-from kmacy@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c]) by mx1.freebsd.org (Postfix) with ESMTP id 0FA5C8FC16; Tue, 1 Dec 2009 00:42:18 +0000 (UTC) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.3/8.14.3) with ESMTP id nB10gIU4047330; Tue, 1 Dec 2009 00:42:18 GMT (envelope-from kmacy@svn.freebsd.org) Received: (from kmacy@localhost) by svn.freebsd.org (8.14.3/8.14.3/Submit) id nB10gHV1047316; Tue, 1 Dec 2009 00:42:17 GMT (envelope-from kmacy@svn.freebsd.org) Message-Id: <200912010042.nB10gHV1047316@svn.freebsd.org> From: Kip Macy Date: Tue, 1 Dec 2009 00:42:17 +0000 (UTC) To: src-committers@freebsd.org, svn-src-user@freebsd.org X-SVN-Group: user MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: Subject: svn commit: r199978 - in user/kmacy/releng_8_fcs: cddl/contrib/opensolaris/cmd/ztest cddl/contrib/opensolaris/lib/libzpool/common/sys sys/amd64/amd64 sys/amd64/include sys/cddl/compat/opensolaris/s... X-BeenThere: svn-src-user@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the experimental " user" src tree" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Dec 2009 00:42:18 -0000 Author: kmacy Date: Tue Dec 1 00:42:17 2009 New Revision: 199978 URL: http://svn.freebsd.org/changeset/base/199978 Log: - merge updated FCS changes Modified: user/kmacy/releng_8_fcs/cddl/contrib/opensolaris/cmd/ztest/ztest.c user/kmacy/releng_8_fcs/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h user/kmacy/releng_8_fcs/sys/amd64/amd64/minidump_machdep.c user/kmacy/releng_8_fcs/sys/amd64/amd64/pmap.c user/kmacy/releng_8_fcs/sys/amd64/amd64/uma_machdep.c user/kmacy/releng_8_fcs/sys/amd64/include/md_var.h user/kmacy/releng_8_fcs/sys/amd64/include/vmparam.h user/kmacy/releng_8_fcs/sys/cddl/compat/opensolaris/sys/kmem.h user/kmacy/releng_8_fcs/sys/cddl/compat/opensolaris/sys/mutex.h user/kmacy/releng_8_fcs/sys/cddl/compat/opensolaris/sys/rwlock.h user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_zfetch.c user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_vfsops.h user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil_impl.h user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_fuid.c user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_log.c user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c user/kmacy/releng_8_fcs/sys/compat/freebsd32/freebsd32_misc.c user/kmacy/releng_8_fcs/sys/conf/files user/kmacy/releng_8_fcs/sys/conf/files.amd64 user/kmacy/releng_8_fcs/sys/conf/files.i386 user/kmacy/releng_8_fcs/sys/conf/kern.pre.mk user/kmacy/releng_8_fcs/sys/conf/options user/kmacy/releng_8_fcs/sys/kern/uipc_sockbuf.c user/kmacy/releng_8_fcs/sys/kern/uipc_socket.c user/kmacy/releng_8_fcs/sys/kern/uipc_syscalls.c user/kmacy/releng_8_fcs/sys/kern/vfs_bio.c user/kmacy/releng_8_fcs/sys/modules/zfs/Makefile user/kmacy/releng_8_fcs/sys/netinet/in_pcb.c user/kmacy/releng_8_fcs/sys/netinet/in_pcb.h user/kmacy/releng_8_fcs/sys/netinet/ip_output.c user/kmacy/releng_8_fcs/sys/netinet/tcp_input.c user/kmacy/releng_8_fcs/sys/netinet/tcp_usrreq.c user/kmacy/releng_8_fcs/sys/sys/buf.h user/kmacy/releng_8_fcs/sys/sys/file.h user/kmacy/releng_8_fcs/sys/sys/malloc.h user/kmacy/releng_8_fcs/sys/sys/param.h user/kmacy/releng_8_fcs/sys/sys/sockbuf.h user/kmacy/releng_8_fcs/sys/sys/socket.h user/kmacy/releng_8_fcs/sys/sys/socketvar.h user/kmacy/releng_8_fcs/sys/sys/sockstate.h user/kmacy/releng_8_fcs/sys/sys/syscallsubr.h user/kmacy/releng_8_fcs/sys/vm/pmap.h user/kmacy/releng_8_fcs/sys/vm/uma.h user/kmacy/releng_8_fcs/sys/vm/uma_core.c user/kmacy/releng_8_fcs/sys/vm/vm.h user/kmacy/releng_8_fcs/sys/vm/vm_contig.c user/kmacy/releng_8_fcs/sys/vm/vm_glue.c user/kmacy/releng_8_fcs/sys/vm/vm_kern.c user/kmacy/releng_8_fcs/sys/vm/vm_page.c user/kmacy/releng_8_fcs/sys/vm/vnode_pager.c Modified: user/kmacy/releng_8_fcs/cddl/contrib/opensolaris/cmd/ztest/ztest.c ============================================================================== --- user/kmacy/releng_8_fcs/cddl/contrib/opensolaris/cmd/ztest/ztest.c Mon Nov 30 23:54:16 2009 (r199977) +++ user/kmacy/releng_8_fcs/cddl/contrib/opensolaris/cmd/ztest/ztest.c Tue Dec 1 00:42:17 2009 (r199978) @@ -1304,7 +1304,7 @@ ztest_dmu_objset_create_destroy(ztest_ar if (ztest_random(2) == 0 && dmu_objset_open(name, DMU_OST_OTHER, DS_MODE_OWNER, &os) == 0) { zr.zr_os = os; - zil_replay(os, &zr, &zr.zr_assign, ztest_replay_vector, NULL); + zil_replay(os, &zr, ztest_replay_vector); dmu_objset_close(os); } @@ -3321,8 +3321,7 @@ ztest_run(char *pool) if (test_future) ztest_dmu_check_future_leak(&za[t]); zr.zr_os = za[d].za_os; - zil_replay(zr.zr_os, &zr, &zr.zr_assign, - ztest_replay_vector, NULL); + zil_replay(zr.zr_os, &zr, ztest_replay_vector); za[d].za_zilog = zil_open(za[d].za_os, NULL); } Modified: user/kmacy/releng_8_fcs/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h ============================================================================== --- user/kmacy/releng_8_fcs/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h Mon Nov 30 23:54:16 2009 (r199977) +++ user/kmacy/releng_8_fcs/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h Tue Dec 1 00:42:17 2009 (r199978) @@ -305,6 +305,8 @@ extern void cv_broadcast(kcondvar_t *cv) #define KM_PUSHPAGE KM_SLEEP #define KM_NOSLEEP UMEM_DEFAULT #define KMC_NODEBUG UMC_NODEBUG +#define KM_NODEBUG KMC_NODEBUG + #define kmem_alloc(_s, _f) umem_alloc(_s, _f) #define kmem_zalloc(_s, _f) umem_zalloc(_s, _f) #define kmem_free(_b, _s) umem_free(_b, _s) Modified: user/kmacy/releng_8_fcs/sys/amd64/amd64/minidump_machdep.c ============================================================================== --- user/kmacy/releng_8_fcs/sys/amd64/amd64/minidump_machdep.c Mon Nov 30 23:54:16 2009 (r199977) +++ user/kmacy/releng_8_fcs/sys/amd64/amd64/minidump_machdep.c Tue Dec 1 00:42:17 2009 (r199978) @@ -56,6 +56,7 @@ CTASSERT(sizeof(struct kerneldumpheader) extern uint64_t KPDPphys; uint64_t *vm_page_dump; +uint64_t *vm_page_dump_exclude; int vm_page_dump_size; static struct kerneldumpheader kdh; @@ -71,10 +72,16 @@ CTASSERT(sizeof(*vm_page_dump) == 8); static int is_dumpable(vm_paddr_t pa) { - int i; + int i, idx, bit, isdata; + uint64_t pfn = pa; + + pfn >>= PAGE_SHIFT; + idx = pfn >> 6; /* 2^6 = 64 */ + bit = pfn & 63; + isdata = ((vm_page_dump_exclude[idx] & (1ul << bit)) == 0); for (i = 0; dump_avail[i] != 0 || dump_avail[i + 1] != 0; i += 2) { - if (pa >= dump_avail[i] && pa < dump_avail[i + 1]) + if (pa >= dump_avail[i] && pa < dump_avail[i + 1] && isdata) return (1); } return (0); @@ -226,6 +233,7 @@ minidumpsys(struct dumperinfo *di) dumpsize = ptesize; dumpsize += round_page(msgbufp->msg_size); dumpsize += round_page(vm_page_dump_size); + printf("dumpsize: "); for (i = 0; i < vm_page_dump_size / sizeof(*vm_page_dump); i++) { bits = vm_page_dump[i]; while (bits) { @@ -238,10 +246,13 @@ minidumpsys(struct dumperinfo *di) dump_drop_page(pa); } bits &= ~(1ul << bit); + if ((dumpsize % (1<<29)) == 0) + printf("%ldMB ", (dumpsize>>20)); } } dumpsize += PAGE_SIZE; + printf("\n"); /* Determine dump offset on device. */ if (di->mediasize < SIZEOF_METADATA + dumpsize + sizeof(kdh) * 2) { error = ENOSPC; @@ -273,6 +284,7 @@ minidumpsys(struct dumperinfo *di) goto fail; dumplo += sizeof(kdh); + printf("write header\n"); /* Dump my header */ bzero(&fakept, sizeof(fakept)); bcopy(&mdhdr, &fakept, sizeof(mdhdr)); @@ -280,16 +292,19 @@ minidumpsys(struct dumperinfo *di) if (error) goto fail; + printf("write msgbuf\n"); /* Dump msgbuf up front */ error = blk_write(di, (char *)msgbufp->msg_ptr, 0, round_page(msgbufp->msg_size)); if (error) goto fail; + printf("write bitmap\n"); /* Dump bitmap */ error = blk_write(di, (char *)vm_page_dump, 0, round_page(vm_page_dump_size)); if (error) goto fail; + printf("\nDump kernel page table pages\n"); /* Dump kernel page table pages */ pdp = (uint64_t *)PHYS_TO_DMAP(KPDPphys); for (va = VM_MIN_KERNEL_ADDRESS; va < MAX(KERNBASE + NKPT * NBPDR, @@ -343,8 +358,10 @@ minidumpsys(struct dumperinfo *di) /* Dump memory chunks */ /* XXX cluster it up and use blk_dump() */ - for (i = 0; i < vm_page_dump_size / sizeof(*vm_page_dump); i++) { - bits = vm_page_dump[i]; + printf("\nclustering memory chunks\n"); + for (i = 0; + i < vm_page_dump_size / sizeof(*vm_page_dump); i++) { + bits = vm_page_dump[i] & ~(vm_page_dump_exclude[i]); while (bits) { bit = bsfq(bits); pa = (((uint64_t)i * sizeof(*vm_page_dump) * NBBY) + bit) * PAGE_SIZE; @@ -354,7 +371,6 @@ minidumpsys(struct dumperinfo *di) bits &= ~(1ul << bit); } } - error = blk_flush(di); if (error) goto fail; @@ -365,6 +381,7 @@ minidumpsys(struct dumperinfo *di) goto fail; dumplo += sizeof(kdh); + printf("\nstarting dump\n"); /* Signal completion, signoff and exit stage left. */ dump_write(di, NULL, 0, 0, 0); printf("\nDump complete\n"); @@ -403,3 +420,25 @@ dump_drop_page(vm_paddr_t pa) bit = pa & 63; atomic_clear_long(&vm_page_dump[idx], 1ul << bit); } + +void +dump_exclude_page(vm_paddr_t pa) +{ + int idx, bit; + + pa >>= PAGE_SHIFT; + idx = pa >> 6; /* 2^6 = 64 */ + bit = pa & 63; + atomic_set_long(&vm_page_dump_exclude[idx], 1ul << bit); +} + +void +dump_unexclude_page(vm_paddr_t pa) +{ + int idx, bit; + + pa >>= PAGE_SHIFT; + idx = pa >> 6; /* 2^6 = 64 */ + bit = pa & 63; + atomic_clear_long(&vm_page_dump_exclude[idx], 1ul << bit); +} Modified: user/kmacy/releng_8_fcs/sys/amd64/amd64/pmap.c ============================================================================== --- user/kmacy/releng_8_fcs/sys/amd64/amd64/pmap.c Mon Nov 30 23:54:16 2009 (r199977) +++ user/kmacy/releng_8_fcs/sys/amd64/amd64/pmap.c Tue Dec 1 00:42:17 2009 (r199978) @@ -1137,10 +1137,16 @@ pmap_map(vm_offset_t *virt, vm_paddr_t s * Note: SMP coherent. Uses a ranged shootdown IPI. */ void -pmap_qenter(vm_offset_t sva, vm_page_t *ma, int count) +pmap_qenter_prot(vm_offset_t sva, vm_page_t *ma, int count, vm_prot_t prot) { pt_entry_t *endpte, oldpte, *pte; + uint64_t flags = PG_V; + if (prot & VM_PROT_WRITE) + flags |= PG_RW; + if ((prot & VM_PROT_EXECUTE) == 0) + flags |= PG_NX; + oldpte = 0; pte = vtopte(sva); endpte = pte + count; @@ -1148,6 +1154,9 @@ pmap_qenter(vm_offset_t sva, vm_page_t * oldpte |= *pte; pte_store(pte, VM_PAGE_TO_PHYS(*ma) | PG_G | pmap_cache_bits((*ma)->md.pat_mode, 0) | PG_RW | PG_V); + pte_store(pte, VM_PAGE_TO_PHYS(*ma) | PG_G | flags); + if (prot & VM_PROT_EXCLUDE) + dump_exclude_page(VM_PAGE_TO_PHYS(*ma)); pte++; ma++; } @@ -1156,6 +1165,16 @@ pmap_qenter(vm_offset_t sva, vm_page_t * PAGE_SIZE); } +void +pmap_qenter(vm_offset_t sva, vm_page_t *ma, int count) +{ + + pmap_qenter_prot(sva, ma, count, + VM_PROT_READ|VM_PROT_WRITE|VM_PROT_EXECUTE); + +} + + /* * This routine tears out page mappings from the * kernel -- it is meant only for temporary mappings. @@ -1168,6 +1187,7 @@ pmap_qremove(vm_offset_t sva, int count) va = sva; while (count-- > 0) { + dump_unexclude_page(pmap_kextract(va)); pmap_kremove(va); va += PAGE_SIZE; } Modified: user/kmacy/releng_8_fcs/sys/amd64/amd64/uma_machdep.c ============================================================================== --- user/kmacy/releng_8_fcs/sys/amd64/amd64/uma_machdep.c Mon Nov 30 23:54:16 2009 (r199977) +++ user/kmacy/releng_8_fcs/sys/amd64/amd64/uma_machdep.c Tue Dec 1 00:42:17 2009 (r199978) @@ -66,7 +66,8 @@ uma_small_alloc(uma_zone_t zone, int byt break; } pa = m->phys_addr; - dump_add_page(pa); + if ((wait & M_NODUMP) == 0) + dump_add_page(pa); va = (void *)PHYS_TO_DMAP(pa); if ((wait & M_ZERO) && (m->flags & PG_ZERO) == 0) pagezero(va); Modified: user/kmacy/releng_8_fcs/sys/amd64/include/md_var.h ============================================================================== --- user/kmacy/releng_8_fcs/sys/amd64/include/md_var.h Mon Nov 30 23:54:16 2009 (r199977) +++ user/kmacy/releng_8_fcs/sys/amd64/include/md_var.h Tue Dec 1 00:42:17 2009 (r199978) @@ -60,6 +60,7 @@ extern char kstack[]; extern char sigcode[]; extern int szsigcode; extern uint64_t *vm_page_dump; +extern uint64_t *vm_page_dump_exclude; extern int vm_page_dump_size; extern int _udatasel; extern int _ucodesel; @@ -88,6 +89,8 @@ void fs_load_fault(void) __asm(__STRING( void gs_load_fault(void) __asm(__STRING(gs_load_fault)); void dump_add_page(vm_paddr_t); void dump_drop_page(vm_paddr_t); +void dump_exclude_page(vm_paddr_t); +void dump_unexclude_page(vm_paddr_t); void initializecpu(void); void initializecpucache(void); void fillw(int /*u_short*/ pat, void *base, size_t cnt); Modified: user/kmacy/releng_8_fcs/sys/amd64/include/vmparam.h ============================================================================== --- user/kmacy/releng_8_fcs/sys/amd64/include/vmparam.h Mon Nov 30 23:54:16 2009 (r199977) +++ user/kmacy/releng_8_fcs/sys/amd64/include/vmparam.h Tue Dec 1 00:42:17 2009 (r199978) @@ -88,6 +88,11 @@ #define UMA_MD_SMALL_ALLOC /* + * We machine specific sparse kernel dump + */ +#define VM_MD_MINIDUMP + +/* * The physical address space is densely populated. */ #define VM_PHYSSEG_DENSE Modified: user/kmacy/releng_8_fcs/sys/cddl/compat/opensolaris/sys/kmem.h ============================================================================== --- user/kmacy/releng_8_fcs/sys/cddl/compat/opensolaris/sys/kmem.h Mon Nov 30 23:54:16 2009 (r199977) +++ user/kmacy/releng_8_fcs/sys/cddl/compat/opensolaris/sys/kmem.h Tue Dec 1 00:42:17 2009 (r199978) @@ -40,7 +40,8 @@ #define KM_SLEEP M_WAITOK #define KM_PUSHPAGE M_WAITOK #define KM_NOSLEEP M_NOWAIT -#define KMC_NODEBUG 0 +#define KMC_NODEBUG UMA_ZONE_NODUMP +#define KM_NODEBUG M_NODUMP typedef struct kmem_cache { char kc_name[32]; Modified: user/kmacy/releng_8_fcs/sys/cddl/compat/opensolaris/sys/mutex.h ============================================================================== --- user/kmacy/releng_8_fcs/sys/cddl/compat/opensolaris/sys/mutex.h Mon Nov 30 23:54:16 2009 (r199977) +++ user/kmacy/releng_8_fcs/sys/cddl/compat/opensolaris/sys/mutex.h Tue Dec 1 00:42:17 2009 (r199978) @@ -46,11 +46,7 @@ typedef enum { typedef struct sx kmutex_t; -#ifndef DEBUG -#define MUTEX_FLAGS (SX_DUPOK | SX_NOWITNESS) -#else #define MUTEX_FLAGS (SX_DUPOK) -#endif #define mutex_init(lock, desc, type, arg) do { \ const char *_name; \ Modified: user/kmacy/releng_8_fcs/sys/cddl/compat/opensolaris/sys/rwlock.h ============================================================================== --- user/kmacy/releng_8_fcs/sys/cddl/compat/opensolaris/sys/rwlock.h Mon Nov 30 23:54:16 2009 (r199977) +++ user/kmacy/releng_8_fcs/sys/cddl/compat/opensolaris/sys/rwlock.h Tue Dec 1 00:42:17 2009 (r199978) @@ -48,11 +48,7 @@ typedef enum { typedef struct sx krwlock_t; -#ifndef DEBUG -#define RW_FLAGS (SX_DUPOK | SX_NOWITNESS) -#else #define RW_FLAGS (SX_DUPOK) -#endif #define RW_READ_HELD(x) (rw_read_held((x))) #define RW_WRITE_HELD(x) (rw_write_held((x))) Modified: user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c ============================================================================== --- user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c Mon Nov 30 23:54:16 2009 (r199977) +++ user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c Tue Dec 1 00:42:17 2009 (r199978) @@ -186,6 +186,11 @@ SYSCTL_QUAD(_vfs_zfs, OID_AUTO, arc_min, SYSCTL_INT(_vfs_zfs, OID_AUTO, mdcomp_disable, CTLFLAG_RDTUN, &zfs_mdcomp_disable, 0, "Disable metadata compression"); +#ifdef ZIO_USE_UMA +extern kmem_cache_t *zio_buf_cache[]; +extern kmem_cache_t *zio_data_buf_cache[]; +#endif + /* * Note that buffers can be in one of 6 states: * ARC_anon - anonymous (discussed below) @@ -218,13 +223,31 @@ SYSCTL_INT(_vfs_zfs, OID_AUTO, mdcomp_di * second level ARC benefit from these fast lookups. */ +#define ARCS_LOCK_PAD 128 +struct arcs_lock { + kmutex_t arcs_lock; +#ifdef _KERNEL + unsigned char pad[(ARCS_LOCK_PAD - sizeof (kmutex_t))]; +#endif +}; + +/* + * must be power of two for mask use to work + * + */ +#define ARC_BUFC_NUMDATALISTS 16 +#define ARC_BUFC_NUMMETADATALISTS 16 +#define ARC_BUFC_NUMLISTS (ARC_BUFC_NUMMETADATALISTS+ARC_BUFC_NUMDATALISTS) + typedef struct arc_state { - list_t arcs_list[ARC_BUFC_NUMTYPES]; /* list of evictable buffers */ uint64_t arcs_lsize[ARC_BUFC_NUMTYPES]; /* amount of evictable data */ uint64_t arcs_size; /* total amount of data in this state */ - kmutex_t arcs_mtx; + list_t arcs_lists[ARC_BUFC_NUMLISTS]; /* list of evictable buffers */ + struct arcs_lock arcs_locks[ARC_BUFC_NUMLISTS] __aligned(128); } arc_state_t; +#define ARCS_LOCK(s, i) &((s)->arcs_locks[(i)].arcs_lock) + /* The 6 states: */ static arc_state_t ARC_anon; static arc_state_t ARC_mru; @@ -953,20 +976,42 @@ arc_buf_freeze(arc_buf_t *buf) } static void +get_buf_info(arc_buf_hdr_t *ab, arc_state_t *state, list_t **list, kmutex_t **lock) +{ + uint64_t buf_hashid = buf_hash(ab->b_spa, &ab->b_dva, ab->b_birth); + + if (ab->b_type == ARC_BUFC_METADATA) + buf_hashid &= (ARC_BUFC_NUMMETADATALISTS-1); + else { + buf_hashid &= (ARC_BUFC_NUMDATALISTS-1); + buf_hashid += ARC_BUFC_NUMMETADATALISTS; + } + + *list = &state->arcs_lists[buf_hashid]; + *lock = ARCS_LOCK(state, buf_hashid); +} + + +static void add_reference(arc_buf_hdr_t *ab, kmutex_t *hash_lock, void *tag) { + ASSERT(MUTEX_HELD(hash_lock)); if ((refcount_add(&ab->b_refcnt, tag) == 1) && (ab->b_state != arc_anon)) { + list_t *list; + kmutex_t *lock; uint64_t delta = ab->b_size * ab->b_datacnt; - list_t *list = &ab->b_state->arcs_list[ab->b_type]; uint64_t *size = &ab->b_state->arcs_lsize[ab->b_type]; - ASSERT(!MUTEX_HELD(&ab->b_state->arcs_mtx)); - mutex_enter(&ab->b_state->arcs_mtx); + get_buf_info(ab, ab->b_state, &list, &lock); + ASSERT(!MUTEX_HELD(lock)); + mutex_enter(lock); ASSERT(list_link_active(&ab->b_arc_node)); list_remove(list, ab); + mutex_exit(lock); + if (GHOST_STATE(ab->b_state)) { ASSERT3U(ab->b_datacnt, ==, 0); ASSERT3P(ab->b_buf, ==, NULL); @@ -975,7 +1020,6 @@ add_reference(arc_buf_hdr_t *ab, kmutex_ ASSERT(delta > 0); ASSERT3U(*size, >=, delta); atomic_add_64(size, -delta); - mutex_exit(&ab->b_state->arcs_mtx); /* remove the prefetch flag if we get a reference */ if (ab->b_flags & ARC_PREFETCH) ab->b_flags &= ~ARC_PREFETCH; @@ -994,14 +1038,19 @@ remove_reference(arc_buf_hdr_t *ab, kmut if (((cnt = refcount_remove(&ab->b_refcnt, tag)) == 0) && (state != arc_anon)) { uint64_t *size = &state->arcs_lsize[ab->b_type]; + list_t *list; + kmutex_t *lock; - ASSERT(!MUTEX_HELD(&state->arcs_mtx)); - mutex_enter(&state->arcs_mtx); + get_buf_info(ab, state, &list, &lock); + + ASSERT(!MUTEX_HELD(lock)); + mutex_enter(lock); ASSERT(!list_link_active(&ab->b_arc_node)); - list_insert_head(&state->arcs_list[ab->b_type], ab); + list_insert_head(list, ab); + mutex_exit(lock); + ASSERT(ab->b_datacnt > 0); atomic_add_64(size, ab->b_size * ab->b_datacnt); - mutex_exit(&state->arcs_mtx); } return (cnt); } @@ -1016,6 +1065,8 @@ arc_change_state(arc_state_t *new_state, arc_state_t *old_state = ab->b_state; int64_t refcnt = refcount_count(&ab->b_refcnt); uint64_t from_delta, to_delta; + list_t *list; + kmutex_t *lock; ASSERT(MUTEX_HELD(hash_lock)); ASSERT(new_state != old_state); @@ -1030,14 +1081,17 @@ arc_change_state(arc_state_t *new_state, */ if (refcnt == 0) { if (old_state != arc_anon) { - int use_mutex = !MUTEX_HELD(&old_state->arcs_mtx); + int use_mutex; uint64_t *size = &old_state->arcs_lsize[ab->b_type]; + get_buf_info(ab, old_state, &list, &lock); + use_mutex = !MUTEX_HELD(lock); + if (use_mutex) - mutex_enter(&old_state->arcs_mtx); + mutex_enter(lock); ASSERT(list_link_active(&ab->b_arc_node)); - list_remove(&old_state->arcs_list[ab->b_type], ab); + list_remove(list, ab); /* * If prefetching out of the ghost cache, @@ -1052,16 +1106,20 @@ arc_change_state(arc_state_t *new_state, atomic_add_64(size, -from_delta); if (use_mutex) - mutex_exit(&old_state->arcs_mtx); + mutex_exit(lock); } if (new_state != arc_anon) { - int use_mutex = !MUTEX_HELD(&new_state->arcs_mtx); + int use_mutex; uint64_t *size = &new_state->arcs_lsize[ab->b_type]; + get_buf_info(ab, new_state, &list, &lock); + use_mutex = !MUTEX_HELD(lock); + + if (use_mutex) - mutex_enter(&new_state->arcs_mtx); + mutex_enter(lock); - list_insert_head(&new_state->arcs_list[ab->b_type], ab); + list_insert_head(list, ab); /* ghost elements have a ghost size */ if (GHOST_STATE(new_state)) { @@ -1072,7 +1130,7 @@ arc_change_state(arc_state_t *new_state, atomic_add_64(size, to_delta); if (use_mutex) - mutex_exit(&new_state->arcs_mtx); + mutex_exit(lock); } } @@ -1462,21 +1520,49 @@ arc_evict(arc_state_t *state, spa_t *spa { arc_state_t *evicted_state; uint64_t bytes_evicted = 0, skipped = 0, missed = 0; + int64_t bytes_remaining; arc_buf_hdr_t *ab, *ab_prev = NULL; - list_t *list = &state->arcs_list[type]; + list_t *evicted_list, *list, *evicted_list_start, *list_start; + kmutex_t *lock, *evicted_lock; kmutex_t *hash_lock; boolean_t have_lock; void *stolen = NULL; + static int evict_metadata_offset, evict_data_offset; + int i, idx, offset, list_count, count; ASSERT(state == arc_mru || state == arc_mfu); evicted_state = (state == arc_mru) ? arc_mru_ghost : arc_mfu_ghost; + + if (type == ARC_BUFC_METADATA) { + offset = 0; + list_count = ARC_BUFC_NUMMETADATALISTS; + list_start = &state->arcs_lists[0]; + evicted_list_start = &evicted_state->arcs_lists[0]; + idx = evict_metadata_offset; + } else { + offset = ARC_BUFC_NUMMETADATALISTS; + + list_start = &state->arcs_lists[offset]; + evicted_list_start = &evicted_state->arcs_lists[offset]; + list_count = ARC_BUFC_NUMDATALISTS; + idx = evict_data_offset; + } + bytes_remaining = evicted_state->arcs_lsize[type]; + count = 0; + +evict_start: + list = &list_start[idx]; + evicted_list = &evicted_list_start[idx]; + lock = ARCS_LOCK(state, (offset + idx)); + evicted_lock = ARCS_LOCK(evicted_state, (offset + idx)); - mutex_enter(&state->arcs_mtx); - mutex_enter(&evicted_state->arcs_mtx); + mutex_enter(lock); + mutex_enter(evicted_lock); for (ab = list_tail(list); ab; ab = ab_prev) { ab_prev = list_prev(list, ab); + bytes_remaining -= (ab->b_size * ab->b_datacnt); /* prefetch buffers have a minimum lifespan */ if (HDR_IO_IN_PROGRESS(ab) || (spa && ab->b_spa != spa) || @@ -1536,18 +1622,36 @@ arc_evict(arc_state_t *state, spa_t *spa mutex_exit(hash_lock); if (bytes >= 0 && bytes_evicted >= bytes) break; + if (bytes_remaining > 0) { + mutex_exit(evicted_lock); + mutex_exit(lock); + idx = ((idx + 1)&(list_count-1)); + count++; + goto evict_start; + } } else { missed += 1; } } - mutex_exit(&evicted_state->arcs_mtx); - mutex_exit(&state->arcs_mtx); - - if (bytes_evicted < bytes) - dprintf("only evicted %lld bytes from %x", - (longlong_t)bytes_evicted, state); + mutex_exit(evicted_lock); + mutex_exit(lock); + + idx = ((idx + 1)&(list_count-1)); + count++; + if (bytes_evicted < bytes) { + if (count < list_count) + goto evict_start; + else + dprintf("only evicted %lld bytes from %x", + (longlong_t)bytes_evicted, state); + } + if (type == ARC_BUFC_METADATA) + evict_metadata_offset = idx; + else + evict_data_offset = idx; + if (skipped) ARCSTAT_INCR(arcstat_evict_skip, skipped); @@ -1586,14 +1690,28 @@ static void arc_evict_ghost(arc_state_t *state, spa_t *spa, int64_t bytes) { arc_buf_hdr_t *ab, *ab_prev; - list_t *list = &state->arcs_list[ARC_BUFC_DATA]; - kmutex_t *hash_lock; + list_t *list, *list_start; + kmutex_t *hash_lock, *lock; uint64_t bytes_deleted = 0; uint64_t bufs_skipped = 0; + static int evict_offset; + int list_count, idx = evict_offset; + int offset, count = 0; ASSERT(GHOST_STATE(state)); -top: - mutex_enter(&state->arcs_mtx); + + /* + * data lists come after metadata lists + */ + list_start = &state->arcs_lists[ARC_BUFC_NUMMETADATALISTS]; + list_count = ARC_BUFC_NUMDATALISTS; + offset = ARC_BUFC_NUMMETADATALISTS; + +evict_start: + list = &list_start[idx]; + lock = ARCS_LOCK(state, idx + offset); + + mutex_enter(lock); for (ab = list_tail(list); ab; ab = ab_prev) { ab_prev = list_prev(list, ab); if (spa && ab->b_spa != spa) @@ -1623,20 +1741,31 @@ top: break; } else { if (bytes < 0) { - mutex_exit(&state->arcs_mtx); + /* + * we're draining the ARC, retry + */ + mutex_exit(lock); mutex_enter(hash_lock); mutex_exit(hash_lock); - goto top; + goto evict_start; } bufs_skipped += 1; } } - mutex_exit(&state->arcs_mtx); - - if (list == &state->arcs_list[ARC_BUFC_DATA] && + mutex_exit(lock); + idx = ((idx + 1)&(ARC_BUFC_NUMDATALISTS-1)); + count++; + + if (count < list_count) + goto evict_start; + + evict_offset = idx; + if ((uintptr_t)list > (uintptr_t)&state->arcs_lists[ARC_BUFC_NUMMETADATALISTS] && (bytes < 0 || bytes_deleted < bytes)) { - list = &state->arcs_list[ARC_BUFC_METADATA]; - goto top; + list_start = &state->arcs_lists[0]; + list_count = ARC_BUFC_NUMMETADATALISTS; + offset = count = 0; + goto evict_start; } if (bufs_skipped) { @@ -1750,22 +1879,22 @@ restart: void arc_flush(spa_t *spa) { - while (list_head(&arc_mru->arcs_list[ARC_BUFC_DATA])) { + while (arc_mru->arcs_lsize[ARC_BUFC_DATA]) { (void) arc_evict(arc_mru, spa, -1, FALSE, ARC_BUFC_DATA); if (spa) break; } - while (list_head(&arc_mru->arcs_list[ARC_BUFC_METADATA])) { + while (arc_mru->arcs_lsize[ARC_BUFC_METADATA]) { (void) arc_evict(arc_mru, spa, -1, FALSE, ARC_BUFC_METADATA); if (spa) break; } - while (list_head(&arc_mfu->arcs_list[ARC_BUFC_DATA])) { + while (arc_mfu->arcs_lsize[ARC_BUFC_DATA]) { (void) arc_evict(arc_mfu, spa, -1, FALSE, ARC_BUFC_DATA); if (spa) break; } - while (list_head(&arc_mfu->arcs_list[ARC_BUFC_METADATA])) { + while (arc_mfu->arcs_lsize[ARC_BUFC_METADATA]) { (void) arc_evict(arc_mfu, spa, -1, FALSE, ARC_BUFC_METADATA); if (spa) break; @@ -1821,6 +1950,12 @@ arc_reclaim_needed(void) #endif #ifdef _KERNEL + if (needfree) + return (1); + if (arc_size > arc_c_max) + return (1); + if (arc_size <= arc_c_min) + return (0); /* * If pages are needed or we're within 2048 pages @@ -1829,9 +1964,6 @@ arc_reclaim_needed(void) if (vm_pages_needed || (vm_paging_target() > -2048)) return (1); - if (needfree) - return (1); - #if 0 /* * take 'desfree' extra pages, so we reclaim sooner, rather than later @@ -1893,8 +2025,6 @@ arc_kmem_reap_now(arc_reclaim_strategy_t size_t i; kmem_cache_t *prev_cache = NULL; kmem_cache_t *prev_data_cache = NULL; - extern kmem_cache_t *zio_buf_cache[]; - extern kmem_cache_t *zio_data_buf_cache[]; #endif #ifdef _KERNEL @@ -2820,7 +2950,9 @@ arc_buf_evict(arc_buf_t *buf) arc_buf_hdr_t *hdr; kmutex_t *hash_lock; arc_buf_t **bufp; - + list_t *list, *evicted_list; + kmutex_t *lock, *evicted_lock; + rw_enter(&buf->b_lock, RW_WRITER); hdr = buf->b_hdr; if (hdr == NULL) { @@ -2868,16 +3000,18 @@ arc_buf_evict(arc_buf_t *buf) evicted_state = (old_state == arc_mru) ? arc_mru_ghost : arc_mfu_ghost; - mutex_enter(&old_state->arcs_mtx); - mutex_enter(&evicted_state->arcs_mtx); + get_buf_info(hdr, old_state, &list, &lock); + get_buf_info(hdr, evicted_state, &evicted_list, &evicted_lock); + mutex_enter(lock); + mutex_enter(evicted_lock); arc_change_state(evicted_state, hdr, hash_lock); ASSERT(HDR_IN_HASH_TABLE(hdr)); hdr->b_flags |= ARC_IN_HASH_TABLE; hdr->b_flags &= ~ARC_BUF_AVAILABLE; - mutex_exit(&evicted_state->arcs_mtx); - mutex_exit(&old_state->arcs_mtx); + mutex_exit(evicted_lock); + mutex_exit(lock); } mutex_exit(hash_lock); rw_exit(&buf->b_lock); @@ -3423,7 +3557,8 @@ void arc_init(void) { int prefetch_tunable_set = 0; - + int i; + mutex_init(&arc_reclaim_thr_lock, NULL, MUTEX_DEFAULT, NULL); cv_init(&arc_reclaim_thr_cv, NULL, CV_DEFAULT, NULL); mutex_init(&arc_lowmem_lock, NULL, MUTEX_DEFAULT, NULL); @@ -3491,33 +3626,34 @@ arc_init(void) arc_l2c_only = &ARC_l2c_only; arc_size = 0; - mutex_init(&arc_anon->arcs_mtx, NULL, MUTEX_DEFAULT, NULL); - mutex_init(&arc_mru->arcs_mtx, NULL, MUTEX_DEFAULT, NULL); - mutex_init(&arc_mru_ghost->arcs_mtx, NULL, MUTEX_DEFAULT, NULL); - mutex_init(&arc_mfu->arcs_mtx, NULL, MUTEX_DEFAULT, NULL); - mutex_init(&arc_mfu_ghost->arcs_mtx, NULL, MUTEX_DEFAULT, NULL); - mutex_init(&arc_l2c_only->arcs_mtx, NULL, MUTEX_DEFAULT, NULL); - - list_create(&arc_mru->arcs_list[ARC_BUFC_METADATA], - sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); - list_create(&arc_mru->arcs_list[ARC_BUFC_DATA], - sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); - list_create(&arc_mru_ghost->arcs_list[ARC_BUFC_METADATA], - sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); - list_create(&arc_mru_ghost->arcs_list[ARC_BUFC_DATA], - sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); - list_create(&arc_mfu->arcs_list[ARC_BUFC_METADATA], - sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); - list_create(&arc_mfu->arcs_list[ARC_BUFC_DATA], - sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); - list_create(&arc_mfu_ghost->arcs_list[ARC_BUFC_METADATA], - sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); - list_create(&arc_mfu_ghost->arcs_list[ARC_BUFC_DATA], - sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); - list_create(&arc_l2c_only->arcs_list[ARC_BUFC_METADATA], - sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); - list_create(&arc_l2c_only->arcs_list[ARC_BUFC_DATA], - sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); + for (i = 0; i < ARC_BUFC_NUMLISTS; i++) { + + mutex_init(&arc_anon->arcs_locks[i].arcs_lock, + NULL, MUTEX_DEFAULT, NULL); + mutex_init(&arc_mru->arcs_locks[i].arcs_lock, + NULL, MUTEX_DEFAULT, NULL); + mutex_init(&arc_mru_ghost->arcs_locks[i].arcs_lock, + NULL, MUTEX_DEFAULT, NULL); + mutex_init(&arc_mfu->arcs_locks[i].arcs_lock, + NULL, MUTEX_DEFAULT, NULL); + mutex_init(&arc_mfu_ghost->arcs_locks[i].arcs_lock, + NULL, MUTEX_DEFAULT, NULL); + mutex_init(&arc_l2c_only->arcs_locks[i].arcs_lock, + NULL, MUTEX_DEFAULT, NULL); + + list_create(&arc_mru->arcs_lists[i], + sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); + list_create(&arc_mru_ghost->arcs_lists[i], + sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); + list_create(&arc_mfu->arcs_lists[i], + sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); + list_create(&arc_mfu_ghost->arcs_lists[i], + sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); + list_create(&arc_mfu_ghost->arcs_lists[i], + sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); + list_create(&arc_l2c_only->arcs_lists[i], + sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node)); + } buf_init(); @@ -3591,7 +3727,8 @@ arc_init(void) void arc_fini(void) { - + int i; + mutex_enter(&arc_reclaim_thr_lock); arc_thread_exit = 1; cv_signal(&arc_reclaim_thr_cv); @@ -3612,21 +3749,19 @@ arc_fini(void) mutex_destroy(&arc_reclaim_thr_lock); cv_destroy(&arc_reclaim_thr_cv); - list_destroy(&arc_mru->arcs_list[ARC_BUFC_METADATA]); - list_destroy(&arc_mru_ghost->arcs_list[ARC_BUFC_METADATA]); - list_destroy(&arc_mfu->arcs_list[ARC_BUFC_METADATA]); - list_destroy(&arc_mfu_ghost->arcs_list[ARC_BUFC_METADATA]); - list_destroy(&arc_mru->arcs_list[ARC_BUFC_DATA]); - list_destroy(&arc_mru_ghost->arcs_list[ARC_BUFC_DATA]); - list_destroy(&arc_mfu->arcs_list[ARC_BUFC_DATA]); - list_destroy(&arc_mfu_ghost->arcs_list[ARC_BUFC_DATA]); - - mutex_destroy(&arc_anon->arcs_mtx); - mutex_destroy(&arc_mru->arcs_mtx); - mutex_destroy(&arc_mru_ghost->arcs_mtx); - mutex_destroy(&arc_mfu->arcs_mtx); - mutex_destroy(&arc_mfu_ghost->arcs_mtx); - + for (i = 0; i < ARC_BUFC_NUMLISTS; i++) { + list_destroy(&arc_mru->arcs_lists[i]); + list_destroy(&arc_mru_ghost->arcs_lists[i]); + list_destroy(&arc_mfu->arcs_lists[i]); + list_destroy(&arc_mfu_ghost->arcs_lists[i]); + + mutex_destroy(&arc_anon->arcs_locks[i].arcs_lock); + mutex_destroy(&arc_mru->arcs_locks[i].arcs_lock); + mutex_destroy(&arc_mru_ghost->arcs_locks[i].arcs_lock); + mutex_destroy(&arc_mfu->arcs_locks[i].arcs_lock); + mutex_destroy(&arc_mfu_ghost->arcs_locks[i].arcs_lock); + } + mutex_destroy(&zfs_write_limit_lock); buf_fini(); @@ -4021,26 +4156,26 @@ static list_t * l2arc_list_locked(int list_num, kmutex_t **lock) { list_t *list; + int idx; + + ASSERT(list_num >= 0 && list_num <= 2*ARC_BUFC_NUMLISTS); - ASSERT(list_num >= 0 && list_num <= 3); - - switch (list_num) { - case 0: - list = &arc_mfu->arcs_list[ARC_BUFC_METADATA]; - *lock = &arc_mfu->arcs_mtx; - break; - case 1: - list = &arc_mru->arcs_list[ARC_BUFC_METADATA]; - *lock = &arc_mru->arcs_mtx; - break; - case 2: - list = &arc_mfu->arcs_list[ARC_BUFC_DATA]; - *lock = &arc_mfu->arcs_mtx; - break; - case 3: - list = &arc_mru->arcs_list[ARC_BUFC_DATA]; - *lock = &arc_mru->arcs_mtx; - break; + if (list_num < ARC_BUFC_NUMMETADATALISTS) { + list = &arc_mfu->arcs_lists[list_num]; + *lock = ARCS_LOCK(arc_mfu, list_num); + } else if (list_num < ARC_BUFC_NUMMETADATALISTS*2) { + idx = list_num - ARC_BUFC_NUMMETADATALISTS; + list = &arc_mru->arcs_lists[idx]; + *lock = ARCS_LOCK(arc_mru, idx); + } else if (list_num < (ARC_BUFC_NUMMETADATALISTS*2 + + ARC_BUFC_NUMDATALISTS)) { + idx = list_num - ARC_BUFC_NUMLISTS; + list = &arc_mfu->arcs_lists[idx]; + *lock = ARCS_LOCK(arc_mfu, idx); + } else { + idx = list_num - ARC_BUFC_NUMLISTS - ARC_BUFC_NUMMETADATALISTS; + list = &arc_mru->arcs_lists[idx]; + *lock = ARCS_LOCK(arc_mru, idx); } ASSERT(!(MUTEX_HELD(*lock))); @@ -4211,7 +4346,7 @@ l2arc_write_buffers(spa_t *spa, l2arc_de * Copy buffers for L2ARC writing. */ mutex_enter(&l2arc_buflist_mtx); - for (try = 0; try <= 3; try++) { + for (try = 0; try <= 2*ARC_BUFC_NUMLISTS; try++) { list = l2arc_list_locked(try, &list_lock); passed_sz = 0; Modified: user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c ============================================================================== --- user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c Mon Nov 30 23:54:16 2009 (r199977) +++ user/kmacy/releng_8_fcs/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c Tue Dec 1 00:42:17 2009 (r199978) @@ -177,22 +177,22 @@ dmu_bonus_hold(objset_t *os, uint64_t ob * whose dnodes are in the same block. */ static int -dmu_buf_hold_array_by_dnode(dnode_t *dn, uint64_t offset, - uint64_t length, int read, void *tag, int *numbufsp, dmu_buf_t ***dbpp) +dmu_buf_hold_array_by_dnode(dnode_t *dn, uint64_t offset, uint64_t length, + int read, void *tag, int *numbufsp, dmu_buf_t ***dbpp, uint32_t flags) { dsl_pool_t *dp = NULL; dmu_buf_t **dbp; uint64_t blkid, nblks, i; - uint32_t flags; + uint32_t dbuf_flags; int err; zio_t *zio; hrtime_t start; ASSERT(length <= DMU_MAX_ACCESS); - flags = DB_RF_CANFAIL | DB_RF_NEVERWAIT; - if (length > zfetch_array_rd_sz) - flags |= DB_RF_NOPREFETCH; + dbuf_flags = DB_RF_CANFAIL | DB_RF_NEVERWAIT; + if (flags & DMU_READ_NO_PREFETCH || length > zfetch_array_rd_sz) + dbuf_flags |= DB_RF_NOPREFETCH; rw_enter(&dn->dn_struct_rwlock, RW_READER); if (dn->dn_datablkshift) { @@ -230,7 +230,7 @@ dmu_buf_hold_array_by_dnode(dnode_t *dn, /* initiate async i/o */ if (read) { rw_exit(&dn->dn_struct_rwlock); - (void) dbuf_read(db, zio, flags); + (void) dbuf_read(db, zio, dbuf_flags); rw_enter(&dn->dn_struct_rwlock, RW_READER); } dbp[i] = &db->db; @@ -282,7 +282,7 @@ dmu_buf_hold_array(objset_t *os, uint64_ return (err); err = dmu_buf_hold_array_by_dnode(dn, offset, length, read, tag, - numbufsp, dbpp); + numbufsp, dbpp, DMU_READ_PREFETCH); dnode_rele(dn, FTAG); @@ -297,7 +297,7 @@ dmu_buf_hold_array_by_bonus(dmu_buf_t *d int err; err = dmu_buf_hold_array_by_dnode(dn, offset, length, read, tag, - numbufsp, dbpp); + numbufsp, dbpp, DMU_READ_PREFETCH); return (err); } @@ -536,8 +536,8 @@ dmu_free_range(objset_t *os, uint64_t ob } int -dmu_read(objset_t *os, uint64_t object, uint64_t offset, uint64_t size, - void *buf) +dmu_read_flags(objset_t *os, uint64_t object, uint64_t offset, uint64_t size, + void *buf, uint32_t flags) { dnode_t *dn; dmu_buf_t **dbp; @@ -567,7 +567,7 @@ dmu_read(objset_t *os, uint64_t object, * to be reading in parallel. *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***