Message-Id: <201504031445.t33EjnJx099446@svn.freebsd.org>
From: Alexander Motin
Date: Fri, 3 Apr 2015 14:45:49 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r281026 - in head/sys: cddl/contrib/opensolaris/uts/common/fs/zfs kern sys

Author: mav
Date: Fri Apr 3 14:45:48 2015
New Revision: 281026
URL: https://svnweb.freebsd.org/changeset/base/281026

Log:
Make ZFS ARC track both KVA usage and fragmentation.

Even on Illumos, with its much larger KVA, ZFS ARC steps back if KVA usage
reaches a certain threshold (3/4 on i386 or 16/17 otherwise). FreeBSD has
even less KVA, but had no such limit on architectures with a direct map,
such as amd64. As a result, on machines with a lot of RAM, under load with
very little user-space memory pressure, such as `zfs send`, it was possible
to reach a state where there is enough of both physical RAM and KVA (I've
seen up to 25-30%), but no contiguous KVA range to allocate even a single
128KB I/O request.

Address this situation from two sides:
- restore KVA usage limitations in a way as close to Illumos as possible;
- introduce a new requirement for KVA fragmentation, specifying that we
  should have at least one sequential KVA range of zfs_max_recordsize bytes.

Experiments show that the first limitation alone is not sufficient. On a
machine with 64GB of RAM it is sometimes necessary to drop up to half of the
ARC size to get at least one 1MB KVA chunk. Statically limiting ARC to half
of KVA/RAM is too strict, so the second limitation makes it work in cycles:
accumulate trash up to a certain critical mass, do a massive spring-cleaning,
and then start littering again.
:)

MFC after:	1 month

Modified:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
  head/sys/kern/subr_vmem.c
  head/sys/sys/vmem.h

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
==============================================================================
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Apr 3 14:39:16 2015	(r281025)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Apr 3 14:45:48 2015	(r281026)
@@ -2648,8 +2648,11 @@ arc_reclaim_needed(void)
 		    (vmem_size(heap_arena, VMEM_FREE | VMEM_ALLOC)) >> 2);
 		return (1);
 	}
+#define	zio_arena	NULL
+#else
+#define	zio_arena	heap_arena
 #endif
-#ifdef illumos
+
 	/*
 	 * If zio data pages are being allocated out of a separate heap segment,
 	 * then enforce that the size of available vmem for this arena remains
@@ -2663,7 +2666,14 @@ arc_reclaim_needed(void)
 	    vmem_size(zio_arena, VMEM_FREE) <
 	    (vmem_size(zio_arena, VMEM_ALLOC) >> 4))
 		return (1);
-#endif	/* illumos */
+
+	/*
+	 * Above limits know nothing about real level of KVA fragmentation.
+	 * Start aggressive reclamation if too little sequential KVA left.
+	 */
+	if (vmem_size(heap_arena, VMEM_MAXFREE) < zfs_max_recordsize)
+		return (1);
+
 #else	/* _KERNEL */
 	if (spa_get_random(100) == 0)
 		return (1);

Modified: head/sys/kern/subr_vmem.c
==============================================================================
--- head/sys/kern/subr_vmem.c	Fri Apr 3 14:39:16 2015	(r281025)
+++ head/sys/kern/subr_vmem.c	Fri Apr 3 14:45:48 2015	(r281026)
@@ -1320,6 +1320,7 @@ vmem_add(vmem_t *vm, vmem_addr_t addr, v
 vmem_size_t
 vmem_size(vmem_t *vm, int typemask)
 {
+	int i;
 
 	switch (typemask) {
 	case VMEM_ALLOC:
@@ -1328,6 +1329,14 @@ vmem_size(vmem_t *vm, int typemask)
 		return vm->vm_size - vm->vm_inuse;
 	case VMEM_FREE|VMEM_ALLOC:
 		return vm->vm_size;
+	case VMEM_MAXFREE:
+		for (i = VMEM_MAXORDER - 1; i >= 0; i--) {
+			if (LIST_EMPTY(&vm->vm_freelist[i]))
+				continue;
+			return ((vmem_size_t)ORDER2SIZE(i) <<
+			    vm->vm_quantum_shift);
+		}
+		return (0);
 	default:
 		panic("vmem_size");
 	}

Modified: head/sys/sys/vmem.h
==============================================================================
--- head/sys/sys/vmem.h	Fri Apr 3 14:39:16 2015	(r281025)
+++ head/sys/sys/vmem.h	Fri Apr 3 14:45:48 2015	(r281026)
@@ -129,6 +129,7 @@ void vmem_startup(void);
 
 /* vmem_size typemask */
 #define VMEM_ALLOC	0x01
 #define VMEM_FREE	0x02
+#define VMEM_MAXFREE	0x10
 
 #endif	/* _KERNEL */
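
For readers who want to try the two checks outside the kernel, here is a minimal user-space sketch of the logic this commit describes. It is not the kernel code: `struct toy_arena`, `toy_maxfree()`, `toy_reclaim_needed()` and `TOY_MAXORDER` are hypothetical names made up for illustration, and the free lists are assumed to be plain power-of-two buckets, which may not match how `ORDER2SIZE()` maps orders to sizes in subr_vmem.c. In this toy model the value returned for the "max free" query is the minimum segment size of the highest non-empty bucket, so it is a lower bound on the largest free segment rather than its exact size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical number of size buckets; not the kernel's VMEM_MAXORDER. */
#define TOY_MAXORDER 20

/* Hypothetical toy arena; a stand-in for illustration, not vmem_t. */
struct toy_arena {
	size_t free_bytes;             /* total free space in the arena */
	size_t alloc_bytes;            /* total allocated space */
	unsigned quantum_shift;        /* log2 of the allocation quantum */
	unsigned nfree[TOY_MAXORDER];  /* free segments of >= (1 << i) quanta */
};

/*
 * Scan buckets from the largest order down and report the size class of
 * the first non-empty one, in bytes. Returns 0 if nothing is free.
 */
static size_t
toy_maxfree(const struct toy_arena *a)
{
	assert(a != NULL);
	for (int i = TOY_MAXORDER - 1; i >= 0; i--) {
		if (a->nfree[i] != 0)
			return (((size_t)1 << i) << a->quantum_shift);
	}
	return (0);
}

/*
 * Combine the two conditions from the commit log:
 *  1. usage limit: free < alloc / 16, i.e. more than 16/17 of the
 *     arena is allocated;
 *  2. fragmentation limit: no free range of at least max_recordsize.
 */
static bool
toy_reclaim_needed(const struct toy_arena *a, size_t max_recordsize)
{
	if (a->free_bytes < (a->alloc_bytes >> 4))
		return (true);
	if (toy_maxfree(a) < max_recordsize)
		return (true);
	return (false);
}
```

With a 4KB quantum (`quantum_shift = 12`), an arena that is 30% free but whose largest non-empty bucket is order 3 (32KB) still reports that reclamation is needed for a 128KB record size. That is the "enough KVA but too fragmented" case from the log above.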