Subject: Re: svn commit: r315449 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs
To: Steven Hartland, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
From: Allan Jude
Cc: Mike Geiger, Andriy Gapon, Josh Paetzel, Alexander Motin
Date: Sun, 25 Feb 2018 22:56:27 -0500
Message-ID: <6bf452f9-fd55-1ea9-196a-1cfcf97d06f4@freebsd.org>
In-Reply-To: <201703171234.v2HCYvgd026429@repo.freebsd.org>
References: <201703171234.v2HCYvgd026429@repo.freebsd.org>
List-Id: SVN commit messages for the src tree for head/-current

On 2017-03-17 08:34, Steven Hartland wrote:
> Author: smh
> Date: Fri Mar 17 12:34:57 2017
> New Revision: 315449
> URL:
https://svnweb.freebsd.org/changeset/base/315449
>
> Log:
>   Reduce ARC fragmentation threshold
>
>   As ZFS can request up to SPA_MAXBLOCKSIZE memory block e.g. during zfs recv,
>   update the threshold at which we start agressive reclamation to use
>   SPA_MAXBLOCKSIZE (16M) instead of the lower zfs_max_recordsize which
>   defaults to 1M.
>
>   PR:		194513
>   Reviewed by:	avg, mav
>   MFC after:	1 month
>   Sponsored by:	Multiplay
>   Differential Revision:	https://reviews.freebsd.org/D10012
>
> Modified:
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>
> Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> ==============================================================================
> --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Mar 17 12:34:56 2017	(r315448)
> +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Mar 17 12:34:57 2017	(r315449)
> @@ -3978,7 +3978,7 @@ arc_available_memory(void)
>  	 * Start aggressive reclamation if too little sequential KVA left.
>  	 */
>  	if (lowest > 0) {
> -		n = (vmem_size(heap_arena, VMEM_MAXFREE) < zfs_max_recordsize) ?
> +		n = (vmem_size(heap_arena, VMEM_MAXFREE) < SPA_MAXBLOCKSIZE) ?
>  		    -((int64_t)vmem_size(heap_arena, VMEM_ALLOC) >> 4) :
>  		    INT64_MAX;
>  		if (n < lowest) {
>

I have some users reporting excessive ARC shrinking in 11.1 vs 11.0 due
to this change. Memory seems quite fragmented, and this change makes it
much more sensitive to that, but the problem seems to be that it can get
too aggressive.

In the most recent case, the machine has 128 GB of RAM and no other major
processes running, just ZFS zvols being served over iSCSI by ctld.
arc_max is set to 85 GB, which is rather conservative. After running for
a few days, fragmentation seems to trip this line: when there are no
16 MB contiguous blocks, it shrinks the ARC by 1/16th of memory, but this
does not result in a 16 MB contiguous chunk, so it shrinks the ARC by
another 1/16th, and again until it hits arc_min.
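
The repeated 1/16th shrink described above can be illustrated with a toy
model. This is only a sketch, not the kernel's reclaim path: the names
FRAG_THRESHOLD, frag_pressure() and passes_to_floor() are hypothetical,
and it assumes the heap holds nothing but the ARC and that shrinking
never frees up a larger contiguous span.

```c
#include <assert.h>
#include <stdint.h>

/* 16M, the SPA_MAXBLOCKSIZE value given in the commit log. */
#define FRAG_THRESHOLD	((uint64_t)16 << 20)

/*
 * Same shape as the quoted check: a negative result asks for that many
 * bytes to be reclaimed, INT64_MAX means no pressure from fragmentation.
 */
int64_t
frag_pressure(uint64_t max_free_span, uint64_t heap_alloc)
{
	return ((max_free_span < FRAG_THRESHOLD) ?
	    -((int64_t)heap_alloc >> 4) : INT64_MAX);
}

/*
 * Toy model (assumption): the heap holds only the ARC, and shrinking never
 * produces a larger contiguous span. Returns the number of passes taken
 * before the ARC reaches arc_min, or 0 if the check never fires.
 */
unsigned
passes_to_floor(uint64_t arc_size, uint64_t arc_min, uint64_t max_free_span)
{
	unsigned passes = 0;

	assert(arc_min <= arc_size);
	while (arc_size > arc_min) {
		int64_t n = frag_pressure(max_free_span, arc_size);
		uint64_t step;

		if (n == INT64_MAX)
			break;		/* a 16M span exists, no shrink */
		step = (uint64_t)(-n);
		if (step == 0)
			break;		/* nothing left to give back */
		/* Shrink by 1/16th, but never go below arc_min. */
		arc_size = (arc_size - arc_min > step) ?
		    arc_size - step : arc_min;
		passes++;
	}
	return (passes);
}
```

Calling passes_to_floor() with an 85 GB ARC, an arc_min picked for
illustration, and a largest free span that stays under 16 MB shows that
nothing in this check stops the shrinking until the floor is reached.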

Apparently eventually the ARC does regrow, but then crashes again later.

You can see the ARC oscillating between arc_max and arc_min, with some
long periods pinned at arc_min: https://imgur.com/a/emztF

[root@ZFS-AF ~]# vmstat -z | tail +3 | awk -F '[:,] *' 'BEGIN { total=0; cache=0; used=0 } {u = $2 * $4; c = $2 * $5; t = u + c; cache += c; used += u; total += t; name=$1; gsub(" ", "_", name); print t, name, u, c} END { print total, "TOTAL", used, cache } ' | sort -n | perl -a -p -e 'while (($j, $_) = each(@F)) { 1 while s/^(-?\d+)(\d{3})/$1,$2/; print $_, " "} print "\n"' | column -t | tail
1,723,367,424    zio_data_buf_49152   1,722,875,904    491,520
1,827,057,664    zio_buf_4096         1,826,848,768    208,896
2,289,459,200    zio_data_buf_40960   2,289,090,560    368,640
3,642,736,640    zio_data_buf_81920   3,642,408,960    327,680
6,713,180,160    zio_data_buf_98304   6,712,688,640    491,520
9,388,195,840    zio_buf_8192         9,388,064,768    131,072
11,170,152,448   zio_data_buf_114688  11,168,890,880   1,261,568
29,607,329,792   zio_data_buf_131072  29,606,674,432   655,360
32,944,750,592   zio_buf_65536        32,943,833,088   917,504
114,235,296,752  TOTAL                111,787,212,900  2,448,083,852

[root@ZFS-AF ~]# vmstat -z | tail +3 | awk -F '[:,] *' 'BEGIN { total=0; cache=0; used=0 } {u = $2 * $4; c = $2 * $5; t = u + c; cache += c; used += u; total += t; name=$1; gsub(" ", "_", name); print t, name, u, c} END { print total, "TOTAL", used, cache } ' | sort -n +3 | perl -a -p -e 'while (($j, $_) = each(@F)) { 1 while s/^(-?\d+)(\d{3})/$1,$2/; print $_, " "} print "\n"' | column -t | tail
71,565,312       cblk15                0                71,565,312
72,220,672       cblk16                0                72,220,672
72,351,744       cblk18                131,072          72,220,672
72,744,960       cblk3                 0                72,744,960
75,497,472       cblk8                 0                75,497,472
76,283,904       cblk22                0                76,283,904
403,696,384      128                   286,225,792      117,470,592
229,519,360      mbuf_jumbo_page       67,043,328       162,476,032
1,196,795,160    arc_buf_hdr_t_l2only  601,620,624      595,174,536
114,220,354,544  TOTAL                 111,778,349,508  2,442,005,036

Maybe the right thing to do is call the new kmem_cache_reap_soon()
or other functions that might actually reduce fragmentation, or rate-limit
how quickly the ARC will shrink?

What kind of tools do we have to look at why memory is so fragmented that
ZFS feels the need to tank the ARC?

I know this block and the FMR_ZIO_FRAG reason have been removed from
-CURRENT as part of the NUMA work, but I am worried about addressing this
issue for the upcoming 11.2-RELEASE.

-- 
Allan Jude
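
One way to picture the rate-limiting idea raised above is a guard that
lets a fragmentation-triggered shrink through at most once per interval.
This is a minimal sketch under stated assumptions, not a proposed patch:
struct shrink_limiter, shrink_allowed() and the tick-based clock are all
hypothetical names that exist in neither arc.c nor the mail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for limiting fragmentation-driven ARC shrinks. */
struct shrink_limiter {
	uint64_t min_interval_ticks;	/* minimum spacing between shrinks */
	uint64_t last_shrink_tick;	/* time of the last allowed shrink */
	bool	 ever_shrunk;		/* false until the first shrink */
};

/*
 * Return true if a shrink may proceed at time "now" and record it.
 * Return false if the previous shrink was too recent.
 */
bool
shrink_allowed(struct shrink_limiter *sl, uint64_t now)
{
	if (sl->ever_shrunk &&
	    now - sl->last_shrink_tick < sl->min_interval_ticks)
		return (false);
	sl->last_shrink_tick = now;
	sl->ever_shrunk = true;
	return (true);
}
```

With a guard like this, repeated failures to find a 16 MB span would cost
one 1/16th step per interval rather than one per check, leaving time for
other reclaim activity to reduce fragmentation before the next step.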