From owner-svn-src-head@freebsd.org Wed Apr 11 05:04:59 2018
Subject: Re: svn commit: r315449 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs
From: Allan Jude
To: Steven Hartland, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Cc: Andriy Gapon, Josh Paetzel, Alexander Motin, Mark Johnston
References: <201703171234.v2HCYvgd026429@repo.freebsd.org> <6bf452f9-fd55-1ea9-196a-1cfcf97d06f4@freebsd.org>
In-Reply-To: <6bf452f9-fd55-1ea9-196a-1cfcf97d06f4@freebsd.org>
Message-ID: <67b803af-3ec3-1bf4-68d9-8cfa19dde160@freebsd.org>
Date: Wed, 11 Apr 2018 01:04:54 -0400
List-Id: SVN commit messages for the src tree for head/-current

On 2018-02-25 22:56, Allan Jude wrote:
> On 2017-03-17 08:34, Steven Hartland wrote:
>> Author: smh
>> Date: Fri Mar 17 12:34:57 2017
>> New Revision: 315449
>> URL: https://svnweb.freebsd.org/changeset/base/315449
>>
>> Log:
>>   Reduce ARC fragmentation threshold
>>
>>   As ZFS can request up to SPA_MAXBLOCKSIZE memory block e.g. during zfs recv,
>>   update the threshold at which we start aggressive reclamation to use
>>   SPA_MAXBLOCKSIZE (16M) instead of the lower zfs_max_recordsize which
>>   defaults to 1M.
>>
>>   PR:                    194513
>>   Reviewed by:           avg, mav
>>   MFC after:             1 month
>>   Sponsored by:          Multiplay
>>   Differential Revision: https://reviews.freebsd.org/D10012
>>
>> Modified:
>>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>>
>> Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>> ==============================================================================
>> --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c  Fri Mar 17 12:34:56 2017  (r315448)
>> +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c  Fri Mar 17 12:34:57 2017  (r315449)
>> @@ -3978,7 +3978,7 @@ arc_available_memory(void)
>>      * Start aggressive reclamation if too little sequential KVA left.
>>      */
>>     if (lowest > 0) {
>> -       n = (vmem_size(heap_arena, VMEM_MAXFREE) < zfs_max_recordsize) ?
>> +       n = (vmem_size(heap_arena, VMEM_MAXFREE) < SPA_MAXBLOCKSIZE) ?
>>             -((int64_t)vmem_size(heap_arena, VMEM_ALLOC) >> 4) :
>>             INT64_MAX;
>>         if (n < lowest) {
>>
>
> I have some users reporting excessive ARC shrinking in 11.1 vs 11.0 due
> to this change.
>
> Memory seems quite fragmented, and this change makes it much more
> sensitive to that, but the problem seems to be that it can get too
> aggressive.
>
> In the most recent case, the machine has 128GB of RAM and no other major
> processes running, just ZFS zvols being served over iSCSI by ctld.
>
> arc_max is set to 85GB, rather conservative. After running for a few days,
> fragmentation seems to trip this line, when there are no 16MB contiguous
> blocks, and it shrinks the ARC by 1/16th of memory, but this does not
> result in a 16MB contiguous chunk, so it shrinks the ARC by another
> 1/16th, and again until it hits arc_min. Apparently the ARC does
> eventually regrow, but then crashes again later.
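
The feedback loop described above can be illustrated with a toy model. This is only a sketch in Python, not the kernel code: the function names are hypothetical, the heap allocation is assumed to equal the ARC size, the 8 GiB arc_min is an assumed value (the message does not give one), and shrinking is assumed never to produce a larger contiguous free chunk, which is the pathological case reported.

```python
GIB = 1 << 30
SPA_MAXBLOCKSIZE = 16 << 20   # 16M, per the commit log
INT64_MAX = (1 << 63) - 1


def frag_reclaim_delta(max_free_contig, heap_allocated,
                       threshold=SPA_MAXBLOCKSIZE):
    """Mirror the quoted expression from arc_available_memory().

    A negative result means "reclaim this many bytes"; INT64_MAX means
    the fragmentation check does not ask for any reclamation.
    """
    if max_free_contig < threshold:
        return -(heap_allocated >> 4)
    return INT64_MAX


def passes_until_arc_min(arc_size, arc_min, max_free_contig):
    """Count reclaim passes until the ARC is pinned at arc_min.

    Assumption: the largest free contiguous chunk never changes, so
    every pass trips the threshold again.
    """
    passes = 0
    while arc_size > arc_min:
        n = frag_reclaim_delta(max_free_contig, arc_size)
        if n >= 0:
            break  # enough contiguous KVA, no aggressive reclamation
        arc_size = max(arc_min, arc_size + n)
        passes += 1
    return passes, arc_size


if __name__ == "__main__":
    # 85 GiB arc_max from the report; 8 GiB arc_min is an assumed value.
    print(passes_until_arc_min(85 * GIB, 8 * GIB, 8 << 20))
```

Under these assumptions, each pass removes 1/16th of what is left, so an 85 GiB ARC falls to the assumed arc_min within a few dozen passes if none of them frees a 16M contiguous chunk.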
>
> You can see the ARC oscillating between arc_max and arc_min, with some
> long periods pinned at arc_min: https://imgur.com/a/emztF
>
>
> [root@ZFS-AF ~]# vmstat -z | tail +3 | awk -F '[:,] *' 'BEGIN { total=0;
> cache=0; used=0 } {u = $2 * $4; c = $2 * $5; t = u + c; cache += c; used
> += u; total += t; name=$1; gsub(" ", "_", name); print t, name, u, c}
> END { print total, "TOTAL", used, cache } ' | sort -n | perl -a -p -e
> 'while (($j, $_) = each(@F)) { 1 while s/^(-?\d+)(\d{3})/$1,$2/; print
> $_, " "} print "\n"' | column -t | tail
>
> TOTAL            NAME                 USED             Cache
> 1,723,367,424    zio_data_buf_49152   1,722,875,904    491,520
> 1,827,057,664    zio_buf_4096         1,826,848,768    208,896
> 2,289,459,200    zio_data_buf_40960   2,289,090,560    368,640
> 3,642,736,640    zio_data_buf_81920   3,642,408,960    327,680
> 6,713,180,160    zio_data_buf_98304   6,712,688,640    491,520
> 9,388,195,840    zio_buf_8192         9,388,064,768    131,072
> 11,170,152,448   zio_data_buf_114688  11,168,890,880   1,261,568
> 29,607,329,792   zio_data_buf_131072  29,606,674,432   655,360
> 32,944,750,592   zio_buf_65536        32,943,833,088   917,504
> 114,235,296,752  TOTAL                111,787,212,900  2,448,083,852
>
>
> [root@ZFS-AF ~]# vmstat -z | tail +3 | awk -F '[:,] *' 'BEGIN { total=0;
> cache=0; used=0 } {u = $2 * $4; c = $2 * $5; t = u + c; cache += c; used
> += u; total += t; name=$1; gsub(" ", "_", name); print t, name, u, c}
> END { print total, "TOTAL", used, cache } ' | sort -n +3 | perl -a -p -e
> 'while (($j, $_) = each(@F)) { 1 while s/^(-?\d+)(\d{3})/$1,$2/; print
> $_, " "} print "\n"' | column -t | tail
>
> Sorted by cache (waste)
>
> TOTAL            NAME                  USED             Cache
> 71,565,312       cblk15                0                71,565,312
> 72,220,672       cblk16                0                72,220,672
> 72,351,744       cblk18                131,072          72,220,672
> 72,744,960       cblk3                 0                72,744,960
> 75,497,472       cblk8                 0                75,497,472
> 76,283,904       cblk22                0                76,283,904
> 403,696,384      128                   286,225,792      117,470,592
> 229,519,360      mbuf_jumbo_page       67,043,328       162,476,032
> 1,196,795,160    arc_buf_hdr_t_l2only  601,620,624      595,174,536
> 114,220,354,544  TOTAL                 111,778,349,508  2,442,005,036
>
>
> Maybe the right thing to do is call the new kmem_cache_reap_soon() or
> other functions that might actually reduce fragmentation, or rate limit
> how quickly the ARC will shrink?
>
> What kind of tools do we have to look at why memory is so fragmented
> that ZFS feels the need to tank the ARC?
>
>
>
> I know this block and the FMR_ZIO_FRAG reason have been removed from
> -CURRENT as part of the NUMA work, but I am worried about addressing
> this issue for the upcoming 11.2-RELEASE.
>
>
>

Does anyone have any thoughts on this? The 11.2 code slush starts in 1
week, so we really need to decide what to do here.

-- 
Allan Jude
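
For anyone who wants to repeat the per-zone accounting from the vmstat -z pipelines quoted above without the awk and perl one-liner, here is a rough Python equivalent. It is a sketch only: the function names are hypothetical, the SAMPLE text uses made-up numbers, and it assumes the usual "NAME: SIZE, LIMIT, USED, FREE, ..." layout of vmstat -z output. As in the awk script, used bytes are SIZE * USED and cached bytes are SIZE * FREE.

```python
def zone_usage(vmstat_z_text):
    """Return {zone_name: (used_bytes, cached_bytes)} from `vmstat -z` text."""
    zones = {}
    for line in vmstat_z_text.splitlines():
        name, sep, rest = line.partition(":")
        fields = [f.strip() for f in rest.split(",")]
        if not sep or len(fields) < 4:
            continue  # header line or blank line
        try:
            size, _limit, used, free = (int(f) for f in fields[:4])
        except ValueError:
            continue  # not a zone line
        # Same naming as the awk script: spaces become underscores.
        zones[name.strip().replace(" ", "_")] = (size * used, size * free)
    return zones


def top_by_cache(zones, count=10):
    """Zones with the most cached (allocated but unused) bytes first."""
    return sorted(zones.items(), key=lambda kv: kv[1][1], reverse=True)[:count]


# Made-up sample in the assumed `vmstat -z` layout, for illustration only.
SAMPLE = """\
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

UMA Kegs:               384,      0,     210,       0,     210,   0,   0
zio_buf_4096:          4096,      0,     100,      51,    5000,   0,   0
"""

if __name__ == "__main__":
    for name, (used, cached) in top_by_cache(zone_usage(SAMPLE)):
        print(f"{used + cached:>15,} {name:<24} {used:>15,} {cached:>15,}")
```

On a live system the text could come from something like subprocess.run(["vmstat", "-z"], capture_output=True, text=True).stdout, in place of SAMPLE.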