Date:      Wed, 11 Apr 2018 01:04:54 -0400
From:      Allan Jude <allanjude@freebsd.org>
To:        Steven Hartland <smh@FreeBSD.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Cc:        Andriy Gapon <avg@FreeBSD.org>, Josh Paetzel <jpaetzel@freebsd.org>, Alexander Motin <mav@FreeBSD.org>, Mark Johnston <markj@freebsd.org>
Subject:   Re: svn commit: r315449 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs
Message-ID:  <67b803af-3ec3-1bf4-68d9-8cfa19dde160@freebsd.org>
In-Reply-To: <6bf452f9-fd55-1ea9-196a-1cfcf97d06f4@freebsd.org>
References:  <201703171234.v2HCYvgd026429@repo.freebsd.org> <6bf452f9-fd55-1ea9-196a-1cfcf97d06f4@freebsd.org>

On 2018-02-25 22:56, Allan Jude wrote:
> On 2017-03-17 08:34, Steven Hartland wrote:
>> Author: smh
>> Date: Fri Mar 17 12:34:57 2017
>> New Revision: 315449
>> URL: https://svnweb.freebsd.org/changeset/base/315449
>>
>> Log:
>>   Reduce ARC fragmentation threshold
>>
>>   As ZFS can request up to SPA_MAXBLOCKSIZE memory block e.g. during zfs recv,
>>   update the threshold at which we start agressive reclamation to use
>>   SPA_MAXBLOCKSIZE (16M) instead of the lower zfs_max_recordsize which
>>   defaults to 1M.
>>
>>   PR:		194513
>>   Reviewed by:	avg, mav
>>   MFC after:	1 month
>>   Sponsored by:	Multiplay
>>   Differential Revision:	https://reviews.freebsd.org/D10012
>>
>> Modified:
>>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>>
>> Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>> ==============================================================================
>> --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Mar 17 12:34:56 2017	(r315448)
>> +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Mar 17 12:34:57 2017	(r315449)
>> @@ -3978,7 +3978,7 @@ arc_available_memory(void)
>>  	 * Start aggressive reclamation if too little sequential KVA left.
>>  	 */
>>  	if (lowest > 0) {
>> -		n = (vmem_size(heap_arena, VMEM_MAXFREE) < zfs_max_recordsize) ?
>> +		n = (vmem_size(heap_arena, VMEM_MAXFREE) < SPA_MAXBLOCKSIZE) ?
>>  		    -((int64_t)vmem_size(heap_arena, VMEM_ALLOC) >> 4) :
>>  		    INT64_MAX;
>>  		if (n < lowest) {
>>
>
> I have some users reporting excessive ARC shrinking in 11.1 vs 11.0 due
> to this change.
>
> Memory seems quite fragmented, and this change makes it much more
> sensitive to that, but the problem seems to be that it can get too
> aggressive.
>
> In the most recent case, the machine has 128GB of RAM, and no other major
> processes running, just ZFS zvols being served over iSCSI by ctld.
>
> arc_max is set to 85GB, rather conservative. After running for a few days,
> fragmentation seems to trip this line, when there are no 16MB contiguous
> blocks, and it shrinks the ARC by 1/16th of memory, but this does not
> result in a 16MB contiguous chunk, so it shrinks the ARC by another
> 1/16th, and again until it hits arc_min. Apparently eventually the ARC
> does regrow, but then crashes again later.
>
> You can see the ARC oscillating between arc_max and arc_min, with some
> long periods pinned at arc_min: https://imgur.com/a/emztF
>
>
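To make the cascade described above concrete, here is a rough userland model of it. frag_pressure() copies the shape of the expression in the diff. Everything else is an assumption made for illustration: the names are made up, the real code feeds in vmem_size(heap_arena, VMEM_ALLOC) (the whole kernel heap) rather than the ARC size, and the model simply assumes that shrinking never frees a large enough contiguous span.

```c
#include <assert.h>
#include <stdint.h>

/* 16 MiB, the SPA_MAXBLOCKSIZE value quoted in the commit log. */
#define SKETCH_SPA_MAXBLOCKSIZE (16LL << 20)

/*
 * Same shape as the expression in the diff: if the largest free
 * contiguous span (max_free) is below the threshold, request that 1/16
 * of the allocated bytes be given back (negative result); otherwise
 * report no pressure (INT64_MAX).
 */
static int64_t
frag_pressure(int64_t max_free, int64_t allocated, int64_t threshold)
{
	assert(max_free >= 0 && allocated >= 0 && threshold > 0);
	return (max_free < threshold ? -(allocated >> 4) : INT64_MAX);
}

/*
 * Hypothetical model, not kernel code: count the consecutive checks
 * needed to drive a cache from arc_max down to arc_min when max_free
 * never improves. Returns -1 if the threshold is already satisfied.
 */
static int
checks_until_arc_min(int64_t arc_max, int64_t arc_min, int64_t max_free)
{
	int64_t size = arc_max;
	int checks = 0;

	while (size > arc_min) {
		int64_t n = frag_pressure(max_free, size,
		    SKETCH_SPA_MAXBLOCKSIZE);

		if (n == INT64_MAX)
			return (-1);	/* no fragmentation pressure */
		if (n == 0)
			break;		/* too small to shrink further */
		size += n;		/* n is negative */
		if (size < arc_min)
			size = arc_min;
		checks++;
	}
	return (checks);
}
```

Each step removes a fixed fraction, so in this model the size decays geometrically and only a modest number of back-to-back checks is needed to go from arc_max to arc_min. With the old 1M threshold the same max_free value would often not trip the check at all, which fits the 11.0 vs 11.1 difference reported above.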
> [root@ZFS-AF ~]# vmstat -z | tail +3 | awk -F '[:,] *' 'BEGIN { total=0;
> cache=0; used=0 } {u = $2 * $4; c = $2 * $5; t = u + c; cache += c; used
> += u; total += t; name=$1; gsub(" ", "_", name); print t, name, u, c}
> END { print total, "TOTAL", used, cache } ' | sort -n | perl -a -p -e
> 'while (($j, $_) = each(@F)) { 1 while s/^(-?\d+)(\d{3})/$1,$2/; print
> $_, " "} print "\n"' | column -t | tail

TOTAL              NAME                   USED             CACHE
> 1,723,367,424    zio_data_buf_49152     1,722,875,904    491,520
> 1,827,057,664    zio_buf_4096           1,826,848,768    208,896
> 2,289,459,200    zio_data_buf_40960     2,289,090,560    368,640
> 3,642,736,640    zio_data_buf_81920     3,642,408,960    327,680
> 6,713,180,160    zio_data_buf_98304     6,712,688,640    491,520
> 9,388,195,840    zio_buf_8192           9,388,064,768    131,072
> 11,170,152,448   zio_data_buf_114688    11,168,890,880   1,261,568
> 29,607,329,792   zio_data_buf_131072    29,606,674,432   655,360
> 32,944,750,592   zio_buf_65536          32,943,833,088   917,504
> 114,235,296,752  TOTAL                  111,787,212,900  2,448,083,852
>
>
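For anyone who wants the per-zone arithmetic without the awk, the sketch below does the same calculation for a single line. It assumes the vmstat -z column order is ITEM, SIZE, LIMIT, USED, FREE, which is what the awk fields $2, $4 and $5 imply. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result record; not a FreeBSD structure. */
struct zone_bytes {
	char name[64];
	unsigned long long used;	/* SIZE * USED, as "u" in the awk */
	unsigned long long cache;	/* SIZE * FREE, as "c" in the awk */
};

/*
 * Parse one "NAME: SIZE, LIMIT, USED, FREE, ..." line and compute the
 * same byte counts as the awk program. Returns 1 on success, 0 if the
 * line does not match (for example, the header line).
 */
static int
zone_parse(const char *line, struct zone_bytes *z)
{
	unsigned long long size, limit, used, nfree;
	char *p;

	assert(line != NULL && z != NULL);
	if (sscanf(line, " %63[^:]: %llu, %llu, %llu, %llu",
	    z->name, &size, &limit, &used, &nfree) != 5)
		return (0);
	/* Same effect as gsub(" ", "_", name) in the awk program. */
	for (p = z->name; *p != '\0'; p++)
		if (*p == ' ')
			*p = '_';
	z->used = size * used;
	z->cache = size * nfree;
	return (1);
}
```

As a check against the table above, a line of the form "zio_buf_4096: 4096, 0, 446008, 51, ..." gives 1,826,848,768 bytes used and 208,896 bytes cached, matching the zio_buf_4096 row. That input line is reconstructed by dividing the table values by 4096; it is not captured vmstat output.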
> [root@ZFS-AF ~]# vmstat -z | tail +3 | awk -F '[:,] *' 'BEGIN { total=0;
> cache=0; used=0 } {u = $2 * $4; c = $2 * $5; t = u + c; cache += c; used
> += u; total += t; name=$1; gsub(" ", "_", name); print t, name, u, c}
> END { print total, "TOTAL", used, cache } ' | sort -n +3 | perl -a -p -e
> 'while (($j, $_) = each(@F)) { 1 while s/^(-?\d+)(\d{3})/$1,$2/; print
> $_, " "} print "\n"' | column -t | tail

Sorted by cache (waste):

TOTAL              NAME                   USED             CACHE
> 71,565,312       cblk15                 0                71,565,312
> 72,220,672       cblk16                 0                72,220,672
> 72,351,744       cblk18                 131,072          72,220,672
> 72,744,960       cblk3                  0                72,744,960
> 75,497,472       cblk8                  0                75,497,472
> 76,283,904       cblk22                 0                76,283,904
> 403,696,384      128                    286,225,792      117,470,592
> 229,519,360      mbuf_jumbo_page        67,043,328       162,476,032
> 1,196,795,160    arc_buf_hdr_t_l2only   601,620,624      595,174,536
> 114,220,354,544  TOTAL                  111,778,349,508  2,442,005,036
>
>
> Maybe the right thing to do is call the new kmem_cache_reap_soon() or
> other functions that might actually reduce fragmentation, or rate limit
> how quickly the ARC will shrink?
>
> What kind of tools do we have to look at why memory is so fragmented
> that ZFS feels the need to tank the ARC?
>
>
>
> I know this block and the FMR_ZIO_FRAG reason have been removed from
> -CURRENT as part of the NUMA work, but I am worried about addressing
> this issue for the upcoming 11.2-RELEASE.
>
>
>

Does anyone have any thoughts on this? The 11.2 code slush starts in 1
week, so we really need to decide what to do here.

-- 
Allan Jude
