Date:      Sun, 25 Feb 2018 22:56:27 -0500
From:      Allan Jude <allanjude@freebsd.org>
To:        Steven Hartland <smh@FreeBSD.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Cc:        Mike Geiger <myke@servernorth.net>, Andriy Gapon <avg@FreeBSD.org>, Josh Paetzel <jpaetzel@freebsd.org>, Alexander Motin <mav@FreeBSD.org>
Subject:   Re: svn commit: r315449 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs
Message-ID:  <6bf452f9-fd55-1ea9-196a-1cfcf97d06f4@freebsd.org>
In-Reply-To: <201703171234.v2HCYvgd026429@repo.freebsd.org>
References:  <201703171234.v2HCYvgd026429@repo.freebsd.org>

On 2017-03-17 08:34, Steven Hartland wrote:
> Author: smh
> Date: Fri Mar 17 12:34:57 2017
> New Revision: 315449
> URL: https://svnweb.freebsd.org/changeset/base/315449
> 
> Log:
>   Reduce ARC fragmentation threshold
>   
>   As ZFS can request up to SPA_MAXBLOCKSIZE memory block e.g. during zfs recv,
>   update the threshold at which we start aggressive reclamation to use
>   SPA_MAXBLOCKSIZE (16M) instead of the lower zfs_max_recordsize which
>   defaults to 1M.
>   
>   PR:		194513
>   Reviewed by:	avg, mav
>   MFC after:	1 month
>   Sponsored by:	Multiplay
>   Differential Revision:	https://reviews.freebsd.org/D10012
> 
> Modified:
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> 
> Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> ==============================================================================
> --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Mar 17 12:34:56 2017	(r315448)
> +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Mar 17 12:34:57 2017	(r315449)
> @@ -3978,7 +3978,7 @@ arc_available_memory(void)
>  	 * Start aggressive reclamation if too little sequential KVA left.
>  	 */
>  	if (lowest > 0) {
> -		n = (vmem_size(heap_arena, VMEM_MAXFREE) < zfs_max_recordsize) ?
> +		n = (vmem_size(heap_arena, VMEM_MAXFREE) < SPA_MAXBLOCKSIZE) ?
>  		    -((int64_t)vmem_size(heap_arena, VMEM_ALLOC) >> 4) :
>  		    INT64_MAX;
>  		if (n < lowest) {
> 

I have some users reporting excessive ARC shrinking in 11.1 vs 11.0 due
to this change.

Memory on these machines is quite fragmented, and this change makes the
reclaim logic much more sensitive to that, but the real problem seems
to be that it can get too aggressive.
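
For reference, this is the changed check with the ternary unrolled; per
the commit log, zfs_max_recordsize defaults to 1MB while
SPA_MAXBLOCKSIZE is 16MB, so the aggressive path now trips as soon as
the largest free run of KVA drops below 16MB rather than 1MB:

	/* Equivalent form of the check changed in r315449. */
	if (vmem_size(heap_arena, VMEM_MAXFREE) < SPA_MAXBLOCKSIZE)
		n = -((int64_t)vmem_size(heap_arena, VMEM_ALLOC) >> 4);
	else
		n = INT64_MAX;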

In the most recent case, the machine has 128GB of RAM and no other
major processes running, just ZFS zvols being served over iSCSI by
ctld.

arc_max is set to 85GB, which is rather conservative. After running for
a few days, fragmentation seems to trip this line: once there is no
16MB contiguous block of KVA left, the ARC is shrunk by 1/16th of
allocated memory. That does not produce a 16MB contiguous chunk either,
so the ARC is shrunk by another 1/16th, and again, until it hits
arc_min. Apparently the ARC does eventually regrow, but then it
collapses again later.

You can see the ARC oscillating between arc_max and arc_min, with some
long periods pinned at arc_min: https://imgur.com/a/emztF
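
To make the feedback loop concrete, here is a rough simulation of what
those passes do (all the numbers are illustrative assumptions, not
measurements from this box):

	/*
	 * Model of the reclaim loop described above.  The key assumption
	 * (visible in the vmstat -z output below) is that the freed ARC
	 * buffers sit in the UMA caches, so VMEM_ALLOC (and with it the
	 * 1/16th deficit) barely moves from one pass to the next.
	 */
	#include <stdio.h>
	#include <stdint.h>

	int
	main(void)
	{
		int64_t kva_alloc = 120LL << 30;	/* assumed allocated KVA */
		int64_t arc_size = 85LL << 30;		/* arc_max on this box */
		int64_t arc_min = 4LL << 30;		/* assumed arc_min */
		int pass = 0;

		while (arc_size > arc_min) {
			/* arc_available_memory() reports -(VMEM_ALLOC >> 4). */
			arc_size -= kva_alloc >> 4;	/* ~7.5GB per pass */
			if (arc_size < arc_min)
				arc_size = arc_min;
			printf("pass %2d: arc_size = %jdGB\n", ++pass,
			    (intmax_t)(arc_size >> 30));
		}
		return (0);
	}

Eleven passes of that and the ARC is pinned at arc_min, without the
largest contiguous run of KVA ever getting back above 16MB.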


Here is the per-zone picture; the awk prints total bytes, zone name,
bytes in use, and bytes cached for each UMA zone (ten largest by
total):

[root@ZFS-AF ~]# vmstat -z | tail +3 |
    awk -F '[:,] *' 'BEGIN { total=0; cache=0; used=0 }
        { u = $2 * $4; c = $2 * $5; t = u + c; cache += c; used += u;
          total += t; name=$1; gsub(" ", "_", name); print t, name, u, c }
        END { print total, "TOTAL", used, cache }' |
    sort -n |
    perl -a -p -e 'while (($j, $_) = each(@F)) {
        1 while s/^(-?\d+)(\d{3})/$1,$2/; print $_, " " } print "\n"' |
    column -t | tail
1,723,367,424    zio_data_buf_49152     1,722,875,904    491,520
1,827,057,664    zio_buf_4096           1,826,848,768    208,896
2,289,459,200    zio_data_buf_40960     2,289,090,560    368,640
3,642,736,640    zio_data_buf_81920     3,642,408,960    327,680
6,713,180,160    zio_data_buf_98304     6,712,688,640    491,520
9,388,195,840    zio_buf_8192           9,388,064,768    131,072
11,170,152,448   zio_data_buf_114688    11,168,890,880   1,261,568
29,607,329,792   zio_data_buf_131072    29,606,674,432   655,360
32,944,750,592   zio_buf_65536          32,943,833,088   917,504
114,235,296,752  TOTAL                  111,787,212,900  2,448,083,852


The same list sorted on the cached column instead (sort -n +3), which
shows how much freed memory is sitting in the UMA caches rather than
being returned to the system:

[root@ZFS-AF ~]# vmstat -z | tail +3 |
    awk -F '[:,] *' 'BEGIN { total=0; cache=0; used=0 }
        { u = $2 * $4; c = $2 * $5; t = u + c; cache += c; used += u;
          total += t; name=$1; gsub(" ", "_", name); print t, name, u, c }
        END { print total, "TOTAL", used, cache }' |
    sort -n +3 |
    perl -a -p -e 'while (($j, $_) = each(@F)) {
        1 while s/^(-?\d+)(\d{3})/$1,$2/; print $_, " " } print "\n"' |
    column -t | tail
71,565,312       cblk15                 0                71,565,312
72,220,672       cblk16                 0                72,220,672
72,351,744       cblk18                 131,072          72,220,672
72,744,960       cblk3                  0                72,744,960
75,497,472       cblk8                  0                75,497,472
76,283,904       cblk22                 0                76,283,904
403,696,384      128                    286,225,792      117,470,592
229,519,360      mbuf_jumbo_page        67,043,328       162,476,032
1,196,795,160    arc_buf_hdr_t_l2only   601,620,624      595,174,536
114,220,354,544  TOTAL                  111,778,349,508  2,442,005,036


Maybe the right thing to do is to call the new kmem_cache_reap_soon(),
or other functions that might actually return the cached UMA memory and
reduce fragmentation, or to rate-limit how quickly the ARC will shrink
(see the sketch below)?
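
A minimal sketch of the rate-limit idea against the block quoted above;
arc_frag_interval and arc_last_frag_reclaim are made-up names here, not
existing tunables:

	/*
	 * Hypothetical rate limit, illustration only: report the
	 * FMR_ZIO_FRAG deficit at most once per arc_frag_interval
	 * seconds, so a single fragmentation event cannot walk the ARC
	 * all the way down to arc_min before a UMA reap has had any
	 * chance to help.
	 */
	static int arc_frag_interval = 10;	/* seconds; would be a tunable */
	static hrtime_t arc_last_frag_reclaim;

	if (lowest > 0 &&
	    vmem_size(heap_arena, VMEM_MAXFREE) < SPA_MAXBLOCKSIZE &&
	    gethrtime() - arc_last_frag_reclaim >
	    (hrtime_t)arc_frag_interval * NANOSEC) {
		arc_last_frag_reclaim = gethrtime();
		n = -((int64_t)vmem_size(heap_arena, VMEM_ALLOC) >> 4);
		if (n < lowest) {
			lowest = n;
			r = FMR_ZIO_FRAG;
		}
	}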

What kind of tools do we have to look at why memory is so fragmented
that ZFS feels the need to tank the ARC?



I know this block and the FMR_ZIO_FRAG reason have been removed from
-CURRENT as part of the NUMA work, but I am worried about addressing
this issue for the upcoming 11.2-RELEASE.



-- 
Allan Jude


