Date: Fri, 6 Apr 2018 10:33:26 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
Subject: Re: Strange ARC/Swap/CPU on yesterday's -CURRENT
To: Mark Johnston
cc: Andriy Gapon, Bryan Drewery, Peter Jeremy, Jeff Roberson, FreeBSD current
In-Reply-To: <20180404174949.GA12271@raichu>

On 4 Apr, Mark Johnston wrote:
> On Tue, Apr 03, 2018 at 09:42:48PM -0700, Don Lewis wrote:
>> On 3 Apr, Don Lewis wrote:
>> > I reconfigured my Ryzen box to be more similar to my default package
>> > builder by disabling SMT and half of the RAM, to limit it to 8 cores
>> > and 32 GB, and then started bisecting to try to track down the problem.
>> > For each test, I first filled ARC by tarring /usr/ports/distfiles to
>> > /dev/null.  The commit range that I was searching was r329844 to
>> > r331716.  I narrowed the range to r329844 to r329904.  With r329904
>> > and newer, ARC is totally unresponsive to memory pressure and the
>> > machine pages heavily.  I see ARC sizes of 28-29 GB and 30 GB of wired
>> > RAM, so there is not much left over for getting useful work done.
>> > Active memory and free memory both hover under 1 GB each.  Looking at
>> > the commit logs over this range, the most likely culprit is:
>> >
>> > r329882 | jeff | 2018-02-23 14:51:51 -0800 (Fri, 23 Feb 2018) | 13 lines
>> >
>> > Add a generic Proportional Integral Derivative (PID) controller algorithm and
>> > use it to regulate page daemon output.
>> >
>> > This provides much smoother and more responsive page daemon output, anticipating
>> > demand and avoiding pageout stalls by increasing the number of pages to match
>> > the workload.
>> > This is a reimplementation of work done by myself and mlaier at Isilon.
>> >
>> > It is quite possible that the recent fixes to the PID controller will
>> > fix the problem.  Not that r329844 was trouble free ... I left tar
>> > running over lunchtime to fill ARC and the OOM killer nuked top, tar,
>> > ntpd, both of my ssh sessions into the machine, and multiple instances
>> > of getty while I was away.  I was able to log in again and successfully
>> > run poudriere, and ARC did respond to the memory pressure and cranked
>> > itself down to about 5 GB by the end of the run.  I did not see the same
>> > problem with tar when I did the same with r329904.
>>
>> I just tried r331966 and see no improvement.  No OOM process kills
>> during the tar run to fill ARC, but with ARC filled, the machine is
>> thrashing itself at the start of the poudriere run while trying to build
>> ports-mgmt/pkg (39 minutes so far).  ARC appears to be unresponsive to
>> memory demand.  I've seen no decrease in ARC size or wired memory since
>> starting poudriere.
>
> Re-reading the ARC reclaim code, I see a couple of issues which might be
> at the root of the behaviour you're seeing.
>
> 1. zfs_arc_free_target is too low now.  It is initialized to the page
> daemon wakeup threshold, which is slightly above v_free_min.  With the
> PID controller, the page daemon uses a setpoint of v_free_target.
> Moreover, it now wakes up regularly rather than having wakeups be
> synchronized by a mutex, so it will respond quickly if the free page
> count dips below v_free_target.  The free page count will dip below
> zfs_arc_free_target only in the face of sudden and extreme memory
> pressure now, so the FMT_LOTSFREE case probably isn't getting
> exercised.  Try initializing zfs_arc_free_target to v_free_target.
>
> 2. In the inactive queue scan, we used to compute the shortage after
> running uma_reclaim() and the lowmem handlers (which includes a
> synchronous call to arc_lowmem()).  Now it's computed before, so we're
> not taking into account the pages that get freed by the ARC and UMA.
> The following rather hacky patch may help.  I note that the lowmem
> logic is now somewhat broken when multiple NUMA domains are
> configured, however, since it fires only when domain 0 has a free
> page shortage.
>
> Index: sys/vm/vm_pageout.c
> ===================================================================
> --- sys/vm/vm_pageout.c	(revision 331933)
> +++ sys/vm/vm_pageout.c	(working copy)
> @@ -1114,25 +1114,6 @@
>  	boolean_t queue_locked;
>  
>  	/*
> -	 * If we need to reclaim memory ask kernel caches to return
> -	 * some.  We rate limit to avoid thrashing.
> -	 */
> -	if (vmd == VM_DOMAIN(0) && pass > 0 &&
> -	    (time_uptime - lowmem_uptime) >= lowmem_period) {
> -		/*
> -		 * Decrease registered cache sizes.
> -		 */
> -		SDT_PROBE0(vm, , , vm__lowmem_scan);
> -		EVENTHANDLER_INVOKE(vm_lowmem, VM_LOW_PAGES);
> -		/*
> -		 * We do this explicitly after the caches have been
> -		 * drained above.
> -		 */
> -		uma_reclaim();
> -		lowmem_uptime = time_uptime;
> -	}
> -
> -	/*
>  	 * The addl_page_shortage is the number of temporarily
>  	 * stuck pages in the inactive queue.  In other words, the
>  	 * number of pages from the inactive count that should be
> @@ -1824,6 +1805,26 @@
>  	atomic_store_int(&vmd->vmd_pageout_wanted, 1);
>  
>  	/*
> +	 * If we need to reclaim memory ask kernel caches to return
> +	 * some.  We rate limit to avoid thrashing.
> +	 */
> +	if (vmd == VM_DOMAIN(0) &&
> +	    vmd->vmd_free_count < vmd->vmd_free_target &&
> +	    (time_uptime - lowmem_uptime) >= lowmem_period) {
> +		/*
> +		 * Decrease registered cache sizes.
> +		 */
> +		SDT_PROBE0(vm, , , vm__lowmem_scan);
> +		EVENTHANDLER_INVOKE(vm_lowmem, VM_LOW_PAGES);
> +		/*
> +		 * We do this explicitly after the caches have been
> +		 * drained above.
> +		 */
> +		uma_reclaim();
> +		lowmem_uptime = time_uptime;
> +	}
> +
> +	/*
>  	 * Use the controller to calculate how many pages to free in
>  	 * this interval.
>  	 */
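
For reference, a minimal sketch of suggestion (1) above: initialize
zfs_arc_free_target from the page daemon's setpoint rather than its wakeup
threshold.  Since the paging targets are per-domain after the NUMA split,
there is no single global v_free_target to read; the loop over vm_ndomains
below is only an illustration of the idea, not a committed fix, and whether
arc_free_target_init() in arc.c is the right place for it is an assumption.

	/*
	 * Sketch only: sum the per-domain free targets and use that as the
	 * ARC's free target.  VM_DOMAIN() and vmd_free_target are the same
	 * symbols used in the patch above; vm_ndomains is the number of
	 * configured memory domains.
	 */
	static void
	arc_free_target_init(void *unused __unused)
	{
		u_int free_target;
		int i;

		free_target = 0;
		for (i = 0; i < vm_ndomains; i++)
			free_target += VM_DOMAIN(i)->vmd_free_target;
		zfs_arc_free_target = free_target;
	}
	SYSINIT(arc_free_target_init, SI_SUB_KTHREAD_PAGE, SI_ORDER_ANY,
	    arc_free_target_init, NULL);

At run time the same experiment can probably be approximated without a
kernel rebuild by setting the vfs.zfs.arc_free_target sysctl to the value
reported by vm.stats.vm.v_free_target, assuming both sysctls are present on
the kernel being tested.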