From: Justin Hibbits <chmeeedalf@gmail.com>
Date: Fri, 06 Apr 2018 00:47:14 +0000
Subject: Re: Strange ARC/Swap/CPU on yesterday's -CURRENT
To: Don Lewis
Cc: Mark Johnston, Andriy Gapon, Bryan Drewery, Peter Jeremy, Jeff Roberson, FreeBSD current

On Wed, Apr 4, 2018, 13:20 Don Lewis wrote:

> On 4 Apr, Mark Johnston wrote:
> > On Tue, Apr 03, 2018 at 09:42:48PM -0700, Don Lewis wrote:
> >> On 3 Apr, Don Lewis wrote:
> >> > I reconfigured my Ryzen box to be more similar to my default package
> >> > builder by disabling SMT and half of the RAM, limiting it to 8 cores
> >> > and 32 GB, and then started bisecting to try to track down the
> >> > problem. For each test, I first filled ARC by tarring
> >> > /usr/ports/distfiles to /dev/null. The commit range I was searching
> >> > was r329844 to r331716, and I narrowed it to r329844 to r329904.
> >> > With r329904 and newer, ARC is totally unresponsive to memory
> >> > pressure and the machine pages heavily. I see ARC sizes of 28-29 GB
> >> > and 30 GB of wired RAM, so there is not much left over for getting
> >> > useful work done. Active memory and free memory both hover under
> >> > 1 GB each. Looking at the commit logs over this range, the most
> >> > likely culprit is:
> >> >
> >> > r329882 | jeff | 2018-02-23 14:51:51 -0800 (Fri, 23 Feb 2018) | 13 lines
> >> >
> >> > Add a generic Proportional Integral Derivative (PID) controller
> >> > algorithm and use it to regulate page daemon output.
> >> >
> >> > This provides much smoother and more responsive page daemon output,
> >> > anticipating demand and avoiding pageout stalls by increasing the
> >> > number of pages to match the workload. This is a reimplementation of
> >> > work done by myself and mlaier at Isilon.
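For context, the controller that commit added lives in sys/kern/subr_pidctrl.c,
and the idea under discussion in this thread reduces to a few lines of integer
math. The sketch below is illustrative only; the struct, field names, and
divisor-style gains are hypothetical, not the actual kernel interface:

    /*
     * Illustrative PID step: given a free-page setpoint (v_free_target),
     * compute how many pages the page daemon should reclaim this interval.
     */
    struct pid_sketch {
            int setpoint;           /* desired free page count */
            int integral;           /* accumulated shortage */
            int prev_error;         /* shortage seen last interval */
            int kp, ki, kd;         /* gains, expressed as integer divisors */
    };

    static int
    pid_sketch_step(struct pid_sketch *p, int free_count)
    {
            int error, derivative, output;

            error = p->setpoint - free_count;   /* positive means shortage */
            p->integral += error;
            if (p->integral < 0)
                    p->integral = 0;            /* a surplus banks no credit */
            derivative = error - p->prev_error;
            p->prev_error = error;

            /* Sum the P, I, and D terms; divisor gains keep this integral. */
            output = error / p->kp + p->integral / p->ki + derivative / p->kd;
            return (output < 0 ? 0 : output);   /* pages to try to reclaim */
    }

Each page daemon wakeup runs one step of the controller against the current
free page count and reclaims that many pages, which is why the daemon now
tracks v_free_target rather than a wakeup threshold near v_free_min, a point
Mark expands on below.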
> >> > It is quite possible that the recent fixes to the PID controller
> >> > will fix the problem. Not that r329844 was trouble-free ... I left
> >> > tar running over lunchtime to fill ARC, and the OOM killer nuked top,
> >> > tar, ntpd, both of my ssh sessions into the machine, and multiple
> >> > instances of getty while I was away. I was able to log in again and
> >> > successfully run poudriere, and ARC did respond to the memory
> >> > pressure and cranked itself down to about 5 GB by the end of the run.
> >> > I did not see the same problem with tar when I did the same with
> >> > r329904.
> >>
> >> I just tried r331966 and see no improvement. There were no OOM process
> >> kills during the tar run to fill ARC, but with ARC filled, the machine
> >> is thrashing at the start of the poudriere run while trying to build
> >> ports-mgmt/pkg (39 minutes so far). ARC appears to be unresponsive to
> >> memory demand; I've seen no decrease in ARC size or wired memory since
> >> starting poudriere.
> >
> > Re-reading the ARC reclaim code, I see a couple of issues which might
> > be at the root of the behaviour you're seeing.
> >
> > 1. zfs_arc_free_target is too low now. It is initialized to the page
> > daemon wakeup threshold, which is slightly above v_free_min. With the
> > PID controller, the page daemon uses a setpoint of v_free_target.
> > Moreover, it now wakes up regularly rather than having wakeups be
> > synchronized by a mutex, so it will respond quickly if the free page
> > count dips below v_free_target. The free page count will now dip below
> > zfs_arc_free_target only in the face of sudden and extreme memory
> > pressure, so the FMT_LOTSFREE case probably isn't getting exercised.
> > Try initializing zfs_arc_free_target to v_free_target.
>
> Changing zfs_arc_free_target definitely helps. My previous poudriere
> run failed when poudriere timed out the ports-mgmt/pkg build after two
> hours. After changing this setting, poudriere seems to be running
> properly, ARC has dropped from 29 GB to 26 GB ten minutes into the run,
> and I'm not seeing processes in the swread state.
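That suggestion can be tried on a live system without a rebuild, since
vfs.zfs.arc_free_target is a writable sysctl. A minimal sketch, assuming a
-CURRENT of this vintage where both OIDs exist (both values are counts of
pages, not bytes):

    # Point the ARC's free-memory target at the page daemon's setpoint.
    sysctl vfs.zfs.arc_free_target="$(sysctl -n vm.v_free_target)"

To persist across reboots, the resulting numeric value can be set in
/etc/sysctl.conf instead.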
> > 2. In the inactive queue scan, we used to compute the shortage after
> > running uma_reclaim() and the lowmem handlers (which include a
> > synchronous call to arc_lowmem()). Now it's computed before, so we're
> > not taking into account the pages that get freed by the ARC and UMA.
> > The following rather hacky patch may help. I note that the lowmem
> > logic is now somewhat broken when multiple NUMA domains are
> > configured, however, since it fires only when domain 0 has a free
> > page shortage.
>
> I will try this next.
>
> > Index: sys/vm/vm_pageout.c
> > ===================================================================
> > --- sys/vm/vm_pageout.c	(revision 331933)
> > +++ sys/vm/vm_pageout.c	(working copy)
> > @@ -1114,25 +1114,6 @@
> >  	boolean_t queue_locked;
> >  
> >  	/*
> > -	 * If we need to reclaim memory ask kernel caches to return
> > -	 * some. We rate limit to avoid thrashing.
> > -	 */
> > -	if (vmd == VM_DOMAIN(0) && pass > 0 &&
> > -	    (time_uptime - lowmem_uptime) >= lowmem_period) {
> > -		/*
> > -		 * Decrease registered cache sizes.
> > -		 */
> > -		SDT_PROBE0(vm, , , vm__lowmem_scan);
> > -		EVENTHANDLER_INVOKE(vm_lowmem, VM_LOW_PAGES);
> > -		/*
> > -		 * We do this explicitly after the caches have been
> > -		 * drained above.
> > -		 */
> > -		uma_reclaim();
> > -		lowmem_uptime = time_uptime;
> > -	}
> > -
> > -	/*
> >  	 * The addl_page_shortage is the number of temporarily
> >  	 * stuck pages in the inactive queue. In other words, the
> >  	 * number of pages from the inactive count that should be
> > @@ -1824,6 +1805,26 @@
> >  		atomic_store_int(&vmd->vmd_pageout_wanted, 1);
> >  
> >  		/*
> > +		 * If we need to reclaim memory ask kernel caches to return
> > +		 * some. We rate limit to avoid thrashing.
> > +		 */
> > +		if (vmd == VM_DOMAIN(0) &&
> > +		    vmd->vmd_free_count < vmd->vmd_free_target &&
> > +		    (time_uptime - lowmem_uptime) >= lowmem_period) {
> > +			/*
> > +			 * Decrease registered cache sizes.
> > +			 */
> > +			SDT_PROBE0(vm, , , vm__lowmem_scan);
> > +			EVENTHANDLER_INVOKE(vm_lowmem, VM_LOW_PAGES);
> > +			/*
> > +			 * We do this explicitly after the caches have been
> > +			 * drained above.
> > +			 */
> > +			uma_reclaim();
> > +			lowmem_uptime = time_uptime;
> > +		}
> > +
> > +		/*
> >  		 * Use the controller to calculate how many pages to free in
> >  		 * this interval.
> >  		 */

My powerpc64 embedded machine has been virtually unusable since these vm
changes. I tried setting vfs.zfs.arc_free_target as suggested, and that
didn't help at all. Eventually the machine hangs and just gets stuck in
vmdaemon, with many processes in the btalloc wait channel.

- Justin
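For anyone trying to reproduce a hang like this, the threads asleep in
btalloc can be located with stock tools; a sketch, where the PID in the
second command is a placeholder taken from the first command's output:

    # List every thread's wait channel and pick out the ones in btalloc,
    # then dump the kernel stack of one of the stuck processes.
    procstat -a -t | grep btalloc
    procstat -kk 1234    # 1234 is a placeholder PID from the listing above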