From owner-freebsd-current@freebsd.org Wed Apr  4 17:49:56 2018
Date: Wed, 4 Apr 2018 13:49:49 -0400
From: Mark Johnston
To: Don Lewis
Cc: Andriy Gapon, Bryan Drewery, Peter Jeremy, Jeff Roberson,
    FreeBSD current
Subject: Re: Strange ARC/Swap/CPU on yesterday's -CURRENT
Message-ID: <20180404174949.GA12271@raichu>

On Tue, Apr 03, 2018 at 09:42:48PM -0700, Don Lewis wrote:
> On 3 Apr, Don Lewis wrote:
> > I reconfigured my Ryzen box to be more similar to my default package
> > builder by disabling SMT and half of the RAM, to limit it to 8 cores
> > and 32 GB, and then started bisecting to try to track down the
> > problem.  For each test, I first filled the ARC by tarring
> > /usr/ports/distfiles to /dev/null.  The commit range that I was
> > searching was r329844 to r331716.  I narrowed the range to r329844
> > to r329904.  With r329904 and newer, the ARC is totally unresponsive
> > to memory pressure and the machine pages heavily.  I see ARC sizes
> > of 28-29 GB and 30 GB of wired RAM, so there is not much left over
> > for getting useful work done.  Active memory and free memory both
> > hover under 1 GB each.  Looking at the commit logs over this range,
> > the most likely culprit is:
> >
> >   r329882 | jeff | 2018-02-23 14:51:51 -0800 (Fri, 23 Feb 2018) | 13 lines
> >
> >   Add a generic Proportional Integral Derivative (PID) controller
> >   algorithm and use it to regulate page daemon output.
> >
> >   This provides much smoother and more responsive page daemon
> >   output, anticipating demand and avoiding pageout stalls by
> >   increasing the number of pages to match the workload.  This is a
> >   reimplementation of work done by myself and mlaier at Isilon.
> >
> > It is quite possible that the recent fixes to the PID controller
> > will fix the problem.  Not that r329844 was trouble-free ... I left
> > tar running over lunchtime to fill the ARC, and the OOM killer nuked
> > top, tar, ntpd, both of my ssh sessions into the machine, and
> > multiple instances of getty while I was away.  I was able to log in
> > again and successfully run poudriere, and the ARC did respond to the
> > memory pressure and cranked itself down to about 5 GB by the end of
> > the run.  I did not see the same problem with tar when I repeated
> > this with r329904.
>
> I just tried r331966 and see no improvement.  There were no OOM
> process kills during the tar run to fill the ARC, but with the ARC
> filled, the machine is thrashing itself at the start of the poudriere
> run while trying to build ports-mgmt/pkg (39 minutes so far).  The ARC
> appears to be unresponsive to memory demand.  I've seen no decrease in
> ARC size or wired memory since starting poudriere.
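Before getting into it: for anyone who hasn't read r329882, the PID
controller derives the page daemon's per-pass scan target from the
free-page error.  The fragment below is purely illustrative -- every
name in it is invented for the example, and it is not the actual
vm_pageout.c code:

	/*
	 * Illustrative-only PID update: compute a pageout scan target
	 * from the free page shortage.  Names and gains are invented.
	 */
	static int
	pid_scan_target(int free_target, int free_count)
	{
		/* Controller state persists across pagedaemon passes. */
		static int integral, prev_error;
		int derivative, error, target;

		error = free_target - free_count; /* P: current shortage */
		integral += error;		  /* I: accumulated error */
		derivative = error - prev_error;  /* D: rate of change */
		prev_error = error;
		target = error + integral / 10 + derivative / 2;
		return (target > 0 ? target : 0); /* surplus: scan nothing */
	}

The point that matters here is that the controller's setpoint is
v_free_target, not the old wakeup threshold; see (1) below.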
Re-reading the ARC reclaim code, I see a couple of issues which might be
at the root of the behaviour you're seeing.

1. zfs_arc_free_target is too low now.  It is initialized to the page
   daemon wakeup threshold, which is slightly above v_free_min.  With
   the PID controller, the page daemon instead uses a setpoint of
   v_free_target.  Moreover, it now wakes up regularly rather than
   having wakeups be synchronized by a mutex, so it will respond
   quickly if the free page count dips below v_free_target.  The free
   page count will therefore dip below zfs_arc_free_target only in the
   face of sudden and extreme memory pressure, so the FMT_LOTSFREE case
   probably isn't getting exercised.  Try initializing
   zfs_arc_free_target to v_free_target; a rough sketch of what I mean
   is at the end of this message.

2. In the inactive queue scan, we used to compute the shortage after
   running uma_reclaim() and the lowmem handlers (which include a
   synchronous call to arc_lowmem()).  Now it is computed before, so we
   are not taking into account the pages that get freed by the ARC and
   UMA.  The rather hacky patch below may help.  I note that the lowmem
   logic is now somewhat broken when multiple NUMA domains are
   configured, however, since it fires only when domain 0 has a free
   page shortage.

Index: sys/vm/vm_pageout.c
===================================================================
--- sys/vm/vm_pageout.c	(revision 331933)
+++ sys/vm/vm_pageout.c	(working copy)
@@ -1114,25 +1114,6 @@
 	boolean_t queue_locked;
 
 	/*
-	 * If we need to reclaim memory ask kernel caches to return
-	 * some.  We rate limit to avoid thrashing.
-	 */
-	if (vmd == VM_DOMAIN(0) && pass > 0 &&
-	    (time_uptime - lowmem_uptime) >= lowmem_period) {
-		/*
-		 * Decrease registered cache sizes.
-		 */
-		SDT_PROBE0(vm, , , vm__lowmem_scan);
-		EVENTHANDLER_INVOKE(vm_lowmem, VM_LOW_PAGES);
-		/*
-		 * We do this explicitly after the caches have been
-		 * drained above.
-		 */
-		uma_reclaim();
-		lowmem_uptime = time_uptime;
-	}
-
-	/*
 	 * The addl_page_shortage is the number of temporarily
 	 * stuck pages in the inactive queue.  In other words, the
 	 * number of pages from the inactive count that should be
@@ -1824,6 +1805,26 @@
 	atomic_store_int(&vmd->vmd_pageout_wanted, 1);
 
 	/*
+	 * If we need to reclaim memory ask kernel caches to return
+	 * some.  We rate limit to avoid thrashing.
+	 */
+	if (vmd == VM_DOMAIN(0) &&
+	    vmd->vmd_free_count < vmd->vmd_free_target &&
+	    (time_uptime - lowmem_uptime) >= lowmem_period) {
+		/*
+		 * Decrease registered cache sizes.
+		 */
+		SDT_PROBE0(vm, , , vm__lowmem_scan);
+		EVENTHANDLER_INVOKE(vm_lowmem, VM_LOW_PAGES);
+		/*
+		 * We do this explicitly after the caches have been
+		 * drained above.
+		 */
+		uma_reclaim();
+		lowmem_uptime = time_uptime;
+	}
+
+	/*
 	 * Use the controller to calculate how many pages to free in
 	 * this interval.
 	 */
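For (1), the following rough, untested sketch shows the kind of change I
have in mind.  It assumes arc_free_target_init() in arc.c is still where
the default gets set, and it sums the per-domain targets since
v_free_target is per-domain now:

	static void
	arc_free_target_init(void *unused __unused)
	{
		int i;

		/*
		 * Untested sketch: initialize the ARC's free target to
		 * the page daemon's setpoint (summed across domains)
		 * rather than its wakeup threshold, so that
		 * FMT_LOTSFREE can fire before the system is already
		 * deep into a shortage.
		 */
		zfs_arc_free_target = 0;
		for (i = 0; i < vm_ndomains; i++)
			zfs_arc_free_target += VM_DOMAIN(i)->vmd_free_target;
	}
	SYSINIT(arc_free_target_init, SI_SUB_KTHREAD_PAGE, SI_ORDER_ANY,
	    arc_free_target_init, NULL);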