Date: Sun, 4 Apr 2021 20:52:57 +0300
From: Konstantin Belousov
To: Warner Losh
Cc: Mateusz Guzik, Poul-Henning Kamp, FreeBSD CURRENT
Subject: Re: [SOLVED] Re: Strange behavior after running under high load

On Sun, Apr 04, 2021 at 08:45:41AM -0600, Warner Losh wrote:
> On Sun, Apr 4, 2021, 5:51 AM Mateusz Guzik wrote:
>
> > On 4/3/21, Poul-Henning Kamp wrote:
> > > --------
> > > Mateusz Guzik writes:
> > >
> > >> It is high because of this:
> > >>         msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk", hz);
> > >>
> > >> i.e. it literally sleeps for 1 second.
> > >
> > > Before the line looked like that, it slept on "lbolt" aka "lightning
> > > bolt" which was woken once a second.
> > >
> > > The calculations which come up with those "constants" have always
> > > been utterly bogus math, not quite "square-root of shoe-size
> > > times sun-angle in Patagonia", but close.
> > >
> > > The original heuristic came from university environments with tons of
> > > students doing assignments and nethack behind VT102 terminals, on
> > > filesystems where files only seldom grew past 100KB, so it made sense
> > > to scale number of vnodes to how much RAM was in the system, because
> > > that also scaled the size of the buffer-cache.
> > >
> > > With a merged VM buffer-cache, whatever validity that heuristic had
> > > was lost, and we tweaked the bogomath in various ways until it
> > > seemed to mostly work, trusting the users for which it did not, to
> > > tweak things themselves.
> > >
> > > Please dont tweak the Finagle Constants again.
> > >
> > > Rip all that crap out and come up with something fundamentally better.
> > >
> >
> > Some level of pacing is probably useful to control total memory use --
> > there can be A LOT of memory tied up in mere fact that vnode is fully
> > cached. imo the thing to do is to come up with some watermarks to be
> > revisited every 1-2 years and to change the behavior when they get
> > exceeded -- try to whack some stuff but in face of trouble just go
> > ahead and alloc without sleep 1. Should the load spike sort itself
> > out, vnlru will slowly get things down to the watermark. If the
> > watermark is too low, maybe it can autotune. Bottom line is that even
> > with the current idea of limiting preferred total vnode count, the
> > corner case behavior can be drastically better suffering SOME perf
> > loss from recycling vnodes, but not sleeping for a second for every
> > single one.
> >
>
> I'd suggest that going directly to a PID to control this would be better
> than the watermarks. That would give a smoother response than high/low
> watermarks would.
> While you'd need some level to keep things at still, the
> laundry stuff has shown the precise level of that level is less critical
> than the watermarks.

But what would you provide as the input for the PID controller, and what
would be the targets?

The main reason for the (almost) hard cap on the number of vnodes is not
that an excessive number of vnodes is harmful by itself. Each allocated
vnode typically implies the existence of several second-order allocations
that accumulate into significant KVA usage:
- filesystem inode
- vm object
- namecache entries
There are usually even more allocations, third-order; for instance, a UFS
inode carries a pointer to the dinode copy in RAM, and possibly to the EA
area. And of course, there is the fact that a vnode names pages in the
page cache owned by the corresponding file, i.e. the number of allocated
vnodes regulates the amount of work for the pagedaemon.

We are currently trying to put some rational limit on the total number of
vnodes, estimating both the KVA and the physical memory consumed by them.
If you remove that limit, you need to ensure that we do not create an OOM
situation, either for KVA or for physical memory, just by creating too
many vnodes; otherwise the system cannot get out of it.

So there are some combinations of machine config (RAM) and loads where
the default settings are arguably low. Raising the limits needs to handle
the indirect resource usage from vnodes.

I do not know how to write the feedback formula taking into account all
the consequences of a vnode's existence, and those effects also depend on
the underlying filesystem and the patterns of VM paging usage. In this
sense ZFS is probably the simplest case, because its caching subsystem is
autonomous, while UFS and NFS are tightly integrated with VM.

>
> Warner
>
> > I think the notion of 'struct vnode' being a separately allocated
> > object is not very useful and it comes with complexity (and happens to
> > suffer from several bugs).
> >
> > That said, the easiest and safest thing to do in the meantime is to
> > bump the limit.
> > Perhaps the sleep can be whacked as it is which would
> > largely sort it out.
> >
> > --
> > Mateusz Guzik
> > _______________________________________________
> > freebsd-current@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-current
> > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
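
To make the sizing argument in the reply above concrete, here is a minimal sketch of a cap derived from two budgets, one for KVA and one for physical memory. It is not code from the FreeBSD tree: struct vnode_cost, vnode_cost_total(), vnode_cap() and both budget parameters are hypothetical names, and any byte counts fed into it are assumptions. It also charges the same per-vnode cost against both budgets and ignores the page-cache pages a vnode names and the paging patterns, which is exactly the part the reply says is hard to put into a formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-vnode cost model. The fields only name the
 * allocations listed in the reply; no value here is taken from
 * the FreeBSD sources.
 */
struct vnode_cost {
	size_t vnode_bytes;		/* the vnode itself */
	size_t inode_bytes;		/* filesystem inode (second-order) */
	size_t vm_object_bytes;		/* vm object (second-order) */
	size_t namecache_bytes;		/* namecache entries (second-order) */
	size_t third_order_bytes;	/* e.g. in-RAM dinode copy, EA area */
};

/* Sum of all allocations attributed to one cached vnode. */
uint64_t
vnode_cost_total(const struct vnode_cost *c)
{
	return ((uint64_t)c->vnode_bytes + c->inode_bytes +
	    c->vm_object_bytes + c->namecache_bytes +
	    c->third_order_bytes);
}

/*
 * Largest vnode count that stays within both budgets (in bytes).
 * The tighter of the two budgets decides the cap.
 */
uint64_t
vnode_cap(const struct vnode_cost *c, uint64_t kva_budget,
    uint64_t phys_budget)
{
	uint64_t per_vnode, by_kva, by_phys;

	per_vnode = vnode_cost_total(c);
	assert(per_vnode > 0);	/* a zero-cost model would divide by zero */

	by_kva = kva_budget / per_vnode;
	by_phys = phys_budget / per_vnode;
	return (by_kva < by_phys ? by_kva : by_phys);
}
```

With such a model, raising the limit on a large-RAM machine amounts to raising the two budgets; the per-vnode cost would still differ per filesystem (UFS, NFS, ZFS), so a single struct vnode_cost is a simplification.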