From: "Poul-Henning Kamp"
To: Konstantin Belousov
Cc: Warner Losh, Mateusz Guzik, FreeBSD CURRENT
Subject: Re: [SOLVED] Re: Strange behavior after running under high load
Date: Sun, 04 Apr 2021 19:01:44 +0000
Message-ID: <11447.1617562904@critter.freebsd.dk>

Konstantin Belousov writes:

> But what would you provide as the input for PID controller, and what would be the targets?

Viewing this purely as a vnode-related issue is wrong; this is about memory allocation in general.

We may or may not want a PID regulator, but putting it on counts of vnodes would not improve things, precisely, as you point out, because the amount of memory a vnode ties up has enormous variance.

We should focus on the end goal: to ensure "sufficient" memory can always be allocated for any purpose "without major delay".

Architecturally there are three major problems:

A) While each subsystem generally has a good idea about memory that can be released "without major delay", the information does not trickle up through a summarizing NUMA-aware tree.

B) We lack a nuanced call-back to tell the subsystems to release some of their memory "without major delay".

C) We have never attempted to enlist userland, where jemalloc often hangs on to a lot of unused VM pages.

As far as vnodes go:

It used to be that "without major delay" meant "without disk-I/O", which again led to the "dirty buffers/VM pages" heuristic.

With microsecond SSD backing store, that heuristic is not only invalid, it is downright harmful in many cases.

GEOM maintains estimates of per-provider latency and VM+VFS should use that to schedule write-back so that more of it happens outside rush-hour, in order to increase the amount of memory which can be released "without major delay".

Today that happens largely as a side effect of the periodic syncer, which does a really bad job at it, because it still expects VAX-era hardware performance and workloads.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
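
[Editor's illustration, not part of the original message.] Points A) and B) above could be pictured roughly as follows. This is only a userland sketch under loud assumptions: it is not FreeBSD kernel code, and every name in it (mem_node, mem_node_attach, mem_node_report, mem_node_release, leaf_release_simple) is hypothetical. Leaves stand for subsystems that report how many bytes they could release "without major delay"; interior nodes (for example one per NUMA domain) hold the sum for their subtree; a release request walks down the tree and invokes per-subsystem call-backs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: none of these names exist in FreeBSD. */
struct mem_node;

/* Call-back: free up to 'want' bytes cheaply, return the bytes freed. */
typedef size_t (*release_fn)(struct mem_node *self, size_t want);

struct mem_node {
	const char	*name;
	struct mem_node	*parent;
	struct mem_node	*first_child;
	struct mem_node	*next_sibling;
	size_t		 cheap_bytes;	/* leaf: own estimate; interior: subtree sum */
	release_fn	 release;	/* set on leaves, NULL on interior nodes */
};

/* Link a child under a parent and add its estimate to every ancestor. */
static void
mem_node_attach(struct mem_node *parent, struct mem_node *child)
{
	assert(parent != NULL && child != NULL && child->parent == NULL);
	child->parent = parent;
	child->next_sibling = parent->first_child;
	parent->first_child = child;
	for (struct mem_node *p = parent; p != NULL; p = p->parent)
		p->cheap_bytes += child->cheap_bytes;
}

/* A leaf publishes a new estimate; the change trickles up to the root. */
static void
mem_node_report(struct mem_node *leaf, size_t new_estimate)
{
	size_t old = leaf->cheap_bytes;

	for (struct mem_node *p = leaf; p != NULL; p = p->parent)
		p->cheap_bytes = p->cheap_bytes - old + new_estimate;
}

/* Ask the subtree rooted at 'n' to release up to 'want' bytes. */
static size_t
mem_node_release(struct mem_node *n, size_t want)
{
	size_t freed = 0;

	if (n->release != NULL)
		return (n->release(n, want));
	for (struct mem_node *c = n->first_child;
	    c != NULL && freed < want; c = c->next_sibling)
		freed += mem_node_release(c, want - freed);
	return (freed);
}

/* Trivial leaf call-back: pretend to free memory and update the estimate. */
static size_t
leaf_release_simple(struct mem_node *self, size_t want)
{
	size_t n = want < self->cheap_bytes ? want : self->cheap_bytes;

	mem_node_report(self, self->cheap_bytes - n);
	return (n);
}
```

In this picture, a caller short on memory would look at the root's (or its own NUMA domain's) cheap_bytes to decide whether a request can be satisfied quickly, then call mem_node_release() on that node. A userland allocator, as in point C), could in principle be one more leaf, though how such a request would cross the kernel boundary is left open here.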