From: "Poul-Henning Kamp"
To: Konstantin Belousov
cc: Warner Losh, Mateusz Guzik, FreeBSD CURRENT
Subject: Re: [SOLVED] Re: Strange behavior after running under high load
Date: Sun, 04 Apr 2021 20:01:43 +0000
Message-ID: <11597.1617566503@critter.freebsd.dk>
List-Id: Discussions about the use of FreeBSD-current

--------
Konstantin Belousov writes:

> > B) We lack a nuanced call-back to tell the subsystems to release some of
> > their memory "without major delay".
> The delay in the wall clock sense does not drive the issue.

I didn't say anything about "wall clock" and you're missing my point by a
wide margin.

We need to make major memory consumers, like vnodes, take action *before*
shortages happen, so that *when* they happen, a lot of memory can be
released to relieve them.

> We cannot expect any io to proceed while we are low on memory [...]

Which is precisely why the top-level goal should be for that to never
happen, while still allowing the "freeable" memory to be used as a cache
as much as possible.

> > C) We have never attempted to enlist userland, where jemalloc often hang
> > on to a lot of unused VM pages.
> >
> The userland does not add to this problem, [...]

No, but userland can help solve it: The unused pages from
jemalloc/userland can very quickly be released to relieve any imminent
shortage the kernel might have.

As can pages from vnodes, and for that matter socket buffers.

But there are always costs: actual costs, i.e. what it will take to
release the memory (locking, VM mappings, washing), and potential costs
(lack of future caching opportunities).

These costs need to be presented to the central memory allocator, so when
it decides back-pressure is appropriate, it can decide who to punk for how
much memory.

> But normally operating system does not have an issue with user pages.

Only if you disregard all non-UNIX operating systems.

Many other kernels have cooperated with userland to balance memory (and
for that matter disk-space).

Just imagine how much better the desktop experience would be, if we could
send SIGVM to firefox to tell it to stop being a memory-pig.

(At least two of the major operating systems in the desktop world do
something like that today.)

> Io latency is not the factor there.  We must avoid situations where
> instantiating a vnode stalls waiting for KVA to appear, similarly we
> must avoid system state where vnodes allocation consumed so much kmem
> that other allocations stall.

My argument is the precise opposite: We must make vnodes and the
allocations they cause responsive to the system's overall memory
availability, well in advance of the shortage happening in the first place.

> Quite indicative is that we do not shrink the vnode list on low memory
> events.  Vnlru also does not account for the memory pressure.

The only reason we do not is that we cannot tell definitively if freeing
a vnode will cause disk-I/O (which may not matter with SSDs) or even how
much memory it might free, if anything.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
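
To make the cost argument concrete: if every major consumer (vnodes,
socket buffers, userland) reported how much it could free and roughly
what that would cost, the central allocator could pick the cheapest
sources first. What follows is only a rough sketch of that idea in C.
The names struct mem_consumer and backpressure_plan() are hypothetical
and invented for illustration; they are not existing kernel interfaces,
and the single "cost" number is an assumed stand-in for the actual and
potential costs discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one per major memory consumer. */
struct mem_consumer {
	const char *name;	/* e.g. "vnodes", "sockbuf", "userland" */
	size_t freeable;	/* bytes the consumer estimates it can release */
	unsigned cost;		/* assumed relative cost; lower is cheaper */
	size_t asked;		/* output: bytes the allocator asks it to free */
};

/*
 * Spread a request for `need` bytes over the consumers, cheapest first.
 * Returns the number of bytes that could not be covered (0 on success).
 */
static size_t
backpressure_plan(struct mem_consumer *c, size_t n, size_t need)
{
	size_t i;

	assert(c != NULL || n == 0);
	for (i = 0; i < n; i++)
		c[i].asked = 0;

	while (need > 0) {
		struct mem_consumer *best = NULL;

		/* Pick the cheapest consumer that has not been asked yet. */
		for (i = 0; i < n; i++) {
			if (c[i].asked != 0 || c[i].freeable == 0)
				continue;
			if (best == NULL || c[i].cost < best->cost)
				best = &c[i];
		}
		if (best == NULL)
			break;		/* nobody left to ask */

		size_t take = best->freeable < need ? best->freeable : need;
		best->asked = take;
		need -= take;
	}
	return (need);
}
```

With three consumers reporting 100, 40 and 30 freeable units at costs 5,
1 and 3, a request for 60 units would take 40 from the cheapest and 20
from the next, and leave the most expensive one alone. The hard part in
practice is the one named above: producing honest estimates for
"freeable" and "cost" in the first place.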