Date: Wed, 30 Mar 2022 15:07:48 +0200
From: Peter <pmc@citylink.dinoex.sub.org>
To: freebsd-stable@freebsd.org
Subject: stable/13: ARC no longer self-tuning?
Hi,

while up to Rel. 12 the ZFS ARC adjusted its size to the demand, in
Rel. 13 it appears to be locked to a fixed minimum of 100M compressed.
Consequently I just got a machine stall/freeze under moderate load: no
cmdline reaction (except in the guests), no login possible, all
processes in "D" state.
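For the record, the ARC's current size and its effective limits can be
read from the arcstats kstats - a minimal check, assuming the usual
kstat sysctl names on 13.x:

# sysctl kstat.zfs.misc.arcstats.size     # current ARC size in bytes
# sysctl kstat.zfs.misc.arcstats.c        # current adaptive target
# sysctl kstat.zfs.misc.arcstats.c_min    # effective minimum
# sysctl kstat.zfs.misc.arcstats.c_max    # effective maximum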
Reset button needed, all guests and jails destroyed:

38378 - DJ  0:03.36 find -sx / /ext /var /usr/local /usr/ports /usr/obj
39414 - DJ  0:00.00 sendmail: running queue: /var/spool/mqueue (sendmail
39415 - DJ  0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39416 - DJ  0:00.00 /usr/local/www/cgit/cgit.cgi
39417 - D<  0:00.00 /usr/local/bin/ruby /ext/libexec/heatctl.rb (ruby27)
39418 - DJ  0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39419 - DJ  0:00.00 sendmail: running queue: /var/spool/mqueue (sendmail
39420 - DJ  0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39421 - DJ  0:00.00 sendmail: accepting connections (sendmail)
39426 - D   0:00.00 sendmail: running queue: /var/spool/mqueue (sendmail
39427 - D   0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39428 - DJ  0:00.00 sendmail: Queue runner@00:03:00 for /var/spool/clien
39429 - DJ  0:00.00 sendmail: accepting connections (sendmail)
39430 - DJ  0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39465 - Ds  0:00.01 newsyslog
39466 - Ds  0:00.01 /bin/sh /usr/libexec/save-entropy
59365 - DsJ 0:00.09 /usr/sbin/cron -s

"top", apparently the only process still running, shows this:

last pid: 39657;  load averages: 0.27, 1.24, 4.55  up 0+04:05:42  04:11:54
805 processes: 1 running, 804 sleeping
CPU: 0.1% user, 0.0% nice, 0.9% system, 0.0% interrupt, 99.0% idle
Mem: 16G Active, 5118M Inact, 1985M Laundry, 7144M Wired, 462M Buf, 905M Free
ARC: 1417M Total, 326M MFU, 347M MRU, 8216K Anon, 30M Header, 706M Other
     119M Compressed, 546M Uncompressed, 4.57:1 Ratio
Swap: 36G Total, 995M Used, 35G Free, 2% Inuse, 76K In

This is different from 12.3: there I would expect the ARC near 6G,
wired near 11G, and swap near 5G.

The last message in the log was 20 minutes earlier:

Mar 30 03:45:17 edge ntpd[7768]: no peer for too long, server running free now

So, strangely, networking had also stalled. I thought networking uses
device drivers separate from the disk drivers?

The effect appeared slowly: the machine became increasingly
unresponsive and laggy (in all regards of I/O) during the "periodic
daily". On the first night this runs find over a million files in all
the jails, as these are not yet in the l2arc. Apparently this is what
killed it - the periodic daily running find in every jail:

35944 - DJ  0:04.71 find -sx / /var /ext /usr/local /usr/obj /usr/ports
36186 - DJ  0:04.75 find -sx / /var /usr/local /usr/obj /usr/ports /dev/
37599 - DJ  0:04.14 find -sx / /var /ext /usr/local /ext/rapp /usr/ports
38378 - DJ  0:03.36 find -sx / /ext /var /usr/local /usr/ports /usr/obj
...

This would need a *lot* of inodes, and the ARC seems quite small for
that.

I've not seen such behaviour before - I had ZFS running in ~2007 with
384 MB RAM installed; now there are 32G here (which I wouldn't have
bought, I got them by accident), and that doesn't work well.

The ARC is configured in loader.conf:

# kenv
vfs.zfs.arc_max="10240M"
vfs.zfs.arc_min="1024M"

However, sysctl shows:

vfs.zfs.arc.max: 10737418240
vfs.zfs.arc.min: 0

Observing the behaviour, the ARC wants to stay at or even below 1G:

last pid: 38718;  load averages: 2.12, 2.93, 2.88  up 0+01:09:08  05:30:25
625 processes: 1 running, 624 sleeping
CPU: 0.0% user, 0.1% nice, 6.3% system, 0.0% interrupt, 93.6% idle
Mem: 12G Active, 1433M Inact, 9987M Wired, 50M Buf, 8237M Free
ARC: 749M Total, 116M MFU, 254M MRU, 2457K Anon, 42M Header, 334M Other
     84M Compressed, 396M Uncompressed, 4.70:1 Ratio
Swap: 36G Total, 36G Free

There are 3 bhyve guests with 16G + 7G + 2G, and these naturally
create a lot of dirty memory.
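Going back to the arc.min discrepancy above: since the loader tunable
obviously did not end up in vfs.zfs.arc.min, a possible cross-check is
to poke the new-style sysctl directly - a sketch, assuming the node is
writable at runtime on 13.x (the 1073741824 is just my intended 1G
minimum):

# sysctl vfs.zfs.arc.min vfs.zfs.arc.max    # what the kernel actually uses
# sysctl vfs.zfs.arc.min=1073741824         # try forcing the 1G minimum at runtime
# sysctl vfs.zfs.arc.min                    # verify it stuck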
Coming back to the bhyve guests: the point is that their dirty memory
should go to swap; that's what SSDs are for. The ARC only grows when
there is not much activity on the system. That may be nice for
desktops, but it is no good for a solid server workload. I need it to
grow against the workload (which it did before, but now doesn't) and
against paging (which doesn't even happen).

Do we have some new knobs to tune? This one appears to already be zero
by default:

vfs.zfs.arc.grow_retry: 0

And what is this one doing?

vfs.zfs.arc.p_dampener_disable: 1

Do I need to read all the code?

There are lots of other things that did work on 12.3 and now fail or
crash, like net/dhcpcd (now crashes in libc), or mountd not
understanding the ZFS exports (the syntax changed and doesn't match
the manpage - it didn't in 12.3 either, but differently), and I only
have two eyes (and they don't get better with age).

What would be needed for the ARC is an affinity balance: should it
prefer to try and grow towards arc_max even under load (server use
with a well-configured arc_max), or should it shrink away as soon as
there is some serious activity on the system (gamers and bloated
browsers)?
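In case someone wants to compare settings: the whole set of new-style
ARC knobs, plus the kernel's own one-line description of each, can be
dumped like this (a sketch; I'm assuming sysctl -d behaves on 13.x as
it does elsewhere):

# sysctl vfs.zfs.arc                          # list all vfs.zfs.arc.* knobs with current values
# sysctl -d vfs.zfs.arc.grow_retry            # print the kernel's description of a single knob
# sysctl -d vfs.zfs.arc.p_dampener_disable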