From: Ronald Klop <ronald-lists@klop.ws>
To: Andriy Gapon, Chris Ross
Cc: Mark Johnston, freebsd-fs
Subject: Re: swap_pager: cannot allocate bio
Date: Tue, 23 Nov 2021 12:18:58 +0100

On Sat, 20 Nov 2021 04:35:52 +0100, Chris Ross wrote:

> (Sorry that the subject on this thread may not be relevant any more,
> but I don't want to disconnect the thread.)
>
>> On Nov 15, 2021, at 13:17, Chris Ross wrote:
>>> On Nov 15, 2021, at 10:08, Andriy Gapon wrote:
>>
>>> Yes, I propose to remove the wait for ARC evictions from arc_lowmem().
>>>
>>> Another thing that may help a bit is having a greater "slack" between
>>> a threshold where the page daemon starts paging out and a threshold
>>> where memory allocations start to wait (via vm_wait_domain).
>>>
>>> Also, I think that for a long time we had a problem (but not sure if
>>> it's still present) where allocations succeeded without waiting until
>>> the free memory went below a certain threshold M, but once a thread
>>> started waiting in vm_wait it would not be woken up until the free
>>> memory went above another threshold N. And the problem was that N >>
>>> M. In other words, a lot of memory had to be freed (and not grabbed
>>> by other threads) before the waiting thread would be woken up.
>>
>> Thank you both for your inputs. Let me know if you'd like me to try
>> anything, and I'll kick (reboot) the system and can build a new kernel
>> when you'd like. I did get another procstat -kka out of it this
>> morning, and the system has since gone less responsive, but I assume
>> that the new procstat won't show anything last night's didn't.
>
> I'm still having this issue. I rebooted the machine, fsck'd the disks,
> and got it running again. Again, it ran for ~50 hours before getting
> stuck. I got another procstat -kka off of it; let me know if you'd
> like a copy of it. But it looks like the active processes are all in
> arc_wait_for_eviction. A pagedaemon is in arc_wait_for_eviction under
> arc_lowmem, but the python processes that were doing the real work
> don't have arc_lowmem in their stacks, just arc_wait_for_eviction.
>
> Please let me know if there's anything I can do to assist in finding
> a remedy for this. Thank you.
>
>                   - Chris

Just a wild guess: would it help to set a limit in the vfs.zfs.arc_max
variable? Maybe that will lower the memory pressure and gain some
stability.

You can use the zfs-stats package to see the current ARC size. My RPI4
gives:

# zfs-stats -A
...
ARC Size:                               28.19%  1.93    GiB
        Target Size: (Adaptive)         30.47%  2.08    GiB
        Min Size (Hard Limit):          3.58%   250.80  MiB
        Max Size (High Water):          27:1    6.84    GiB
...

You can use your own stats to tune it so ZFS does not take too much
memory for the ARC and leaves more for the running applications, which
might also reduce swapping. You can check "zfs-stats -E" to see whether
the ARC cache hit ratio is still OK with a limited ARC.

Regards,
Ronald.
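P.S. For reference, capping the ARC might look like the sketch below.
The 4 GiB value is purely illustrative; pick one that leaves enough RAM
for your workload. (Runtime tunability of vfs.zfs.arc_max depends on
your FreeBSD/OpenZFS version, so treat the sysctl line as an assumption
to verify on your system.)

```shell
# Compute an illustrative 4 GiB ARC cap in bytes.
ARC_MAX=$((4 * 1024 * 1024 * 1024))
echo "$ARC_MAX"   # 4294967296

# Apply at runtime (works on recent FreeBSD, where the sysctl is writable):
# sysctl vfs.zfs.arc_max="$ARC_MAX"

# Or make it persistent across reboots via loader.conf:
# echo 'vfs.zfs.arc_max="4294967296"' >> /boot/loader.conf
```

After changing it, "zfs-stats -A" should show the new value as the Max
Size (High Water) once the ARC has adjusted.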