Date: Mon, 15 Nov 2021 17:08:29 +0200
From: Andriy Gapon <avg@freebsd.org>
To: Mark Johnston <markj@freebsd.org>
Cc: Chris Ross <cross+freebsd@distal.com>, freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: swap_pager: cannot allocate bio
Message-ID: <b2121d25-0782-5cc3-2b55-33ba11c41995@FreeBSD.org>
In-Reply-To: <YZJzy%2ByI40wXFYjd@nuc>
References: <9FE99EEF-37C5-43D1-AC9D-17F3EDA19606@distal.com> <09989390-FED9-45A6-A866-4605D3766DFE@distal.com> <op.1cpimpsmkndu52@joepie> <4E5511DF-B163-4928-9CC3-22755683999E@distal.com> <YY7KSgGZY9ehdjzu@nuc> <19A3AAF6-149B-4A3C-8C27-4CFF22382014@distal.com> <6DA63618-F0E9-48EC-AB57-3C3C102BC0C0@distal.com> <35c14795-3b1c-9315-8e9b-a8dfad575a04@FreeBSD.org> <YZJzy%2ByI40wXFYjd@nuc>
On 15/11/2021 16:50, Mark Johnston wrote:
> On Mon, Nov 15, 2021 at 04:20:26PM +0200, Andriy Gapon wrote:
>> On 15/11/2021 05:26, Chris Ross wrote:
>>> A procstat -kka output is available (208kb of text, 1441 lines) at
>>> https://pastebin.com/SvDcvRvb
>>
>> 67 100542 pagedaemon dom0 mi_switch+0xc1
>> _cv_wait+0xf2 arc_wait_for_eviction+0x1df arc_lowmem+0xca
>> vm_pageout_worker+0x3c4 vm_pageout+0x1d7 fork_exit+0x8a fork_trampoline+0xe
>>
>> I was always of the opinion that waiting for the ARC reclaim in arc_lowmem was
>> wrong. This shows an example of why that is so.
>>
>>> An ssh of a top command completed and shows:
>>>
>>> last pid: 91551;  load averages: 0.00, 0.02, 0.30    up 2+00:19:33  22:23:15
>>> 40 processes: 1 running, 38 sleeping, 1 zombie
>>> CPU: 3.9% user, 0.0% nice, 0.9% system, 0.0% interrupt, 95.2% idle
>>> Mem: 58G Active, 210M Inact, 1989M Laundry, 52G Wired, 1427M Buf, 12G Free
>>
>> To me it looks like there is still plenty of free memory.
>>
>> I am not sure why vm_wait_domain (called by vm_page_alloc_noobj_domain) is
>> not waking up.
>
> It's a deadlock: the page daemon is sleeping on the arc evict thread,
> and the arc evict thread is waiting for memory:

My point was that waiting for free memory was not strictly needed yet, given
12G free, but that's kind of obvious.

> 2561 100722 zfskern arc_evict
> mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51
> vm_page_alloc_noobj_domain+0x184 uma_small_alloc+0x62 keg_alloc_slab+0xb0
> zone_import+0xee zone_alloc_item+0x6f arc_evict_state+0x81 arc_evict_cb+0x483
> zthr_procedure+0xba fork_exit+0x8a fork_trampoline+0xe
>
> I presume this is from the marker allocations in arc_evict_state().
>
> The second problem is that UMA is refusing to try to allocate from the
> "wrong" NUMA domain, but that policy seems overly strict. Fixing that
> alone would make the problem harder to hit, but I think it wouldn't
> solve it completely.
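The two stack traces above form a textbook circular wait. As a toy model (the
thread names mirror the traces, but the code below is purely illustrative and
uses no kernel APIs), the cycle can be found with a simple walk over a
wait-for graph:

```python
# Hypothetical model of the wait-for relationship described in the traces:
# the page daemon blocks in arc_lowmem() waiting on the ARC evict thread,
# while the evict thread blocks in vm_wait_domain() waiting for free pages
# that only the page daemon can produce.

def find_cycle(wait_for):
    """Return a cycle in the wait-for graph as a list of nodes, or None."""
    for start in wait_for:
        path, node = [], start
        while node is not None and node not in path:
            path.append(node)
            node = wait_for.get(node)   # who is this thread waiting on?
        if node is not None:
            return path[path.index(node):] + [node]
    return None

wait_for = {
    "pagedaemon": "arc_evict",   # arc_lowmem() -> arc_wait_for_eviction()
    "arc_evict": "pagedaemon",   # vm_wait_domain() needs pageout progress
}
print(find_cycle(wait_for))
# -> ['pagedaemon', 'arc_evict', 'pagedaemon']
```

Either thread alone would eventually make progress; together neither can,
which is why the 12G of free memory in the other domain never gets used.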
Yes, I propose to remove the wait for ARC evictions from arc_lowmem().

Another thing that may help a bit is having a greater "slack" between the
threshold at which the page daemon starts paging out and the threshold at
which memory allocations start to wait (via vm_wait_domain).

Also, I think that for a long time we had a problem (I am not sure whether it
is still present) where allocations succeeded without waiting until the free
memory went below a certain threshold M, but once a thread started waiting in
vm_wait it would not be woken up until the free memory went above another
threshold N. And the problem was that N >> M. In other words, a lot of memory
had to be freed (and not grabbed by other threads) before the waiting thread
would be woken up.

>> Perhaps this is some sort of a NUMA-related issue where one memory domain
>> is exhausted while other(s) still have a lot of memory.
>> Or maybe it's something else, but it must be some sort of a bug.
>>
>>> ARC: 48G Total, 10G MFU, 38G MRU, 128K Anon, 106M Header, 23M Other
>>>      46G Compressed, 46G Uncompressed, 1.00:1 Ratio
>>> Swap: 425G Total, 3487M Used, 422G Free

-- 
Andriy Gapon
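P.S. The N >> M hysteresis described above can be sketched with a toy model.
The watermark names and numbers here are made up for illustration; they do not
correspond to actual FreeBSD VM tunables:

```python
# Toy model of the wakeup hysteresis: allocators start to sleep once free
# memory drops below a low watermark M, but sleepers are woken only after
# free memory climbs back above a much higher watermark N. With N >> M,
# the net amount that must be freed before a sleeper runs again is large.

M_LOW = 100      # below this many free pages, allocators sleep (made up)
N_WAKE = 1000    # sleepers are woken only above this (made up, N >> M)

def pages_to_free_before_wakeup(free_pages):
    """Net pages that must be freed before a sleeping allocator is woken."""
    if free_pages >= M_LOW:
        return 0                  # nobody needs to sleep yet
    return N_WAKE - free_pages    # must climb all the way past N, not just M

# A thread sleeping at 50 free pages needs 950 pages freed, far more than
# the 50-page deficit that put it to sleep in the first place.
print(pages_to_free_before_wakeup(50))
```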