Date: Sun, 14 Nov 2021 22:26:13 -0500
From: Chris Ross <cross+freebsd@distal.com>
To: Mark Johnston <markj@freebsd.org>
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: swap_pager: cannot allocate bio
Message-ID: <6DA63618-F0E9-48EC-AB57-3C3C102BC0C0@distal.com>
In-Reply-To: <19A3AAF6-149B-4A3C-8C27-4CFF22382014@distal.com>
References: <9FE99EEF-37C5-43D1-AC9D-17F3EDA19606@distal.com> <09989390-FED9-45A6-A866-4605D3766DFE@distal.com> <op.1cpimpsmkndu52@joepie> <4E5511DF-B163-4928-9CC3-22755683999E@distal.com> <YY7KSgGZY9ehdjzu@nuc> <19A3AAF6-149B-4A3C-8C27-4CFF22382014@distal.com>
> On Nov 12, 2021, at 23:15, Chris Ross <cross+freebsd@distal.com> wrote:
>
> I've built a stable/13 as of today, and updated the system. I'll see
> if the problem recurs; it usually takes about 24 hours to show. If
> it does, I'll see if I can run a procstat -kka and get it off of the system.

Happy Sunday, all. So, I logged in this evening, 48 hours after starting
the job that puts a lot of CPU and I/O load on the ZFS pool. The system
seemed to be working, and I thought stable/13 had fixed it. But after only
a few minutes of poking around, it started to show problems. Ssh
connections hung, and new ones couldn't be made. Then they could, but one
shell got stuck in disk wait while others worked. Very odd. I logged into
the console and ran a procstat -kka. Then I tried to ls -f a directory in
the large ZFS fs (/tank), which hung. Ctrl-T on that shows:

load: 0.04  cmd: ls 87050 [aw.aew_cv] 41.13r 0.00u 0.00s 0% 2632k
mi_switch+0xc1 _cv_wait+0xf2 arc_wait_for_eviction+0x1df
arc_get_data_impl+0x85 arc_hdr_alloc_abd+0x7b arc_read+0x6f7
dbuf_read+0xc5b dmu_buf_hold+0x46 zap_cursor_retrieve+0x163
zfs_freebsd_readdir+0x393 VOP_READDIR_APV+0x1f kern_getdirentries+0x1d9
sys_getdirentries+0x29 amd64_syscall+0x10c fast_syscall_common+0xf8

The procstat -kka output is available (208kb of text, 1441 lines) at:

https://pastebin.com/SvDcvRvb

A top command run over ssh completed and shows:

last pid: 91551;  load averages: 0.00, 0.02, 0.30   up 2+00:19:33  22:23:15
40 processes: 1 running, 38 sleeping, 1 zombie
CPU:  3.9% user,  0.0% nice,  0.9% system,  0.0% interrupt, 95.2% idle
Mem: 58G Active, 210M Inact, 1989M Laundry, 52G Wired, 1427M Buf, 12G Free
ARC: 48G Total, 10G MFU, 38G MRU, 128K Anon, 106M Header, 23M Other
     46G Compressed, 46G Uncompressed, 1.00:1 Ratio
Swap: 425G Total, 3487M Used, 422G Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
90996 root         1  22    0    21M  9368K select  22   0:00   0.10% sshd
89398 cross       23  52    0    97G    60G uwait    4  94.1H   0.00% python3.
55463 cross       18  20    0   301M    54M kqread  31   4:30   0.00% python3.
54338 cross        4  20    0    82M  9632K kqread  33   1:02   0.00% python3.
84083 ntpd         1  20    0    21M  1712K select  33   0:07   0.00% ntpd

I'd love to hear any thoughts. Again, this is running a 13-stable,
stable/13-n248044-4a36455c417. Thanks all.

                      - Chris
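[Editor's note: for anyone chasing a similar hang, the diagnostics quoted above can be collected in one pass with something like the sketch below. This is not from the thread; the output directory and file names are arbitrary choices, and the ARC interpretation in the comments is a hypothesis suggested by the arc_wait_for_eviction frame in the Ctrl-T trace.]

```shell
#!/bin/sh
# Hedged sketch: one-shot collection of the diagnostics shown above on a
# FreeBSD 13 machine. On other systems it only creates the output directory.
OUT=/var/tmp/hang-diag.$(date +%Y%m%d%H%M%S)
mkdir -p "$OUT"

if [ "$(uname -s)" = "FreeBSD" ]; then
    # Kernel stacks of every thread, the same data as the pastebin above.
    procstat -kk -a > "$OUT/procstat-kka.txt"

    # One batch display of top, which includes the ARC summary lines.
    top -b -d 1 > "$OUT/top.txt"

    # Raw ARC counters; arc_wait_for_eviction in the Ctrl-T trace hints
    # that the ARC is at its limit and eviction is not keeping up.
    sysctl kstat.zfs.misc.arcstats > "$OUT/arcstats.txt"

    # Swap usage, relevant to the original "swap_pager: cannot allocate bio".
    swapinfo -h > "$OUT/swapinfo.txt"
fi

echo "diagnostics written to $OUT"
```

On FreeBSD, the same one-line status that Ctrl-T prints can also be triggered for a stuck process from another terminal with kill -INFO <pid> (SIGINFO).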