Date: Sun, 14 Nov 2021 22:26:13 -0500
From: Chris Ross <cross+freebsd@distal.com>
To: Mark Johnston <markj@freebsd.org>
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: swap_pager: cannot allocate bio
Message-ID: <6DA63618-F0E9-48EC-AB57-3C3C102BC0C0@distal.com>
In-Reply-To: <19A3AAF6-149B-4A3C-8C27-4CFF22382014@distal.com>
References: <9FE99EEF-37C5-43D1-AC9D-17F3EDA19606@distal.com> <09989390-FED9-45A6-A866-4605D3766DFE@distal.com> <op.1cpimpsmkndu52@joepie> <4E5511DF-B163-4928-9CC3-22755683999E@distal.com> <YY7KSgGZY9ehdjzu@nuc> <19A3AAF6-149B-4A3C-8C27-4CFF22382014@distal.com>
> On Nov 12, 2021, at 23:15, Chris Ross <cross+freebsd@distal.com> wrote:
>
> I've built a stable/13 as of today, and updated the system. I'll see
> if the problem recurs; it usually takes about 24 hours to show. If
> it does, I'll see if I can run a procstat -kka and get it off of the system.

Happy Sunday, all. So, I logged in this evening, 48 hours after starting
the job that puts a lot of CPU and I/O load on the ZFS pool. The system
seemed to be working, and I thought stable/13 had fixed it. But after only
a few minutes of poking around, it started to show problems. Ssh
connections hung, and new ones couldn't be made. Then they could, but one
shell got stuck in disk wait while others worked. Very odd. I logged into
the console and ran a procstat -kka. Then I tried to ls -f a directory in
the large ZFS fs (/tank), which hung. Ctrl-T on that shows:

load: 0.04  cmd: ls 87050 [aw.aew_cv] 41.13r 0.00u 0.00s 0% 2632k
mi_switch+0xc1 _cv_wait+0xf2 arc_wait_for_eviction+0x1df
arc_get_data_impl+0x85 arc_hdr_alloc_abd+0x7b arc_read+0x6f7
dbuf_read+0xc5b dmu_buf_hold+0x46 zap_cursor_retrieve+0x163
zfs_freebsd_readdir+0x393 VOP_READDIR_APV+0x1f kern_getdirentries+0x1d9
sys_getdirentries+0x29 amd64_syscall+0x10c fast_syscall_common+0xf8

The procstat -kka output is available (208kb of text, 1441 lines) at:

https://pastebin.com/SvDcvRvb

A top command run over ssh completed and shows:

last pid: 91551;  load averages: 0.00, 0.02, 0.30   up 2+00:19:33  22:23:15
40 processes: 1 running, 38 sleeping, 1 zombie
CPU:  3.9% user,  0.0% nice,  0.9% system,  0.0% interrupt, 95.2% idle
Mem: 58G Active, 210M Inact, 1989M Laundry, 52G Wired, 1427M Buf, 12G Free
ARC: 48G Total, 10G MFU, 38G MRU, 128K Anon, 106M Header, 23M Other
     46G Compressed, 46G Uncompressed, 1.00:1 Ratio
Swap: 425G Total, 3487M Used, 422G Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
90996 root         1  22    0    21M  9368K select  22   0:00   0.10% sshd
89398 cross       23  52    0    97G    60G uwait    4  94.1H   0.00% python3.
55463 cross       18  20    0   301M    54M kqread  31   4:30   0.00% python3.
54338 cross        4  20    0    82M  9632K kqread  33   1:02   0.00% python3.
84083 ntpd         1  20    0    21M  1712K select  33   0:07   0.00% ntpd

I'd love to hear any thoughts. Again, this is running a 13-stable,
stable/13-n248044-4a36455c417. Thanks all.

                      - Chris
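[Editor's note: for anyone chasing a similar hang, the diagnostics quoted above can be collected in one pass with something like the sketch below. This is not from the thread; the output directory and file names are arbitrary choices, and the ARC interpretation in the comments is a hypothesis suggested by the arc_wait_for_eviction frame in the Ctrl-T trace.]

```shell
#!/bin/sh
# Hedged sketch: one-shot collection of the diagnostics shown above on a
# FreeBSD 13 machine. On other systems it only creates the output directory.
OUT=/var/tmp/hang-diag.$(date +%Y%m%d%H%M%S)
mkdir -p "$OUT"

if [ "$(uname -s)" = "FreeBSD" ]; then
    # Kernel stacks of every thread, the same data as the pastebin above.
    procstat -kk -a > "$OUT/procstat-kka.txt"

    # One batch display of top, which includes the ARC summary lines.
    top -b -d 1 > "$OUT/top.txt"

    # Raw ARC counters; arc_wait_for_eviction in the Ctrl-T trace hints
    # that the ARC is at its limit and eviction is not keeping up.
    sysctl kstat.zfs.misc.arcstats > "$OUT/arcstats.txt"

    # Swap usage, relevant to the original "swap_pager: cannot allocate bio".
    swapinfo -h > "$OUT/swapinfo.txt"
fi

echo "diagnostics written to $OUT"
```

On FreeBSD, the same one-line status that Ctrl-T prints can also be triggered for a stuck process from another terminal with kill -INFO <pid> (SIGINFO).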