Date:      Fri, 12 Nov 2021 08:49:17 -0500
From:      Chris Ross <cross+freebsd@distal.com>
To:        ronald-lists@klop.ws
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: swap_pager: cannot allocate bio
Message-ID:  <7B41B7D7-0C74-4F87-A49C-A666DB970CC3@distal.com>
In-Reply-To: <42006135.15.1636709757975@mailrelay>
References:  <9FE99EEF-37C5-43D1-AC9D-17F3EDA19606@distal.com> <09989390-FED9-45A6-A866-4605D3766DFE@distal.com> <op.1cpimpsmkndu52@joepie> <4E5511DF-B163-4928-9CC3-22755683999E@distal.com> <42006135.15.1636709757975@mailrelay>

>> root@host:~ # screen
>> load: 0.07 cmd: csh 56116 [vmwait] 35.00r 0.00u 0.01s 0% 3984k
>> mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51
>> vm_domain_alloc_fail+0x86 vm_page_alloc_domain_after+0x7e
>> uma_small_alloc+0x58 keg_alloc_slab+0xba zone_import+0xee
>> zone_alloc_item+0x6f malloc+0x5d sigacts_alloc+0x1c fork1+0x9fb
>> sys_fork+0x54 amd64_syscall+0x10c fast_syscall_common+0xf8
>>
>> As before, ps and even mount and df work here on console.  But, a
>> “zpool status tank” will hang as before.  A Ctrl+D on it

>> load: 0.00 cmd: zpool 62829 [aw.aew_cv] 37.89r 0.00u 0.00s 0% 6976k
>> mi_switch+0xc1 _cv_wait+0xf2 arc_wait_for_eviction+0x14a
>> arc_get_data_impl+0xdb arc_hdr_alloc_abd+0xa6 arc_hdr_alloc+0x11e
>> arc_read+0x4f4 dbuf_read+0xc08 dmu_buf_hold+0x46 zap_lookup_norm+0x35
>> zap_contains+0x26 vdev_rebuild_get_stats+0xac
>> vdev_config_generate+0x3e9 vdev_config_generate+0x74f
>> spa_config_generate+0x2a2 spa_open_common+0x25c spa_get_stats+0x4e
>> zfs_ioc_pool_stats+0x22

> Hi,
>
> Interesting. The details of these stacktraces are unknown to me. But
> it looks like it is waiting for available memory in both cases. What
> is the memory usage of the system while all this is happening? Is it
> swapping a lot?
> And what is the real setup of the disks? Are things like GELI used
> (not that the stack shows that) or swap-on-zfs?

It’s pretty simple.  No GELI, just three 3-disk raidz’s.  And swap is a
partition on a physical (ish: hardware RAID1) disk, which is also where
the OS and everything other than the one large ZFS filesystem are.

> And is there something else interesting in the logs than "swap_pager:
> cannot allocate bio"? Maybe a reason why it can't allocate the bio.

Not that I saw.  A new execution of procstat -kk (started yesterday), as
well as a dmesg, both hang now.  They seem to be stuck with the same
stack-trace as screen is.  And the zpool status shows the same stack
with Ctrl-T as it did before.  Looking at the logs now: since I rebooted
the system 24 hours ago, there are no kernel logs after the failure that
began yesterday afternoon.  Apparently, this is a reproducible problem;
it takes a day or less to get stuck.  So, that’s valuable in a
way.  ;-)

> I would not know a pointer on how to debug this except for checking
> tools like iostat, vmstat, etc. Of course running 13-STABLE can give an
> interesting data point.

So, tl;dr: no data from the most recent hang other than what the
stack-traces show.  Not even the “cannot allocate bio” I saw two days
ago after increasing swap size.  I can take a look at 13-STABLE; when I
give up on this and reboot (likely today), I’ll try building that.
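
For comparing the Ctrl-T snapshots quoted above from one hang to the
next, something like the following might help.  It is only a minimal
sketch in Python: the helper names are hypothetical, and the line format
is inferred from the two snapshots in this message rather than from any
FreeBSD specification.

```python
import re

# Format assumed from the two snapshots quoted in this message:
#   load: <load> cmd: <cmd> <pid> [<wchan>] <real>r <user>u <sys>s <cpu>% <rss>k
STATUS_RE = re.compile(
    r"load: (?P<load>[\d.]+) cmd: (?P<cmd>\S+) (?P<pid>\d+) "
    r"\[(?P<wchan>[^\]]+)\] (?P<real>[\d.]+)r (?P<user>[\d.]+)u "
    r"(?P<sys>[\d.]+)s (?P<cpu>\d+)% (?P<rss>\d+)k"
)

# One kernel stack frame, e.g. "vm_wait_doms+0xe2".
FRAME_RE = re.compile(r"(?P<func>[A-Za-z_]\w*)\+0x(?P<off>[0-9a-f]+)")


def parse_status(line):
    """Return selected fields of a Ctrl-T status line, or None if no match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return {
        "load": float(m.group("load")),
        "cmd": m.group("cmd"),
        "pid": int(m.group("pid")),
        "wchan": m.group("wchan"),      # what the process is sleeping on
        "real_s": float(m.group("real")),
        "rss_kb": int(m.group("rss")),
    }


def parse_frames(text):
    """Return the function names of a 'func+0xoff ...' stack, in the order printed."""
    return [m.group("func") for m in FRAME_RE.finditer(text)]
```

Feeding it the csh line from the first snapshot gives a wait channel of
"vmwait", and the frame list makes it easy to diff two stacks by
function name while ignoring everything else on the line.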

         - Chris




