Date: Fri, 12 Nov 2021 08:49:17 -0500
From: Chris Ross <cross+freebsd@distal.com>
To: ronald-lists@klop.ws
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: swap_pager: cannot allocate bio
Message-ID: <7B41B7D7-0C74-4F87-A49C-A666DB970CC3@distal.com>
In-Reply-To: <42006135.15.1636709757975@mailrelay>
References: <9FE99EEF-37C5-43D1-AC9D-17F3EDA19606@distal.com> <09989390-FED9-45A6-A866-4605D3766DFE@distal.com> <op.1cpimpsmkndu52@joepie> <4E5511DF-B163-4928-9CC3-22755683999E@distal.com> <42006135.15.1636709757975@mailrelay>

>> root@host:~ # screen
>> load: 0.07 cmd: csh 56116 [vmwait] 35.00r 0.00u 0.01s 0% 3984k
>> mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51 vm_domain_alloc_fail+0x86 vm_page_alloc_domain_after+0x7e uma_small_alloc+0x58 keg_alloc_slab+0xba zone_import+0xee zone_alloc_item+0x6f malloc+0x5d sigacts_alloc+0x1c fork1+0x9fb sys_fork+0x54 amd64_syscall+0x10c fast_syscall_common+0xf8
>>
>> As before, ps and even mount and df work here on console. But, a “zpool status tank” will hang as before. A Ctrl+D on it
>> load: 0.00 cmd: zpool 62829 [aw.aew_cv] 37.89r 0.00u 0.00s 0% 6976k
>> mi_switch+0xc1 _cv_wait+0xf2 arc_wait_for_eviction+0x14a arc_get_data_impl+0xdb arc_hdr_alloc_abd+0xa6 arc_hdr_alloc+0x11e arc_read+0x4f4 dbuf_read+0xc08 dmu_buf_hold+0x46 zap_lookup_norm+0x35 zap_contains+0x26 vdev_rebuild_get_stats+0xac vdev_config_generate+0x3e9 vdev_config_generate+0x74f spa_config_generate+0x2a2 spa_open_common+0x25c spa_get_stats+0x4e zfs_ioc_pool_stats+0x22

> Hi,
>
> Interesting. The details of these stacktraces are unknown to me. But it looks like it is waiting for available memory in both cases. What is the memory usage of the system while all this is happening. Is it swapping a lot?

> And what is the real setup of the disks? Are things like GELI used (not that the stack shows that) or swap-on-zfs?

It’s pretty simple. No GELI, just three 3-disk raidz’s. And swap is a partition on a physical (ish: hardware RAID1) disk, which is also where the OS and everything other than the one large ZFS filesystem are.

> And is there something else interesting in the logs than "swap_pager: cannot allocate bio"? Maybe a reason why it can't allocate the bio.

Not that I saw. A new execution of procstat -kk (started yesterday), as well as a dmesg, both hang now. They seem to be stuck with the same stack-trace as screen is. And the zpool status shows the same stack with Ctrl-T as it has.

Looking at the logs now: since I rebooted the system 24 hours ago, there are no kernel logs after the failure that began yesterday afternoon. Apparently, this is a reproducible problem: it takes a day or less to get stuck. So, that’s valuable in a way. ;-)

> I would not know a pointer on how to debug this except for checking tools like iostat, vmstat, etc. Of course running 13-STABLE can give an interesting data point.

So, tl;dr: no data from the most recent hang other than what the stack-traces show. Not even the “cannot allocate bio” I saw two days ago after increasing swap size. I can take a look at 13-STABLE; when I give up on this and reboot (likely today) I’ll try building that.

 - Chris
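
Since several of the hung commands seem to share a stack, one way to compare the Ctrl-T output quoted above is to strip the offsets from each frame and look at the first memory-related function. The following is only a minimal Python sketch: `parse_kstack` and `first_frame_matching` are hypothetical helpers written for this illustration (they are not part of any FreeBSD tool), and the `vm_` / `arc_` prefix filter is an assumed heuristic for "VM page allocator" versus "ZFS ARC".

```python
import re

# One frame looks like "vm_wait_doms+0xe2": symbol name, '+', hex offset.
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+")


def parse_kstack(line):
    """Return the function names found in a kernel stack line, in order."""
    return FRAME_RE.findall(line)


def first_frame_matching(frames, prefixes):
    """Return the first frame starting with any of the prefixes, else None."""
    for name in frames:
        if name.startswith(tuple(prefixes)):
            return name
    return None


# Stack lines copied from the Ctrl-T output in this thread.
screen_stack = (
    "mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51 "
    "vm_domain_alloc_fail+0x86 vm_page_alloc_domain_after+0x7e "
    "uma_small_alloc+0x58 keg_alloc_slab+0xba zone_import+0xee "
    "zone_alloc_item+0x6f malloc+0x5d sigacts_alloc+0x1c fork1+0x9fb "
    "sys_fork+0x54 amd64_syscall+0x10c fast_syscall_common+0xf8"
)
zpool_stack = (
    "mi_switch+0xc1 _cv_wait+0xf2 arc_wait_for_eviction+0x14a "
    "arc_get_data_impl+0xdb arc_hdr_alloc_abd+0xa6 arc_hdr_alloc+0x11e "
    "arc_read+0x4f4 dbuf_read+0xc08 dmu_buf_hold+0x46 zap_lookup_norm+0x35 "
    "zap_contains+0x26 vdev_rebuild_get_stats+0xac vdev_config_generate+0x3e9 "
    "vdev_config_generate+0x74f spa_config_generate+0x2a2 "
    "spa_open_common+0x25c spa_get_stats+0x4e zfs_ioc_pool_stats+0x22"
)

screen_frames = parse_kstack(screen_stack)
zpool_frames = parse_kstack(zpool_stack)

# Assumed heuristic: "vm_" marks the page allocator, "arc_" marks the ZFS ARC.
print(first_frame_matching(screen_frames, ("vm_", "arc_")))  # vm_wait_doms
print(first_frame_matching(zpool_frames, ("vm_", "arc_")))   # arc_wait_for_eviction
```

Run against the two traces above, this would suggest the fork from screen is waiting in the VM page allocator while zpool is waiting on ARC eviction, which is at least consistent with the "waiting for available memory in both cases" reading.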