FreeBSD Mail Archives

Date:      Fri, 12 Nov 2021 09:15:31 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Chris Ross <cross+freebsd@distal.com>
Cc:        Ronald Klop <ronald-lists@klop.ws>, freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: swap_pager: cannot allocate bio
Message-ID:  <CANCZdfpW3YJ7c_EO82BYwLCFhDXdCp2W_fxmxAXzYvr7HNmnZQ@mail.gmail.com>
In-Reply-To: <7B41B7D7-0C74-4F87-A49C-A666DB970CC3@distal.com>
References:  <9FE99EEF-37C5-43D1-AC9D-17F3EDA19606@distal.com> <09989390-FED9-45A6-A866-4605D3766DFE@distal.com> <op.1cpimpsmkndu52@joepie> <4E5511DF-B163-4928-9CC3-22755683999E@distal.com> <42006135.15.1636709757975@mailrelay> <7B41B7D7-0C74-4F87-A49C-A666DB970CC3@distal.com>

On Fri, Nov 12, 2021, 6:49 AM Chris Ross <cross+freebsd@distal.com> wrote:

>
> >> root@host:~ # screen
> >> load: 0.07 cmd: csh 56116 [vmwait] 35.00r 0.00u 0.01s 0% 3984k
> >> mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51
> vm_domain_alloc_fail+0x86 vm_page_alloc_domain_after+0x7e
> uma_small_alloc+0x58 keg_alloc_slab+0xba zone_import+0xee
> zone_alloc_item+0x6f malloc+0x5d sigacts_alloc+0x1c fork1+0x9fb
> sys_fork+0x54 amd64_syscall+0x10c fast_syscall_common+0xf8  As before, ps
> and even mount and df work here on console.  But, a “zpool status tank”
> will hang as before.  A Ctrl+D on it
>
> >> load: 0.00 cmd: zpool 62829 [aw.aew_cv] 37.89r 0.00u 0.00s 0% 6976k
> >> mi_switch+0xc1 _cv_wait+0xf2 arc_wait_for_eviction+0x14a
> arc_get_data_impl+0xdb arc_hdr_alloc_abd+0xa6 arc_hdr_alloc+0x11e
> arc_read+0x4f4 dbuf_read+0xc08 dmu_buf_hold+0x46 zap_lookup_norm+0x35
> zap_contains+0x26 vdev_rebuild_get_stats+0xac vdev_config_generate+0x3e9
> vdev_config_generate+0x74f spa_config_generate+0x2a2 spa_open_common+0x25c
> spa_get_stats+0x4e zfs_ioc_pool_stats+0x22
>
> > Hi,
> >
> > Interesting. The details of these stacktraces are unknown to me. But it
> looks like it is waiting for available memory in both cases. What is the
> memory usage of the system while all this is happening. Is it swapping a
> lot?
> > And what is the real setup of the disks? Are things like GELI used (not
> that the stack shows that) or swap-on-zfs?
>
> It’s pretty simple.  No GELI, just three 3-disk raidz’s.  And swap is a
> partition on a physical (ish: hardware RAID1) disk, which is also where the
> OS and everything other than the one large ZFS filesystem are.
>
> > And is there something else interesting in the logs than "swap_pager:
> cannot allocate bio"? Maybe a reason why it can't allocate the bio.
>
> Not that I saw.  A new execution of procstat -kk (started yesterday), as
> well as a dmesg, both hang now.  They seem to be stuck with the same
> stack-trace as screen is.  And the zpool status shows the same stack with
> Ctrl-T as it has.  Looking at the logs now, Since I rebooted the system 24
> hours ago, there are no kernel logs after the failure that began yesterday
> afternoon.  Apparently, this is a reproducible problem, it takes a day or
> less to get stuck.  So, that’s valuable in a way.  ;-)
>
> > I would not know a pointer on how to debug this except for checking
> tools like iostat, vmstat, etc.. Of course running 13-STABLE can give an
> interesting data point.
>
> So, tl;dr; no data from the most recent hang other than what the
> stack-traces show.  Not even the “cannot allocate bio” I saw two days ago
> after  increasing swap size.  I can take a look at 13-STABLE, when I give
> up on this and reboot (likely today) I’ll try building that.
>

So the root cause of this problem is well known. You have a memory
shortage, so you want to page out dirty pages to reclaim memory.
However, there's not enough memory to allocate the structures you need to
do I/O, and so the swapout I/O fails halfway down the stack when it cannot
allocate a bio. Some paths through the swapper cope with this well; other
paths that execute less often cope less well.
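
To make the circular dependency concrete, here is a minimal user-space
sketch in Python. It is a toy model, not kernel code: ToyMemory and
swap_out_one_page are hypothetical names, one "unit" stands in for any
allocation, and a failed allocation returns False where the real kernel
might sleep.

```python
class ToyMemory:
    """Toy model (hypothetical): a count of free allocation units."""

    def __init__(self, free_units):
        self.free_units = free_units

    def alloc(self):
        """Take one unit; return False when nothing is left."""
        if self.free_units == 0:
            return False
        self.free_units -= 1
        return True

    def free(self, units=1):
        """Give units back to the pool."""
        self.free_units += units


def swap_out_one_page(mem):
    """Try to write one dirty page to swap.

    The write needs a request structure (standing in for a bio) before it
    can start. On success the request and the cleaned page are both
    released, a net gain of one unit.
    """
    if not mem.alloc():   # no memory for the I/O request itself
        return False      # so the page that would free memory stays dirty
    mem.free()            # I/O finished: release the request
    mem.free()            # ...and the page it cleaned
    return True
```

With one unit free, swap_out_one_page() succeeds and leaves two units
free. With zero units free it fails and frees nothing, which is the
stuck state: the only way to get memory back needs memory first.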

There are some hacks in the tree today to help with the GELI case: we
prioritize swapping I/O. But there's no g_alloc_bio_swapping() interface
for swapping I/O to get priority on allocating a bio to start with. Places
that use g_clone_bio() could have the clone allocated from a special swap
pool, but that starts to get messy and isn't done today. And the upper
layers like geom_cfs and ZFS are inconsistent in allocations, so there's
work needed to make it robust in ZFS, but I have only a vague notion of
what's needed. At the very least, the swapping I/O that comes into the top
of ZFS won't be marked as swapping I/O coming out the bottom, because the
BIO_SWAP flag is quite new.
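
A rough sketch of the reserved-pool idea, again as a toy Python model
rather than anything in the tree. RequestPool, Request, the SWAP bit and
the keep_flags parameter are all hypothetical; nothing here is the real
GEOM API. It only illustrates why a swap reserve helps the top-level
allocation, and why it stops helping once a layer drops the flag when
cloning.

```python
from dataclasses import dataclass

SWAP = 0x1  # hypothetical flag bit, standing in for the BIO_SWAP idea


@dataclass
class Request:
    """Stand-in for an I/O request; only the flags matter here."""
    flags: int = 0


class RequestPool:
    """Toy allocator with a general pool and a swap-only reserve."""

    def __init__(self, general, reserve):
        self.general = general  # units any caller may use
        self.reserve = reserve  # units held back for swap I/O

    def alloc(self, flags=0):
        """Allocate from the general pool, or from the reserve if SWAP is set."""
        if self.general > 0:
            self.general -= 1
            return Request(flags)
        if flags & SWAP and self.reserve > 0:
            self.reserve -= 1
            return Request(flags)
        return None  # allocation failed

    def clone(self, parent, keep_flags=True):
        """Allocate a child request; a layer may or may not pass flags down."""
        flags = parent.flags if keep_flags else 0
        return self.alloc(flags)
```

With general=0 and reserve=2, alloc() fails but alloc(SWAP) succeeds.
Cloning that request with keep_flags=False fails, because the child no
longer looks like swap I/O, while cloning with the flags kept draws on
the reserve. That is the gap described above: the reserve is only useful
if every layer carries the marking through.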

So until then, swapping on zvols is fraught with deadlocks like this, and in
the past there's been a strong admonishment against it.

Warner

         - Chris


Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfpW3YJ7c_EO82BYwLCFhDXdCp2W_fxmxAXzYvr7HNmnZQ>
