Date:      Thu, 10 Mar 2022 16:40:54 +0100
From:      Roger Pau Monné <roger.pau@citrix.com>
To:        Ze Dupsys <zedupsys@gmail.com>
Cc:        <freebsd-xen@freebsd.org>, <buhrow@nfbcal.org>
Subject:   Re: ZFS + FreeBSD XEN dom0 panic
Message-ID:  <YiocBR0LMkgC7bai@Air-de-Roger>
In-Reply-To: <061700f9-86aa-b2ae-07a4-27d4c3d96966@gmail.com>
References:  <CAOEWpzc2WVViMJHrrtuU-G_7yck4eehm6b=JQPSZU1MH-bzmiw@mail.gmail.com> <202203011540.221FeR4f028103@nfbcal.org> <CAOEWpzdC41ithfd7R_qa66+sh_UXeku7OcVC_b+XUaLr_9SSTA@mail.gmail.com> <Yh93uLIBqk5NC2xf@Air-de-Roger> <CAOEWpzfsajhbvXfAw5-F1p83jjmSggobANBEyeYFAfiumAWRCA@mail.gmail.com> <YiCa70+HQScsoaKX@Air-de-Roger> <3d4691a7-c4b3-1c91-9eaa-7af071561bb6@gmail.com> <YihojHNbzJagm4SI@Air-de-Roger> <061700f9-86aa-b2ae-07a4-27d4c3d96966@gmail.com>

On Thu, Mar 10, 2022 at 11:34:26AM +0200, Ze Dupsys wrote:
> On 2022.03.09. 10:42, Roger Pau Monné wrote:
> > On Sun, Mar 06, 2022 at 02:41:17PM +0200, Ze Dupsys wrote:
> > > If i execute steps 1 and 4, the machine does not seem to crash; tested 8
> > > hours with an 8GB RAM Dom0 and 4 hours with a 2GB RAM Dom0. So creating a
> > > lot of ZVOLs and monitoring them does not seem to be the cause.
> > > 
> > > Yesterday was a happy and a sad day at the same time. The happy thing is
> > > that i think i have understood one panic reason and have somewhat better
> > > located where one problem might be. At the same time, i think i have
> > > located 2 bugs that cause one specific panic, while 2 other panic reasons
> > > are due to different circumstances.
> > > 
> > > What follows is pure speculation about two somewhat related things; i do
> > > not know XEN or kernel internals, so take this with a grain of salt.
> > > 
> > > Knowing that this seemed to be a RAM problem, and thinking about how the
> > > ZFS sysctl tuning values differ, i started to look at "sysctl -a" diffs
> > > for a Dom0 with 2GB RAM and with 8GB RAM. By pure luck, comparing a Dom0
> > > that had been running for a long time against a recently restarted one,
> > > i noticed diff lines like these:
> > > ..
> > > +dev.xbbd.202.%driver: xbbd
> > > +dev.xbbd.202.%location:
> > > +dev.xbbd.202.%parent: xenbusb_back0
> > > +dev.xbbd.202.%pnpinfo:
> > > +dev.xbbd.202.xenbus_connection_state: Closed
> > > +dev.xbbd.202.xenbus_dev_type: vbd
> > > +dev.xbbd.202.xenbus_peer_domid: 309
> > > +dev.xbbd.202.xenstore_path: backend/vbd/309/51728
> > > +dev.xbbd.202.xenstore_peer_path: /local/domain/309/device/vbd/51728
> > > +dev.xbbd.203.%desc: Backend Virtual Block Device
> > > +dev.xbbd.203.%driver: xbbd
> > > +dev.xbbd.203.%location:
> > > +dev.xbbd.203.%parent: xenbusb_back0
> > > +dev.xbbd.203.%pnpinfo:
> > > +dev.xbbd.203.xenbus_connection_state: Closed
> > > ..
> > > although at the time there were only 2 DomUs running at most. This seems
> > > to be the reason why this felt like a memory leak: after a long time of
> > > starting/stopping VMs, sysctl most probably hits some sort of RAM limit
> > > (which most probably is auto-calculated from total RAM somehow) and panics.
> > Those dev.xbbd.XXX sysctls reference a specific blkback instance, and
> > should be gone once the guest is shut down. Do you mean that you have a
> > range of dev.xbbd.[0-203] entries in your sysctl?
> Yes, i have that range of entries, even after all DomUs are destroyed.
> Snippet from the logs - started yesterday, crashed today:
> cat 2022_03_10__1646887233_sysctl__001.log | grep xbbd | sed -Ee
> 's/(xbbd\.[0-9]*)\..*/\1/g' | sort | uniq
> dev.xbbd.%parent:
> dev.xbbd.0
> dev.xbbd.1
> dev.xbbd.10
> dev.xbbd.100
> dev.xbbd.101
> dev.xbbd.102
> dev.xbbd.103
> dev.xbbd.104
> dev.xbbd.105
> dev.xbbd.106
> dev.xbbd.107
> dev.xbbd.108
> dev.xbbd.109
> dev.xbbd.11
> ..
> dev.xbbd.52
> dev.xbbd.520
> dev.xbbd.521
> dev.xbbd.522
> dev.xbbd.523
> dev.xbbd.524
> ..
> dev.xbbd.99
> irq804: xbbd520:133 @cpu0(domain0): 436527
> irq805: xbbd521:135 @cpu0(domain0): 4777
> irq806: xbbd522:137 @cpu0(domain0): 16589254
> irq807: xbbd523:139 @cpu0(domain0): 6412
> irq808: xbbd524:141 @cpu0(domain0): 103083
> 
> And those entries keep growing.

That's not expected. Can you paste the output of `xenstore-ls -fp`
when you get those stale entries?

Also, what does `xl list` show?

Also, can you check the log files at '/var/log/xen/xl-*.log' (where * is
the domain name) to try to gather why the backend is not properly
destroyed?
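To gather all of that in one go, a rough helper along these lines might do; the `collect_xen_diag` name and the output layout are made up for illustration, and each tool is skipped if it is not installed:

```shell
#!/bin/sh
# Sketch of a one-shot diagnostics collector, not an existing tool.
collect_xen_diag() {
    out=${1:-/tmp/xen-diag.$$}
    mkdir -p "$out" || return 1
    # Full xenstore dump with permissions, to spot stale backend/vbd nodes.
    command -v xenstore-ls >/dev/null 2>&1 && xenstore-ls -fp > "$out/xenstore-ls.txt"
    # What the toolstack believes is running.
    command -v xl >/dev/null 2>&1 && xl list > "$out/xl-list.txt"
    # How many blkback instances the kernel still exposes.
    sysctl -a 2>/dev/null | grep -c '^dev\.xbbd\.[0-9]*\.%driver' > "$out/xbbd-count.txt"
    # Per-domain toolstack logs, where teardown failures should show up.
    cp /var/log/xen/xl-*.log "$out"/ 2>/dev/null
    echo "$out"
}
```

Running `collect_xen_diag` right after you see stale entries and tarring up the directory would give us a consistent snapshot.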

There's clearly something that prevents blkback from shutting down and
releasing its resources.

> Yesterday, i tried to follow Brian's email about a working setup with
> FreeBSD 12.1. With the given scripts an 8GB RAM Dom0 still crashed; it took
> longer, but in the morning i saw that the machine had rebooted, with a
> panic message in the serial logs. It seems that "kern.panic_reboot_wait_time
> = -1" did not disable rebooting either. In the monitoring logs i see that
> approx. 3 minutes after the crash, the testing machine was up and running
> again (rebooted).
> 
> 
> > Note that in dev.xbbd.XXX XXX matches the domain ID of the guest
> > that's using the backend.
> On the setups i have tested this does not seem to be the case. If each VM
> has more than 1 HDD, i have as many instances of xbbd sysctl variables as
> the total count of HDDs attached to VMs. But at some point, after rebooting
> machines or something, this number starts to grow on each create. To me it
> seems that the dom ID is inside the sysctl, as in
> "dev.xbbd.5.xenbus_peer_domid", and if i grep those values, they show
> domids for nonexistent VMs.

Right, my bad. Those numbers are assigned by FreeBSD to identify
multiple instances of the same driver.
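Since each dev.xbbd.N.xenbus_peer_domid sysctl records the peer domain, you could cross-check the unit numbers against `xl list` to spot stale instances. A minimal sketch (the `stale_xbbd` helper name is made up):

```shell
#!/bin/sh
# Sketch: read `sysctl -a` output on stdin and print xbbd units whose
# peer domid is not in the space-separated list of live domids in $1.
stale_xbbd() {
    awk -F'[.: ]+' -v live="$1" '
        BEGIN { n = split(live, a, " "); for (i = 1; i <= n; i++) alive[a[i]] = 1 }
        /^dev\.xbbd\.[0-9]+\.xenbus_peer_domid:/ {
            if (!($5 in alive)) printf "xbbd%s -> domid %s (stale)\n", $3, $5
        }'
}
# On a live dom0 (the second column of `xl list` is the domid):
#   sysctl -a | stale_xbbd "$(xl list | awk 'NR > 1 {print $2}' | tr '\n' ' ')"
```

Anything it prints is a blkback instance whose guest is already gone.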

> I do not know how sysctls are created, but is there some test suite that
> creates an awful lot of sysctl variables and tries to crash the system that
> way? I suppose there is no way to limit the sysctl variable count, or any
> protection mechanism for such cases. Then i could see if the panic message
> is the same. Or a test that allocates a sysctl variable with an insanely
> large value, or keeps appending to it.

It's likely the sysctls plus, for example, the associated blkback bounce
buffers that are not freed. The problem is that somehow blkback
instances are not properly destroyed.

> > > Then i caught another sysctl variable that is growing due to XEN,
> > > "kern.msgbuf: Contents of kernel message buffer". I do not know how this
> > > variable grows or which component manages it, but in the VM start/stop
> > > case it grows and contains lines with a pattern like this:
> > > ..
> > > xnb(xnb_rxpkt2rsp:2059): Got error -1 for hypervisor gnttab_copy status
> > > xnb(xnb_ring2pkt:1526): Unknown extra info type 255.  Discarding packet
> > > xnb(xnb_dump_txreq:299): netif_tx_request index =0
> > > xnb(xnb_dump_txreq:300): netif_tx_request.gref  =0
> > > xnb(xnb_dump_txreq:301): netif_tx_request.offset=0
> > > xnb(xnb_dump_txreq:302): netif_tx_request.flags =8
> > > xnb(xnb_dump_txreq:303): netif_tx_request.id    =69
> > > xnb(xnb_dump_txreq:304): netif_tx_request.size  =1000
> > > xnb(xnb_dump_txreq:299): netif_tx_request index =1
> > > xnb(xnb_dump_txreq:300): netif_tx_request.gref  =255
> > > xnb(xnb_dump_txreq:301): netif_tx_request.offset=0
> > > xnb(xnb_dump_txreq:302): netif_tx_request.flags =0
> > > xnb(xnb_dump_txreq:303): netif_tx_request.id    =0
> > > xnb(xnb_dump_txreq:304): netif_tx_request.size  =0
> > > ..
> > > 
> > > The lines in that variable just keep growing and growing; they are never
> > > flushed, trimmed or anything. Each time i get the same message on serial
> > > output, it has one more error section appended to the "same-previous"
> > > serial output message, and to the sysctl variable as well. Thus at some
> > > point the serial output and the sysctl contain a large block of those
> > > errors while a VM is starting. So at some point the value of this sysctl
> > > could reach the max allowed/available and make the system panic. I do
> > > not know the reason for those errors, but if there was a patch to
> > > suppress them, this could be "solved". Another diff chunk might be
> > > related to this:
> > > +dev.xnb.1.xenstore_peer_path: /local/domain/7/device/vif/0
> > > +dev.xnb.1.xenbus_peer_domid: 7
> > > +dev.xnb.1.xenbus_connection_state: InitWait
> > > +dev.xnb.1.xenbus_dev_type: vif
> > > +dev.xnb.1.xenstore_path: backend/vif/7/0
> > > +dev.xnb.1.dump_rings:
> > > +dev.xnb.1.unit_test_results: xnb_rxpkt2rsp_empty:1765 Assertion Error:
> > > nr_reqs == 0
> > > +xnb_rxpkt2rsp_empty:1767 Assertion Error: memcmp(&rxb_backup,
> > > &xnb_unit_pvt.rxb, sizeof(rxb_backup)) == 0
> > > +xnb_rxpkt2rsp_empty:1769 Assertion Error: memcmp(&rxs_backup,
> > > xnb_unit_pvt.rxs, sizeof(rxs_backup)) == 0
> > > +52 Tests Passed
> > > +1 Tests FAILED
> > So you have failed tests for netback. Maybe the issue is with
> > netback rather than blkback.
> I am not sure where the problem is. Maybe there are two or more problems.
> Should the network-related and disk-related issues be discussed separately?
> 
> 
> > > What was suspicious about this is that i am using 13.0-RELEASE-p7, not a
> > > DEV version of XEN or anything, yet there are sysctl variables with the
> > > "dev" prefix. Could it be that ports accidentally compiled a XEN version
> > > with some development flags turned on?
> > 'dev' here means device, not developer.
> Okay. Understood.
> 
> 
> > > So those were my conclusions.
> > > 
> > > What do you think? How should we proceed with this? Should i try to somehow
> > > build XEN from git sources?
> > I think that's unlikely to make any difference. I would think the
> > problem is with FreeBSD rather than Xen. Can you paste the config file
> > you are using to create the domain(s)?
> Yet again, as i said, i do not know where the problem is. My wording might
> be off when pointing at Xen, since Dom0 is FreeBSD and it is FreeBSD that
> panics, not Xen. It's just that i do not know the boundaries and
> responsibilities of the components clearly enough, and by mentioning Xen i
> often mean the whole setup, or the FreeBSD module that interacts with Xen.
> 
> 
> > I've tried myself to trigger this crash by creating and destroying a
> > guest in a loop but didn't manage to trigger it. I was able to create
> > (and shutdown) 1328 guests successfully on a ZFS dom0 using 4GB of
> > RAM.
> No, just turning a single VM on/off does not crash the system. I had tried
> that as well before reporting the bug. I attached all the /bin/sh scripts
> and configs that crash the system in the bug report #261059 tar archive.
> Those scripts are really simple. I've been trying to minimize/trim
> everything off since then, but the simplest requirement for a crash is
> executing 3 processes in parallel:
> 
> 1) a loop that creates ZFS volumes of at least 2GB and writes data into
> them with dd; /dev/zero is good. I've never run out of space, thus the
> crash is not due to exhausted space in the ZFS pool.
> 2) a loop that creates VM1 with a single disk, connects through ssh to
> that VM and writes 3GB of data in /tmp, then reboots; starts the VM,
> removes the file from /tmp, then reboots.
> 3) a loop that starts/stops VM2 with 5 disks.
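
For reference, i understand the three loops to look roughly like this; the pool name, VM names and config paths below are placeholders, not the exact scripts from the bug report:

```shell
#!/bin/sh
# Rough sketch of the three parallel stress loops described above.

zvol_loop() {            # 1) create zvols and fill them from /dev/zero
    i=0
    while :; do
        i=$((i + 1))
        zfs create -V 2G "tank/stress-$i"
        dd if=/dev/zero of="/dev/zvol/tank/stress-$i" bs=1m count=2048
    done
}

vm1_loop() {             # 2) boot VM1, write 3GB in /tmp over ssh, reboot
    while :; do
        xl create vm1.cfg
        ssh vm1 'dd if=/dev/zero of=/tmp/fill bs=1m count=3072; rm /tmp/fill'
        xl shutdown -w vm1
    done
}

vm2_loop() {             # 3) plain start/stop of the 5-disk VM2
    while :; do
        xl create vm2.cfg
        xl shutdown -w vm2
    done
}

# zvol_loop & vm1_loop & vm2_loop &   # run all three in parallel
```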
> 
> As for panic timing: less RAM for Dom0 increases the speed of the panic,
> and the panic message is the same. With 8GB RAM on my machine i have to
> wait approx. 10 hours for the panic, with 4GB RAM around 6 hours. On a
> resource-starved Dom0 with 2GB, even less, but the panic messages are more
> or less the same. I have crashed it with step 1 omitted, but it takes
> longer. Running just on/off for both VMs, the system does not crash. Thus
> it seems that some sort of ZFS load (VM-internal or Dom0) is essential for
> the panic to occur. I have managed to crash it with a VM which has 2 HDDs,
> thus it does not seem to be due to a 4 HDD threshold for emulated/virtio
> disks. It's just that with 5, it takes less time until the crash. Usually
> the panic occurs once xbbd.$value reaches $value > 230.

Right, at some point you will run out of memory if resources are not
properly freed. Can you try to boot with "boot_verbose=YES" in
/boot/loader.conf and see if that gives you more information?
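
That is, the line to add would be:

```shell
# /boot/loader.conf -- verbose boot, as suggested above
boot_verbose="YES"
```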

Otherwise I might have to provide you with a patch to blkback in order
to attempt to detect why backends are not destroyed.

Since you stress the system quite badly, do you by any chance see 'xl'
processes getting terminated? Background xl processes being killed
will lead to backends not being properly shut down.
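
One way to catch that is a small watcher running next to the stress scripts. This is only a sketch (the `watch_xl` name is made up), and the 'killed' grep pattern is a guess at the kernel's kill message wording, so adjust it to what your dmesg actually prints:

```shell
#!/bin/sh
# Sketch: periodically record how many xl processes exist and surface
# recent kernel "killed" messages (e.g. out-of-memory kills).
watch_xl() {
    interval=${1:-30}
    while :; do
        n=$(pgrep xl 2>/dev/null | wc -l)
        printf '%s xl-processes=%s\n' "$(date '+%H:%M:%S')" "$n"
        dmesg | grep -i 'killed' | tail -n 3
        sleep "$interval"
    done
}
```

A sudden drop in the process count around the time a backend goes stale would point at the toolstack rather than the kernel.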

Regards, Roger.


