Date:      Fri, 18 Mar 2022 16:24:17 +0100
From:      Roger Pau Monné <roger.pau@citrix.com>
To:        Ze Dupsys <zedupsys@gmail.com>
Cc:        <freebsd-xen@freebsd.org>, <buhrow@nfbcal.org>
Subject:   Re: ZFS + FreeBSD XEN dom0 panic
Message-ID:  <YjSkIZmV%2Bt8Q3AEn@Air-de-Roger>
In-Reply-To: <ca88a8c8-3b4e-fbde-18a7-d4e5f61e8b2c@gmail.com>
References:  <Yh93uLIBqk5NC2xf@Air-de-Roger> <CAOEWpzfsajhbvXfAw5-F1p83jjmSggobANBEyeYFAfiumAWRCA@mail.gmail.com> <YiCa70%2BHQScsoaKX@Air-de-Roger> <3d4691a7-c4b3-1c91-9eaa-7af071561bb6@gmail.com> <YihojHNbzJagm4SI@Air-de-Roger> <5dfdecd5-f94d-29b4-791e-0adde5405cf5@gmail.com> <Yiocagc4dTc15/Y1@Air-de-Roger> <feb35237-555b-29dc-d2fd-1659b400d683@gmail.com> <Yi8IvnqWUoWBIsLB@Air-de-Roger> <ca88a8c8-3b4e-fbde-18a7-d4e5f61e8b2c@gmail.com>

On Tue, Mar 15, 2022 at 08:51:57AM +0200, Ze Dupsys wrote:
> On 2022.03.14. 11:19, Roger Pau Monné wrote:
> > On Mon, Mar 14, 2022 at 10:06:58AM +0200, Ze Dupsys wrote:
> > > ..
> > > 
> > > Why those lines starting "xnb(xnb_detach:1330):" do not have any message?
> > > Could it be that there is a bad pointer to message buffer that can not be
> > > printed? And then sometimes panic happens because access goes out of allowed
> > > memory region?
> > Some messages in netback are just "\n", likely leftovers from debug.
> Okay, found the lines, it is as you say. So this will not be an easy one.
> 
> 
> > Can you try to stress the system again but this time with guests not
> > having any network interfaces? (so that netback doesn't get used in
> > dom0).
> I'll try to come up with something. At the moment all commands to VMs are
> given through ssh.
> 
> 
> > Then if you could rebuild the FreeBSD dom0 kernel with the above patch
> > we might be able to get a bit more of info about blkback shutdown.
> I rebuilt 13.1 STABLE, with commenting out #undef and adding #define, thus
> line number will differ by single line. For this test i did not remove
> network interfaces, and did add DPRINTF messages to xnb_detach function as
> well, since i hoped to maybe catch something there, by printing pointers. I
> somewhat did not like that xnb_detach does not check for NULL return from
> device_get_softc, nor for device_t argument, so i thought, maybe those
> crashes are something related to that. But i guess this will not be so easy,
> and maybe it is safe to assume that "device_t dev" is always valid in that
> context.
> 
> So i ran stress test, system did not crash as it happens often when more
> debugging info is printed, characteristics change. But it did leak sysctl
> xbbd variables. I'll attach all collected log files. sysctl and xl list
> commands differ in timing a little bit. xl list _02 is when all VMs are
> turned off. Sysctl only has keys without values, not to trigger xnb tests
> while reading all values.

So I've been staring at this for a while, and I'm not yet sure I've
figured out exactly what's going on, but can you give the patch below
a try?

Thanks, Roger.
---8<---
diff --git a/sys/xen/xenbus/xenbusb.c b/sys/xen/xenbus/xenbusb.c
index e026f8203ea1..a8b75f46b9cc 100644
--- a/sys/xen/xenbus/xenbusb.c
+++ b/sys/xen/xenbus/xenbusb.c
@@ -254,7 +254,7 @@ xenbusb_delete_child(device_t dev, device_t child)
 static void
 xenbusb_verify_device(device_t dev, device_t child)
 {
-	if (xs_exists(XST_NIL, xenbus_get_node(child), "") == 0) {
+	if (xs_exists(XST_NIL, xenbus_get_node(child), "state") == 0) {
 		/*
 		 * Device tree has been removed from Xenbus.
 		 * Tear down the device.
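
A toy illustration of what the changed check tests for, assuming (this
is not stated in the thread) that a device directory can linger in
xenstore after its "state" node is gone. The store contents and the
toy_* helpers are hypothetical stand-ins, not the FreeBSD xs_* API:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical xenstore contents; paths are made up for illustration. */
static const char *toy_store[] = {
	"backend/vbd/1/768",		/* device directory still present */
	"backend/vbd/1/768/frontend",	/* leftover node under it */
	/* deliberately no "backend/vbd/1/768/state" entry */
};

/* Build "dir/node", or just "dir" when node is the empty string. */
static void
toy_join(char *buf, size_t len, const char *dir, const char *node)
{
	if (node[0] == '\0')
		snprintf(buf, len, "%s", dir);
	else
		snprintf(buf, len, "%s/%s", dir, node);
}

/* Return 1 if the joined path is in the toy store, 0 otherwise. */
static int
toy_exists(const char *dir, const char *node)
{
	char path[128];
	size_t i;

	toy_join(path, sizeof(path), dir, node);
	for (i = 0; i < sizeof(toy_store) / sizeof(toy_store[0]); i++)
		if (strcmp(toy_store[i], path) == 0)
			return (1);
	return (0);
}
```

With this store, toy_exists(dir, "") still reports the device as
present, while toy_exists(dir, "state") reports it as gone, so only the
second form would trigger the teardown path in that situation.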