Date: Fri, 18 Mar 2022 16:24:17 +0100
From: Roger Pau Monné <roger.pau@citrix.com>
To: Ze Dupsys <zedupsys@gmail.com>
Cc: <freebsd-xen@freebsd.org>, <buhrow@nfbcal.org>
Subject: Re: ZFS + FreeBSD XEN dom0 panic
Message-ID: <YjSkIZmV%2Bt8Q3AEn@Air-de-Roger>
In-Reply-To: <ca88a8c8-3b4e-fbde-18a7-d4e5f61e8b2c@gmail.com>
References: <Yh93uLIBqk5NC2xf@Air-de-Roger>
 <CAOEWpzfsajhbvXfAw5-F1p83jjmSggobANBEyeYFAfiumAWRCA@mail.gmail.com>
 <YiCa70%2BHQScsoaKX@Air-de-Roger>
 <3d4691a7-c4b3-1c91-9eaa-7af071561bb6@gmail.com>
 <YihojHNbzJagm4SI@Air-de-Roger>
 <5dfdecd5-f94d-29b4-791e-0adde5405cf5@gmail.com>
 <Yiocagc4dTc15/Y1@Air-de-Roger>
 <feb35237-555b-29dc-d2fd-1659b400d683@gmail.com>
 <Yi8IvnqWUoWBIsLB@Air-de-Roger>
 <ca88a8c8-3b4e-fbde-18a7-d4e5f61e8b2c@gmail.com>

On Tue, Mar 15, 2022 at 08:51:57AM +0200, Ze Dupsys wrote:
> On 2022.03.14. 11:19, Roger Pau Monné wrote:
> > On Mon, Mar 14, 2022 at 10:06:58AM +0200, Ze Dupsys wrote:
> > > ..
> > >
> > > Why those lines starting "xnb(xnb_detach:1330):" do not have any message?
> > > Could it be that there is a bad pointer to message buffer that can not be
> > > printed? And then sometimes panic happens because access goes out of allowed
> > > memory region?
> > Some messages in netback are just "\n", likely leftovers from debug.
> Okay, found the lines, it is as you say. So this will not be an easy one.
>
> > Can you try to stress the system again but this time with guests not
> > having any network interfaces? (so that netback doesn't get used in
> > dom0).
> I'll try to come up with something. At the moment all commands to VMs are
> given through ssh.
>
> > Then if you could rebuild the FreeBSD dom0 kernel with the above patch
> > we might be able to get a bit more of info about blkback shutdown.
> I rebuilt 13.1 STABLE, with commenting out #undef and adding #define, thus
> line number will differ by single line. For this test i did not remove
> network interfaces, and did add DPRINTF messages to xnb_detach function as
> well, since i hoped to maybe catch something there, by printing pointers. I
> somewhat did not like that xnb_detach does not check for NULL return from
> device_get_softc, nor for device_t argument, so i though, maybe those
> crashes are something related to that. But i guess this will not be so easy,
> and maybe it is safe to assume that "device_t dev" is always valid in that
> context.
>
> So i ran stress test, system did not crash as it happens often when more
> debugging info is printed, characteristics change. But it did leak sysctl
> xbbd variables. I'll attach all collected log files. sysctl and xl list
> commands differ in timing a little bit. xl list _02 is when all VMs are
> turned off. Sysctl only has keys without values, not to trigger xnb tests
> while reading all values.

So I've been staring at this for a while, and I'm not yet sure I
figured out exactly what's going on, but can you give a try to the
patch below?

Thanks, Roger.
---8<---
diff --git a/sys/xen/xenbus/xenbusb.c b/sys/xen/xenbus/xenbusb.c
index e026f8203ea1..a8b75f46b9cc 100644
--- a/sys/xen/xenbus/xenbusb.c
+++ b/sys/xen/xenbus/xenbusb.c
@@ -254,7 +254,7 @@ xenbusb_delete_child(device_t dev, device_t child)
 static void
 xenbusb_verify_device(device_t dev, device_t child)
 {
-	if (xs_exists(XST_NIL, xenbus_get_node(child), "") == 0) {
+	if (xs_exists(XST_NIL, xenbus_get_node(child), "state") == 0) {
 		/*
 		 * Device tree has been removed from Xenbus.
 		 * Tear down the device.
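
For illustration only, the difference between the two checks in the patch can be sketched in userland with a toy store. Everything here is an assumption made for the sketch: `mock_store`, `mock_exists` and `compare_checks` are hypothetical names (not the kernel's `xs_exists()`), the path `backend/vbd/1/768` is made up, and the scenario of a device directory outliving its "state" leaf is a guess at what the patch targets, not something confirmed in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for XenStore: a flat list of existing paths. */
struct mock_store {
	const char *paths[8];
	int count;
};

/*
 * Hypothetical helper, NOT the kernel's xs_exists().  Assumption: an
 * empty node name refers to the directory itself, otherwise the path
 * looked up is "dir/node".  Returns 1 if the path is in the store.
 */
int
mock_exists(const struct mock_store *s, const char *dir, const char *node)
{
	char path[128];
	int i;

	assert(s != NULL && dir != NULL && node != NULL);
	if (node[0] == '\0')
		snprintf(path, sizeof(path), "%s", dir);
	else
		snprintf(path, sizeof(path), "%s/%s", dir, node);
	for (i = 0; i < s->count; i++)
		if (strcmp(s->paths[i], path) == 0)
			return (1);
	return (0);
}

/* Assumed scenario: the directory lingers but its "state" leaf is gone. */
void
compare_checks(void)
{
	struct mock_store s = { { "backend/vbd/1/768" }, 1 };

	/* Pre-patch style check: looks only at the directory node. */
	printf("dir check:   %d\n", mock_exists(&s, "backend/vbd/1/768", ""));
	/* Post-patch style check: looks at the "state" leaf. */
	printf("state check: %d\n",
	    mock_exists(&s, "backend/vbd/1/768", "state"));
}
```

In this toy scenario the directory check still reports the device as present while the "state" check does not, so only the latter would reach the teardown branch. Whether that is what actually happens on the affected dom0 is exactly what the patch is meant to test.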