Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 5 Apr 2022 17:22:39 +0200
From:      Roger Pau =?utf-8?B?TW9ubsOp?= <roger.pau@citrix.com>
To:        Ze Dupsys <zedupsys@gmail.com>
Cc:        <freebsd-xen@freebsd.org>, <buhrow@nfbcal.org>
Subject:   Re: ZFS + FreeBSD XEN dom0 panic
Message-ID:  <Ykxev3fangqRGQcn@Air-de-Roger>
In-Reply-To: <22643831-70d3-5a3e-f973-fb80957e80dc@gmail.com>
References:  <YjybrgeORadwBmjP@Air-de-Roger> <088c8222-063a-1db5-da83-a5a0168d66c6@gmail.com> <Yj16hdrxawD61mAL@Air-de-Roger> <639f7ce0-8a07-884c-c1cf-8257b9f3d9e8@gmail.com> <Yj7YrW9CG2aXT%2BiC@Air-de-Roger> <4da2302b-0745-ea1d-c868-5a8a5fc66b18@gmail.com> <Yj8lZWqeHbD%2BkfOQ@Air-de-Roger> <48b74c39-abb3-0a3e-91a8-b5ab1e1223ce@gmail.com> <YkAqxjiMM1M1QdgR@Air-de-Roger> <22643831-70d3-5a3e-f973-fb80957e80dc@gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
--4oNrmEULt/shPi4w
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On Sun, Apr 03, 2022 at 09:54:24AM +0300, Ze Dupsys wrote:
> On 2022.03.27. 12:13, Roger Pau Monné wrote:
> > ..
> > Thanks, unfortunately that patch was incomplete. I have an updated
> > version that I think is better now, and I've slightly tested it
> > (creating and destroying a domain with it doesn't seem to crash).
> > Appended patch at the end of the message.
> 
> Hi,
> 
> This patch was far better, i almost wanted to say that it works, stressed
> system with 2G RAM and it did not even have signs of sysctl-var leaks. There
> were too many things going on, thus i most probably will not be able to
> reproduce this case, but just before panic i did "xl list" command and it
> instantly crashed (new trace). What i noticed after restart, that again some
> default nightly script had made /var/backup/* files which made root file
> system full. Since test overlaid 2 nights, in first day when root disk was
> full, system did not panic, i just rm'ed /var/backup biggest file to make
> sure that there are no problems. When i did "xl list" both test VMs were
> running, state i do not know.
> 
> Full serial log in attachment, part 2 is where most interesting stuff is,
> ending is a bit mess, but:
> ..
> (XEN) d284v0: upcall vector 93
> Apr  3 08:10:11 lab-01 xenstored[937]: TDB: expand_file to 229376 failed (No
> space left on devicex)bbd24: Error 5 w
> riting backend/vbApd/284/51760/sectors
> r  3 08:10:11 lab-01 kernel: xbbd24: Fapid 9tal error. T37 ransitioning to
> Closing St(xensate
> tored), uid 0 inumber 2003596 on /: filesystekernel trap m ful12 lw
> ith interrupts disableApd
> 
> r
> Fatal trap 3 08: 110:11 2: page fault while in kerlab-nel mode
> 0cp1uid = 0 ; apicx id = 00
> enstofault virtual address	= 0red[9x20
> 37fault co]: code		= surpervisor read data, page not presentru
> inptistruction pointer	= 0x20:0xffffffff80c94e80on
>  destack pointer	        = 0x28:0xfffffe0051tected8803c0
> frame  pobyinter	        = 0x28:0xfffffe00518803d0
>  connecode segment		= basce 0x0, limit 0xfffff, type 0x1b
> tion			= DPL 0, pres 1 , long 1, def32 0, gran 1
> proces0: sor eflags	= resumerre, IOPL = 0 No sp
> current process		= 16 (xenwatch)
> traap number	ce	= 12
>   panic: page fault
> cpuid = 0
> time = 1648962612
> KDB: stack backtrace:
> #0 0xffffffff80c7c285 at kdb_backtrace+0x65
> #1 0xffffffff80c2e2e1 at vpanic+0x181
> #2 0xffffffff80c2e153 at panic+0x43
> #3 0xffffffff810c8b97 at trap+0xba7
> #4 0xffffffff810c8bef at trap+0xbff
> #5 0xffffffff810c8243 at trap+0x253
> #6 0xffffffff810a0848 at calltrap+0x8
> #7 0xffffffff80c0b87a at __mtx_unlock_sleep+0x7a
> #8 0xffffffff80a98724 at xbd_instance_create+0x7aa4
> #9 0xffffffff80a9abb0 at xbd_instance_create+0x9f30
> #10 0xffffffff80f95c64 at xenbusb_localend_changed+0x7c4
> #11 0xffffffff80ab0f04 at xs_unlock+0x704
> #12 0xffffffff80beaeee at fork_exit+0x7e
> #13 0xffffffff810a18be at fork_trampoline+0xe
> Uptime: 1d10h56m34s

Thanks, sorry for the late reply, somehow the message slip.

I've been able to get the file:line for those, and the trace is kind
of weird, I'm not sure I know what's going on TBH. It seems to me the
backend instance got freed while being in the process of connecting.

I've made some changes, that might mitigate this, but having not a
clear understanding of what's going on makes this harder.

I've pushed the changes to:

http://xenbits.xen.org/gitweb/?p=people/royger/freebsd.git;a=shortlog;h=refs/heads/for-leak

(This is on top of main branch).

I'm also attaching the two patches on this email.

Let me know if those make a difference to stabilize the system.

Thanks, Roger.

--4oNrmEULt/shPi4w
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment;
	filename="0001-xenbus-improve-device-tracking.patch"

>From 4cf5c9300bf8f9517b9b1acafcc95657dafd99de Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <royger@FreeBSD.org>
Date: Mon, 21 Mar 2022 12:47:20 +0100
Subject: [PATCH 1/2] xenbus: improve device tracking

xenbus needs to keep track of the devices exposed on xenstore, so that
it can trigger frontend and backend device creation.

Removal of backend devices is currently detected by checking the
existence of the device (backend) xenstore directory, but that's prone
to races as the device driver would usually add entries to such
directory itself, so under certain circumstances it's possible for a
driver to add node to the directory after the toolstack has removed
it.  This leads to devices not removed, which can eventually exhaust
the memory of FreeBSD.

Fix this by checking for the existence of the 'state' node instead of
the directory, as such directory will always be present when a device
is active, and will be removed by the toolstack when the device is
shut down.  In order to avoid any races with the updating of the
'state' node by FreeBSD and the toolstack removing it use a
transaction in xenbusb_write_ivar() for that purpose.

Reported by: Ze Dupsys <zedupsys@gmail.com>
Sponsored by: Citrix Systems R&D
---
 sys/xen/xenbus/xenbusb.c | 55 +++++++++++++++++++++++-----------------
 1 file changed, 32 insertions(+), 23 deletions(-)

diff --git a/sys/xen/xenbus/xenbusb.c b/sys/xen/xenbus/xenbusb.c
index e026f8203ea1..b038b63bd289 100644
--- a/sys/xen/xenbus/xenbusb.c
+++ b/sys/xen/xenbus/xenbusb.c
@@ -254,7 +254,7 @@ xenbusb_delete_child(device_t dev, device_t child)
 static void
 xenbusb_verify_device(device_t dev, device_t child)
 {
-	if (xs_exists(XST_NIL, xenbus_get_node(child), "") == 0) {
+	if (xs_exists(XST_NIL, xenbus_get_node(child), "state") == 0) {
 		/*
 		 * Device tree has been removed from Xenbus.
 		 * Tear down the device.
@@ -907,6 +907,7 @@ xenbusb_write_ivar(device_t dev, device_t child, int index, uintptr_t value)
 	case XENBUS_IVAR_STATE:
 	{
 		int error;
+		struct xs_transaction xst;
 
 		newstate = (enum xenbus_state)value;
 		sx_xlock(&ivars->xd_lock);
@@ -915,31 +916,37 @@ xenbusb_write_ivar(device_t dev, device_t child, int index, uintptr_t value)
 			goto out;
 		}
 
-		error = xs_scanf(XST_NIL, ivars->xd_node, "state",
-		    NULL, "%d", &currstate);
-		if (error)
-			goto out;
-
 		do {
-			error = xs_printf(XST_NIL, ivars->xd_node, "state",
-			    "%d", newstate);
-		} while (error == EAGAIN);
-		if (error) {
-			/*
-			 * Avoid looping through xenbus_dev_fatal()
-			 * which calls xenbus_write_ivar to set the
-			 * state to closing.
-			 */
-			if (newstate != XenbusStateClosing)
-				xenbus_dev_fatal(dev, error,
-						 "writing new state");
-			goto out;
-		}
+			error = xs_transaction_start(&xst);
+			if (error != 0)
+				goto out;
+
+			error = xs_scanf(xst, ivars->xd_node, "state", NULL,
+			    "%d", &currstate);
+			if (error)
+				goto out;
+
+			do {
+				error = xs_printf(xst, ivars->xd_node, "state",
+				    "%d", newstate);
+			} while (error == EAGAIN);
+			if (error) {
+				/*
+				 * Avoid looping through xenbus_dev_fatal()
+				 * which calls xenbus_write_ivar to set the
+				 * state to closing.
+				 */
+				if (newstate != XenbusStateClosing)
+					xenbus_dev_fatal(dev, error,
+					    "writing new state");
+				goto out;
+			}
+		} while (xs_transaction_end(xst, 0));
 		ivars->xd_state = newstate;
 
-		if ((ivars->xd_flags & XDF_CONNECTING) != 0
-		 && (newstate == XenbusStateClosed
-		  || newstate == XenbusStateConnected)) {
+		if ((ivars->xd_flags & XDF_CONNECTING) != 0 &&
+		    (newstate == XenbusStateClosed ||
+		    newstate == XenbusStateConnected)) {
 			struct xenbusb_softc *xbs;
 
 			ivars->xd_flags &= ~XDF_CONNECTING;
@@ -949,6 +956,8 @@ xenbusb_write_ivar(device_t dev, device_t child, int index, uintptr_t value)
 
 		wakeup(&ivars->xd_state);
 	out:
+		if (error != 0)
+			xs_transaction_end(xst, 1);
 		sx_xunlock(&ivars->xd_lock);
 		return (error);
 	}
-- 
2.35.1


--4oNrmEULt/shPi4w
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment;
	filename="0002-xen-blkback-fix-tear-down-issues.patch"

>From 1525f8ea0b35edf6df7633ac217c32711583caec Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <royger@FreeBSD.org>
Date: Sun, 27 Mar 2022 10:43:42 +0200
Subject: [PATCH 2/2] xen/blkback: fix tear-down issues

Handle tearing down a blkback that hasn't been fully initialized. This
requires carefully checking that fields are allocated before trying to
access them.  Also communication memory is allocated before setting
XBBF_RING_CONNECTED, so gating it's freeing on XBBF_RING_CONNECTED
being set is wrong and will lead to memory leaks.

Also stop using xbb_disconnect() in error paths. Use xenbus_dev_fatal
and let the normal disconnection procedure take care of the cleanup.

Reported by: Ze Dupsys <zedupsys@gmail.com>
Sponsored by: Citrix Systems R&D
---
 sys/dev/xen/blkback/blkback.c | 63 +++++++++++++++++------------------
 1 file changed, 30 insertions(+), 33 deletions(-)

diff --git a/sys/dev/xen/blkback/blkback.c b/sys/dev/xen/blkback/blkback.c
index 33414295bf5e..15e4bbe78fc0 100644
--- a/sys/dev/xen/blkback/blkback.c
+++ b/sys/dev/xen/blkback/blkback.c
@@ -2774,19 +2774,12 @@ xbb_free_communication_mem(struct xbb_softc *xbb)
 static int
 xbb_disconnect(struct xbb_softc *xbb)
 {
-	struct gnttab_unmap_grant_ref  ops[XBB_MAX_RING_PAGES];
-	struct gnttab_unmap_grant_ref *op;
-	u_int			       ring_idx;
-	int			       error;
-
 	DPRINTF("\n");
 
-	if ((xbb->flags & XBBF_RING_CONNECTED) == 0)
-		return (0);
-
 	mtx_unlock(&xbb->lock);
 	xen_intr_unbind(&xbb->xen_intr_handle);
-	taskqueue_drain(xbb->io_taskqueue, &xbb->io_task); 
+	if (xbb->io_taskqueue != NULL)
+		taskqueue_drain(xbb->io_taskqueue, &xbb->io_task);
 	mtx_lock(&xbb->lock);
 
 	/*
@@ -2796,19 +2789,28 @@ xbb_disconnect(struct xbb_softc *xbb)
 	if (xbb->active_request_count != 0)
 		return (EAGAIN);
 
-	for (ring_idx = 0, op = ops;
-	     ring_idx < xbb->ring_config.ring_pages;
-	     ring_idx++, op++) {
-		op->host_addr    = xbb->ring_config.gnt_addr
-			         + (ring_idx * PAGE_SIZE);
-		op->dev_bus_addr = xbb->ring_config.bus_addr[ring_idx];
-		op->handle	 = xbb->ring_config.handle[ring_idx];
-	}
+	if (xbb->flags & XBBF_RING_CONNECTED) {
+		struct gnttab_unmap_grant_ref  ops[XBB_MAX_RING_PAGES];
+		struct gnttab_unmap_grant_ref *op;
+		unsigned int ring_idx;
+		int error;
+
+		for (ring_idx = 0, op = ops;
+		     ring_idx < xbb->ring_config.ring_pages;
+		     ring_idx++, op++) {
+			op->host_addr    = xbb->ring_config.gnt_addr
+				         + (ring_idx * PAGE_SIZE);
+			op->dev_bus_addr = xbb->ring_config.bus_addr[ring_idx];
+			op->handle	 = xbb->ring_config.handle[ring_idx];
+		}
 
-	error = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, ops,
-					  xbb->ring_config.ring_pages);
-	if (error != 0)
-		panic("Grant table op failed (%d)", error);
+		error = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, ops,
+						  xbb->ring_config.ring_pages);
+		if (error != 0)
+			panic("Grant table op failed (%d)", error);
+
+		xbb->flags &= ~XBBF_RING_CONNECTED;
+	}
 
 	xbb_free_communication_mem(xbb);
 
@@ -2839,7 +2841,6 @@ xbb_disconnect(struct xbb_softc *xbb)
 		xbb->request_lists = NULL;
 	}
 
-	xbb->flags &= ~XBBF_RING_CONNECTED;
 	return (0);
 }
 
@@ -2963,7 +2964,6 @@ xbb_connect_ring(struct xbb_softc *xbb)
 					  INTR_TYPE_BIO | INTR_MPSAFE,
 					  &xbb->xen_intr_handle);
 	if (error) {
-		(void)xbb_disconnect(xbb);
 		xenbus_dev_fatal(xbb->dev, error, "binding event channel");
 		return (error);
 	}
@@ -3338,6 +3338,13 @@ xbb_connect(struct xbb_softc *xbb)
 		return;
 	}
 
+	error = xbb_publish_backend_info(xbb);
+	if (error != 0) {
+		xenbus_dev_fatal(xbb->dev, error,
+		    "Unable to publish device information");
+		return;
+	}
+
 	error = xbb_alloc_requests(xbb);
 	if (error != 0) {
 		/* Specific errors are reported by xbb_alloc_requests(). */
@@ -3359,16 +3366,6 @@ xbb_connect(struct xbb_softc *xbb)
 		return;
 	}
 
-	if (xbb_publish_backend_info(xbb) != 0) {
-		/*
-		 * If we can't publish our data, we cannot participate
-		 * in this connection, and waiting for a front-end state
-		 * change will not help the situation.
-		 */
-		(void)xbb_disconnect(xbb);
-		return;
-	}
-
 	/* Ready for I/O. */
 	xenbus_set_state(xbb->dev, XenbusStateConnected);
 }
-- 
2.35.1


--4oNrmEULt/shPi4w--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Ykxev3fangqRGQcn>