Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 10 Feb 2017 15:12:25 +0000
From:      "roger.pau@citrix.com" <roger.pau@citrix.com>
To:        Liuyingdong <liuyingdong@huawei.com>
Cc:        "freebsd-xen@freebsd.org" <freebsd-xen@freebsd.org>, Suoben <suoben@huawei.com>, "Zhaojun (Euler)" <zhao.zhaojun@huawei.com>, Wanglinkai <wanglinkai@huawei.com>, chuzhaosong <chuzhaosong@huawei.com>, "Wangchunfeng (Ivan)" <chunfeng.wang@huawei.com>, "Gaoxiaodong (Leo)" <leo.gaoxiaodong@huawei.com>
Subject:   Multiple patch review (was: =?utf-8?B?UmU6?= =?utf-8?B?IOetlOWkjTog562U5aSNOiBbUEFUQ0hdbmV0ZnJvbnQ6IG5lZWQgcmVsZWFz?= =?utf-8?Q?e?= all resources) after adding and removing NICs time and again
Message-ID:  <20170210151225.ez2upgqws3pfjxa2@dhcp-3-221.uk.xensource.com>
In-Reply-To: <etPan.5899fc03.6e4b4555.3888@localhost>
References:  <etPan.5899fc03.6e4b4555.3888@localhost>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Feb 07, 2017 at 04:55:34PM +0000, Liuyingdong wrote:
> Hi Roger,
>          I am so sorry and please review tne below URL:
> https://lists.freebsd.org/pipermail/freebsd-xen/2017-February/002957.html

Hello,

Thanks for the patches, and sorry for the delay. It seems like you have not
applied some of my comments, so I will have to re-post them here. If some of
the comments don't apply for whatever reason, please reply back and explain
why, or else this is not going to progress in an useful way for any of us.

I would also request you to look into using `git send-email`, reviewing your
patches as attachments is not very comfortable. Or else, you could create an
account to https://reviews.freebsd.org/ and upload the patches there assigning
me as a reviewer.

When replying, can you also make sure that you reply in-line to my comments, or
else it's very hard to follow the conversation.

---
> From 0af05ac01be853340b3b53862de9853d8d44c0c6 Mon Sep 17 00:00:00 2001
> From: liuyingdong <liuyingdong@huawei.com>
> Date: Thu, 2 Feb 2017 19:40:48 +0200
> Subject: introduce suspend_cancel mechanism for frontend devices.
> 1.1 On a cancelled suspend, xen frontend devices need know their state is invariant.
> 1.2 On a cancelled suspend the vcpu_info location does not change 
> (it's still in the per-cpu area registered by xen_hvm_cpu_init()). 
> So do not call xen_hvm_init_shared_info_page() which would make the kernel think its back in the shared info. 
> With the wrong vcpu_info, events cannot be received and the domain will hang after a cancelled suspend.
> 
> ---
>  dev/xen/blkfront/blkfront.c |  5 +++++
>  dev/xen/control/control.c   | 11 +++++++++--
>  dev/xen/netfront/netfront.c | 21 +++++++++++++++++++++
>  x86/xen/hvm.c               | 16 ++++++++++------
>  xen/xenbus/xenbusb.c        | 35 +++++++++++++++++++----------------
>  xen/xenbus/xenbusvar.h      |  2 ++
>  6 files changed, 66 insertions(+), 24 deletions(-)
> 
> diff --git a/dev/xen/blkfront/blkfront.c b/dev/xen/blkfront/blkfront.c
> index 9eca220..47cd83f 100644
> --- a/dev/xen/blkfront/blkfront.c
> +++ b/dev/xen/blkfront/blkfront.c
> @@ -1537,6 +1537,11 @@ xbd_resume(device_t dev)
>  {
>  	struct xbd_softc *sc = device_get_softc(dev);
>  
> +	if(g_suspend_cancelled) {
> +		sc->xbd_state = XBD_STATE_CONNECTED;
> +		return (0);
> +	}
> +
>  	DPRINTK("xbd_resume: %s\n", xenbus_get_node(dev));
>  
>  	xbd_free(sc);
> diff --git a/dev/xen/control/control.c b/dev/xen/control/control.c
> index 58fefcc..4f8471f 100644
> --- a/dev/xen/control/control.c
> +++ b/dev/xen/control/control.c
> @@ -190,6 +190,11 @@ xctrl_reboot()
>  	shutdown_nice(0);
>  }
>  
> +static void _set_suspend_cancelled(bool suspend_cancelled)
> +{
> +	g_suspend_cancelled = suspend_cancelled;
> +}
> +
>  static void
>  xctrl_suspend()
>  {
> @@ -278,7 +283,9 @@ xctrl_suspend()
>  	/*
>  	 * Reset grant table info.
>  	 */
> -	gnttab_resume(NULL);
> +	if(suspend_cancelled == 0)  {
> +		gnttab_resume(NULL);
> +	}
>  
>  #ifdef SMP
>  	if (!CPU_EMPTY(&cpu_suspend_map)) {
> @@ -296,6 +303,7 @@ xctrl_suspend()
>  	 * FreeBSD really needs to add DEVICE_SUSPEND_CANCEL or
>  	 * similar.
>  	 */
> +	_set_suspend_cancelled(suspend_cancelled != 0);

No need for this helper, you can just s/suspend_cancelled/xen_suspend_called/
and make it global.

>  	DEVICE_RESUME(root_bus);
>  	mtx_unlock(&Giant);
>  
> @@ -319,7 +327,6 @@ xctrl_suspend()
>  #endif
>  
>  	resume_all_proc();
> -

Stray change.

>  	EVENTHANDLER_INVOKE(power_resume);
>  
>  	if (bootverbose)
> diff --git a/dev/xen/netfront/netfront.c b/dev/xen/netfront/netfront.c
> index 459712a..537018a 100644
> --- a/dev/xen/netfront/netfront.c
> +++ b/dev/xen/netfront/netfront.c
> @@ -153,6 +153,10 @@ static int xn_get_responses(struct netfront_rxq *,
>      struct netfront_rx_info *, RING_IDX, RING_IDX *,
>      struct mbuf **);
>  
> +#ifdef INET
> +static void netfront_send_fake_arp(device_t dev, struct netfront_info *info);
> +#endif
> +
>  #define virt_to_mfn(x) (vtophys(x) >> PAGE_SHIFT)
>  
>  #define INVALID_P2M_ENTRY (~0UL)
> @@ -440,6 +444,23 @@ netfront_resume(device_t dev)
>  {
>  	struct netfront_info *info = device_get_softc(dev);
>  
> +	if(g_suspend_cancelled) {
> +		u_int i;
> +		for (i = 0; i < info->num_queues; i++) {
> +		    XN_RX_LOCK(&info->rxq[i]);
> +		    XN_TX_LOCK(&info->txq[i]);

Bad indentation, you should use tabs (not spaces).

> +		}
> +		netfront_carrier_on(info);
> +		for (i = 0; i < info->num_queues; i++) {
> +		    XN_RX_UNLOCK(&info->rxq[i]);
> +		    XN_TX_UNLOCK(&info->txq[i]);

Bad indentation, you should use tabs (not spaces).

> +		}
> +#ifdef INET
> +		netfront_send_fake_arp(dev, info);
> +#endif

I don't think you need to send an ARP on a cancelled suspend, the bridge should
already have the mac address of the virtual interface in it's cache right?

> +		return (0);
> +	}
> +
>  	netif_disconnect_backend(info);
>  	return (0);
>  }
> diff --git a/x86/xen/hvm.c b/x86/xen/hvm.c
> index e10659e..c96b7be 100644
> --- a/x86/xen/hvm.c
> +++ b/x86/xen/hvm.c
> @@ -297,10 +297,9 @@ xen_hvm_init(enum xen_hvm_init_type init_type)
>  	int error;
>  	int i;
>  
> -	if (init_type == XEN_HVM_INIT_CANCELLED_SUSPEND)
> -		return;
> -
> -	error = xen_hvm_init_hypercall_stubs(init_type);
> +	if (init_type != XEN_HVM_INIT_CANCELLED_SUSPEND) {
> +		error = xen_hvm_init_hypercall_stubs(init_type);
> +	}
>  
>  	switch (init_type) {
>  	case XEN_HVM_INIT_COLD:
> @@ -331,6 +330,8 @@ xen_hvm_init(enum xen_hvm_init_type init_type)
>  		CPU_FOREACH(i)
>  			DPCPU_ID_SET(i, vcpu_info, NULL);
>  		break;
> +	case XEN_HVM_INIT_CANCELLED_SUSPEND:
> +		break;
>  	default:
>  		panic("Unsupported HVM initialization type");
>  	}
> @@ -344,7 +345,9 @@ xen_hvm_init(enum xen_hvm_init_type init_type)
>  	 * is passed inside the start_info struct and is already set, so this
>  	 * functions are no-ops.
>  	 */
> -	xen_hvm_init_shared_info_page();
> +	if (init_type != XEN_HVM_INIT_CANCELLED_SUSPEND) {
> +		xen_hvm_init_shared_info_page();
> +	}
>  	xen_hvm_disable_emulated_devices();

I'm not sure I follow why this change to xen_hvm_init is needed, on a cancelled
suspend you shouldn't need to re-set the callback vector or unplug any emulated
devices.

>  
> @@ -361,7 +364,8 @@ xen_hvm_resume(bool suspend_cancelled)
>  	    XEN_HVM_INIT_CANCELLED_SUSPEND : XEN_HVM_INIT_RESUME);
>  
>  	/* Register vcpu_info area for CPU#0. */
> -	xen_hvm_cpu_init();
> +	if(!suspend_cancelled)
> +		xen_hvm_cpu_init();

Seeing this here, why don't we just avoid calling xen_hvm_resume from
xctrl_suspend on a cancelled suspend?

>   
>  static void
> diff --git a/xen/xenbus/xenbusb.c b/xen/xenbus/xenbusb.c
> index 3a0e543..fa9ba61 100644
> --- a/xen/xenbus/xenbusb.c
> +++ b/xen/xenbus/xenbusb.c
> @@ -791,29 +791,32 @@ xenbusb_resume(device_t dev)
>  			if (device_get_state(kids[i]) == DS_NOTPRESENT)
>  				continue;
>  
> -			ivars = device_get_ivars(kids[i]);
> +			if(!g_suspend_cancelled) {
> +				ivars = device_get_ivars(kids[i]);
>  
> -			xs_unregister_watch(&ivars->xd_otherend_watch);
> -			xenbus_set_state(kids[i], XenbusStateInitialising);
> +				xs_unregister_watch(&ivars->xd_otherend_watch);
> +				xenbus_set_state(kids[i], XenbusStateInitialising);
>  
> -			/*
> -			 * Find the new backend details and
> -			 * re-register our watch.
> -			 */
> -			error = XENBUSB_GET_OTHEREND_NODE(dev, ivars);
> -			if (error)
> -				return (error);
> +				/*
> +				 * Find the new backend details and
> +				 * re-register our watch.
> +				 */
> +				error = XENBUSB_GET_OTHEREND_NODE(dev, ivars);
> +				if (error)
> +					return (error);
>  
> -			statepath = malloc(ivars->xd_otherend_path_len
> -			    + strlen("/state") + 1, M_XENBUS, M_WAITOK);
> -			sprintf(statepath, "%s/state", ivars->xd_otherend_path);
> +				statepath = malloc(ivars->xd_otherend_path_len
> +				    + strlen("/state") + 1, M_XENBUS, M_WAITOK);
> +				sprintf(statepath, "%s/state", ivars->xd_otherend_path);
>  
> -			free(ivars->xd_otherend_watch.node, M_XENBUS);
> -			ivars->xd_otherend_watch.node = statepath;
> +				free(ivars->xd_otherend_watch.node, M_XENBUS);
> +				ivars->xd_otherend_watch.node = statepath;
> +			}
>  
>  			DEVICE_RESUME(kids[i]);
>  
> -			xs_register_watch(&ivars->xd_otherend_watch);
> +			if(!g_suspend_cancelled)
> +				xs_register_watch(&ivars->xd_otherend_watch);
>  #if 0

Why don't you just add:

if (xenbusb_suspend_cancelled == 1) {
	DEVICE_RESUME(kids[i]);
	continue;
}

To the top of the loop? This would prevent adding a bunch of nested
conditionals.

>  			/*
>  			 * Can't do this yet since we are running in
> diff --git a/xen/xenbus/xenbusvar.h b/xen/xenbus/xenbusvar.h
> index 377d60c..02a5bfa 100644
> --- a/xen/xenbus/xenbusvar.h
> +++ b/xen/xenbus/xenbusvar.h
> @@ -99,6 +99,8 @@ XENBUS_ACCESSOR(state,		STATE,			enum xenbus_state)
>  XENBUS_ACCESSOR(otherend_id,	OTHEREND_ID,		int)
>  XENBUS_ACCESSOR(otherend_path,	OTHEREND_PATH,		const char *)
>  
> +bool g_suspend_cancelled;

This should be called xen_suspend_cancelled, and it would be better to place it
in xen/xen-os.h instead. Also I'm not sure how that even compiles, isn't this
missing an "extern" keyword in front?

> +
>  /**
>   * Return the state of a XenBus device.
>   *
> -- 
> 2.11.0.windows.3
---
> From f4a0660ce77f72e0d0c08f3316ed4d8f78f1b8ab Mon Sep 17 00:00:00 2001
> From: liuyingdong <liuyingdong@huawei.com>
> Date: Thu, 2 Feb 2017 18:51:08 +0200
> Subject: If there is a user process which maybe often reads and writes xenstore, the application has been suspended 
> after stop_all_proc is called. It held xs.request_mutex lock but in the following functions 
> xs_write and xs_suspend will not get the lock. So the VM will hang. 
> In order to prevent this from happening, this process need wait 
> until stop_all_proc has finished during live migration.
> 
> ---
>  dev/xen/control/control.c        |  2 ++
>  dev/xen/xenstore/xenstore.c      | 31 +++++++++++++++++++++++++++++++
>  dev/xen/xenstore/xenstore_dev.c  |  7 +++++++
>  xen/xenstore/xenstore_internal.h |  4 ++++
>  4 files changed, 44 insertions(+)
> 
> diff --git a/dev/xen/control/control.c b/dev/xen/control/control.c
> index ae13c6c..58fefcc 100644
> --- a/dev/xen/control/control.c
> +++ b/dev/xen/control/control.c
> @@ -147,6 +147,7 @@ __FBSDID("$FreeBSD$");
>  #include <xen/interface/grant_table.h>
>  
>  #include <xen/xenbus/xenbusvar.h>
> +#include <xen/xenstore/xenstore_internal.h>
>  
>  /*--------------------------- Forward Declarations --------------------------*/
>  /** Function signature for shutdown event handlers. */
> @@ -199,7 +199,9 @@ xctrl_suspend()
>  	int suspend_cancelled;
>  
>  	EVENTHANDLER_INVOKE(power_suspend_early);
> +	xs_lock();
>  	stop_all_proc();
> +	xs_unlock();

This seems fine, although I think that a better solution would be to simply
don't use such kind of locks, and allow concurrent access to the ring by
properly using the xenstore ids, but that involves a non-trivial amount of
code.

>  	EVENTHANDLER_INVOKE(power_suspend);
>  
>  #ifdef EARLY_AP_STARTUP
> diff --git a/dev/xen/xenstore/xenstore.c b/dev/xen/xenstore/xenstore.c
> index 4f89b74..d44f064 100644
> --- a/dev/xen/xenstore/xenstore.c
> +++ b/dev/xen/xenstore/xenstore.c
> @@ -1255,6 +1255,37 @@ xs_suspend(device_t dev)
>  	return (0);
>  }
>  
> +int
> +xs_try_lock(void)
> +{
> +	/*
> +	sx_try_slock() and sx_try_xlock() will return 0 if the shared/exclusive
> +		 lock cannot be acquired immediately; otherwise the shared/exclusive lock
> +		 will be acquired and a non-zero value will be returned.
> +	*/
> +	return sx_try_xlock(&xs.request_mutex);
> +}

I don't think you need this at all, see below...

[...]
> diff --git a/dev/xen/xenstore/xenstore_dev.c b/dev/xen/xenstore/xenstore_dev.c
> index ce62140..d6b97c0 100644
> --- a/dev/xen/xenstore/xenstore_dev.c
> +++ b/dev/xen/xenstore/xenstore_dev.c
> @@ -128,6 +128,13 @@ xs_dev_write(struct cdev *dev, struct uio *uio, int ioflag)
>  	if (error != 0)
>  		return (error);
>  
> +	while(!xs_try_lock()) {
> +		error = tsleep(u, PCATCH, "xsdwrite", hz/10);
> +		if (error && error != EWOULDBLOCK)
> +			return (error);
> +	}
> +	xs_unlock();

Why do you need this try_lock/unlock construction here? It is not protecting
you against anything, the more that you don't perform any actual work with the
lock held.

>  	if ((len + u->len) > sizeof(u->u.buffer))
>  		return (EINVAL);
>  
> diff --git a/xen/xenstore/xenstore_internal.h b/xen/xenstore/xenstore_internal.h
> index 3355c27..31db935 100644
> --- a/xen/xenstore/xenstore_internal.h
> +++ b/xen/xenstore/xenstore_internal.h
> @@ -34,3 +34,7 @@
>  
>  /* Used by the XenStore character device to borrow kernel's store connection. */
>  int xs_dev_request_and_reply(struct xsd_sockmsg *msg, void **result);
> +
> +int xs_try_lock(void);
> +void xs_lock(void);
> +void xs_unlock(void);
> -- 
> 2.11.0.windows.3
---
> From 78215c0b117fb7b651cf7e57a6a323407413ddde Mon Sep 17 00:00:00 2001
> From: liuyingdong <liuyingdong@huawei.com>
> Date: Thu, 2 Feb 2017 19:52:33 +0200
> Subject: Because wrong order of device resume, VM will hang after live migration. 
> attach the Xen PV timer to the nexus directly as it was done before.
> 
> ---
>  dev/xen/timer/timer.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/dev/xen/timer/timer.c b/dev/xen/timer/timer.c
> index 0b26847..ae8f603 100644
> --- a/dev/xen/timer/timer.c
> +++ b/dev/xen/timer/timer.c
> @@ -546,5 +546,5 @@ static driver_t xentimer_driver = {
>  	sizeof(struct xentimer_softc),
>  };
>  
> -DRIVER_MODULE(xentimer, xenpv, xentimer_driver, xentimer_devclass, 0, 0);
> -MODULE_DEPEND(xentimer, xenpv, 1, 1, 1);
> +DRIVER_MODULE(xentimer, nexus, xentimer_driver, xentimer_devclass, 0, 0);
> +MODULE_DEPEND(xentimer, nexus, 1, 1, 1);

No, this is not how this should be solved. I understand that you might not want
to implement the full solution proposed in freebsd-arch [0], but going back to
Xen devices attaching directly to the nexus is also not an option IMHO.

What I would accept is marking the Xen time-counter as not safe for suspension,
so that FreeBSD itself will switch to a different timer when suspending. This
simply means removing the TC_FLAGS_SUSPEND_SAFE from the timecounter flags.
Please add a "FIXME" comment, explaining why this is disabled, and when it
could be enabled again. Could you please test that, and see if it solves your
issues?

Thanks, Roger

[0] https://lists.freebsd.org/pipermail/freebsd-arch/2016-December/018041.html



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170210151225.ez2upgqws3pfjxa2>