Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 17 Feb 2012 01:45:49 +0000 (UTC)
From:      "Justin T. Gibbs" <gibbs@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-9@freebsd.org
Subject:   svn commit: r231851 - in stable/9/sys: dev/xen/blkback dev/xen/blkfront xen/interface/io xen/xenbus
Message-ID:  <201202170145.q1H1jnQ3044170@svn.freebsd.org>

next in thread | raw e-mail | index | archive | help
Author: gibbs
Date: Fri Feb 17 01:45:49 2012
New Revision: 231851
URL: http://svn.freebsd.org/changeset/base/231851

Log:
  MFC r231743,r231837,r231839: Xen PV block interface enhancements
  
  r231743
  =======
  Enhance documentation, improve interoperability, and fix defects in
  FreeBSD's front and back Xen blkif interface drivers.
  
  sys/dev/xen/blkfront/block.h:
  sys/dev/xen/blkfront/blkfront.c:
  sys/dev/xen/blkback/blkback.c:
  	Replace FreeBSD specific multi-page ring impelementation with
  	support for both the Citrix and Amazon/RedHat versions of this
  	extension.
  
  sys/dev/xen/blkfront/blkfront.c:
  	o Add a per-instance sysctl tree that exposes all negotiated
  	  transport parameters (ring pages, max number of requests,
  	  max request size, max number of segments).
  	o In blkfront_vdevice_to_unit() add a missing return statement
  	  so that we properly identify the unit number for high numbered
  	  xvd devices.
  
  sys/dev/xen/blkback/blkback.c:
  	o Add static dtrace probes for several events in this driver.
  	o Defer connection shutdown processing until the front-end
  	  enters the closed state.  This avoids prematurely tearing
  	  down the connection when buggy front-ends transition to the
  	  closing state, even though the device is open and they
  	  veto the close request from the tool stack.
  	o Add nodes for maximum request size and the number of active
  	  ring pages to the exising, per-instance, sysctl tree.
  	o Miscelaneous style cleanup.
  
  sys/xen/interface/io/blkif.h:
  	o Add extensive documentation of the XenStore nodes used to
  	  implement the blkif interface.
  	o Document the startup sequence between a front and back driver.
  	o Add structures and documenatation for the "discard" feature
  	  (AKA Trim).
  	o Cleanup some definitions related to FreeBSD's request
  	  number/size/segment-limit extension.
  
  sys/dev/xen/blkfront/blkfront.c:
  sys/dev/xen/blkback/blkback.c:
  sys/xen/xenbus/xenbusvar.h:
  	Add the convenience function xenbus_get_otherend_state() and
  	use it to simplify some logic in both block-front and block-back.
  
  r231837
  =======
  sys/dev/xen/blkback/blkback.c:
  	Fix typo in a printf string: "specificed" -> "specified".
  
  r231839
  =======
  Fix a bug in the calculation of the maximum I/O request size.
  The previous code did not limit the I/O request size based on
  the maximum number of segments supported by the back-end.  In
  current practice, since the only back-end supporting chained
  requests is the FreeBSD implementation, this limit was never
  exceeded.
  
  sys/dev/xen/blkfront/block.h:
  	Add two macros, XBF_SEGS_TO_SIZE() and XBF_SIZE_TO_SEGS(),
  	to centralize the logic of reserving a segment to deal with
  	non-page-aligned I/Os.
  
  sys/dev/xen/blkfront/blkfront.c:
  	o When negotiating transfer parameters, limit the
  	  max_request_size we use and publish, if it is greater
  	  than the maximum, unaligned, I/O we can support with
  	  the number of segments advertised by the backend.
  	o Don't unilaterally reduce the I/O size published to
  	  the disk layer by a single page.  max_request_size
  	  is already properly limited in the transfer parameter
  	  negotiation code.
  	o Fix typos in printf strings:
  		"max_requests_segments" -> "max_request_segments"
  		"specificed" -> "specified"

Modified:
  stable/9/sys/dev/xen/blkback/blkback.c
  stable/9/sys/dev/xen/blkfront/blkfront.c
  stable/9/sys/dev/xen/blkfront/block.h
  stable/9/sys/xen/interface/io/blkif.h
  stable/9/sys/xen/xenbus/xenbusvar.h
Directory Properties:
  stable/9/sys/   (props changed)

Modified: stable/9/sys/dev/xen/blkback/blkback.c
==============================================================================
--- stable/9/sys/dev/xen/blkback/blkback.c	Fri Feb 17 01:23:58 2012	(r231850)
+++ stable/9/sys/dev/xen/blkback/blkback.c	Fri Feb 17 01:45:49 2012	(r231851)
@@ -40,6 +40,8 @@ __FBSDID("$FreeBSD$");
  *        a FreeBSD domain to other domains.
  */
 
+#include "opt_kdtrace.h"
+
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/kernel.h>
@@ -63,6 +65,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/mount.h>
 #include <sys/sysctl.h>
 #include <sys/bitstring.h>
+#include <sys/sdt.h>
 
 #include <geom/geom.h>
 
@@ -124,7 +127,7 @@ __FBSDID("$FreeBSD$");
 MALLOC_DEFINE(M_XENBLOCKBACK, "xbbd", "Xen Block Back Driver Data");
 
 #ifdef XBB_DEBUG
-#define DPRINTF(fmt, args...) \
+#define DPRINTF(fmt, args...)					\
     printf("xbb(%s:%d): " fmt, __FUNCTION__, __LINE__, ##args)
 #else
 #define DPRINTF(fmt, args...) do {} while(0)
@@ -134,7 +137,7 @@ MALLOC_DEFINE(M_XENBLOCKBACK, "xbbd", "X
  * The maximum mapped region size per request we will allow in a negotiated
  * block-front/back communication channel.
  */
-#define	XBB_MAX_REQUEST_SIZE		\
+#define	XBB_MAX_REQUEST_SIZE					\
 	MIN(MAXPHYS, BLKIF_MAX_SEGMENTS_PER_REQUEST * PAGE_SIZE)
 
 /**
@@ -142,9 +145,9 @@ MALLOC_DEFINE(M_XENBLOCKBACK, "xbbd", "X
  * segment blocks) per request we will allow in a negotiated block-front/back
  * communication channel.
  */
-#define	XBB_MAX_SEGMENTS_PER_REQUEST			\
-	(MIN(UIO_MAXIOV,				\
-	     MIN(BLKIF_MAX_SEGMENTS_PER_REQUEST,	\
+#define	XBB_MAX_SEGMENTS_PER_REQUEST				\
+	(MIN(UIO_MAXIOV,					\
+	     MIN(BLKIF_MAX_SEGMENTS_PER_REQUEST,		\
 		 (XBB_MAX_REQUEST_SIZE / PAGE_SIZE) + 1)))
 
 /**
@@ -980,9 +983,10 @@ xbb_get_gntaddr(struct xbb_xen_reqlist *
 static uint8_t *
 xbb_get_kva(struct xbb_softc *xbb, int nr_pages)
 {
-	intptr_t first_clear, num_clear;
+	intptr_t first_clear;
+	intptr_t num_clear;
 	uint8_t *free_kva;
-	int i;
+	int      i;
 
 	KASSERT(nr_pages != 0, ("xbb_get_kva of zero length"));
 
@@ -1681,19 +1685,19 @@ xbb_dispatch_io(struct xbb_softc *xbb, s
 			req_ring_idx++;
 			switch (xbb->abi) {
 			case BLKIF_PROTOCOL_NATIVE:
-				sg = BLKRING_GET_SG_REQUEST(&xbb->rings.native,
-							    req_ring_idx);
+				sg = BLKRING_GET_SEG_BLOCK(&xbb->rings.native,
+							   req_ring_idx);
 				break;
 			case BLKIF_PROTOCOL_X86_32:
 			{
-				sg = BLKRING_GET_SG_REQUEST(&xbb->rings.x86_32,
-							    req_ring_idx);
+				sg = BLKRING_GET_SEG_BLOCK(&xbb->rings.x86_32,
+							   req_ring_idx);
 				break;
 			}
 			case BLKIF_PROTOCOL_X86_64:
 			{
-				sg = BLKRING_GET_SG_REQUEST(&xbb->rings.x86_64,
-							    req_ring_idx);
+				sg = BLKRING_GET_SEG_BLOCK(&xbb->rings.x86_64,
+							   req_ring_idx);
 				break;
 			}
 			default:
@@ -1817,8 +1821,8 @@ xbb_run_queue(void *context, int pending
 	struct xbb_xen_reqlist *reqlist;
 
 
-	xbb	      = (struct xbb_softc *)context;
-	rings	      = &xbb->rings;
+	xbb   = (struct xbb_softc *)context;
+	rings = &xbb->rings;
 
 	/*
 	 * Work gather and dispatch loop.  Note that we have a bias here
@@ -2032,6 +2036,13 @@ xbb_intr(void *arg)
 	taskqueue_enqueue(xbb->io_taskqueue, &xbb->io_task); 
 }
 
+SDT_PROVIDER_DEFINE(xbb);
+SDT_PROBE_DEFINE1(xbb, kernel, xbb_dispatch_dev, flush, flush, "int");
+SDT_PROBE_DEFINE3(xbb, kernel, xbb_dispatch_dev, read, read, "int", "uint64_t",
+		  "uint64_t");
+SDT_PROBE_DEFINE3(xbb, kernel, xbb_dispatch_dev, write, write, "int",
+		  "uint64_t", "uint64_t");
+
 /*----------------------------- Backend Handlers -----------------------------*/
 /**
  * Backend handler for character device access.
@@ -2087,6 +2098,9 @@ xbb_dispatch_dev(struct xbb_softc *xbb, 
 
 		nreq->pendcnt	 = 1;
 
+		SDT_PROBE1(xbb, kernel, xbb_dispatch_dev, flush,
+			   device_get_unit(xbb->dev));
+
 		(*dev_data->csw->d_strategy)(bio);
 
 		return (0);
@@ -2181,6 +2195,17 @@ xbb_dispatch_dev(struct xbb_softc *xbb, 
 			       bios[bio_idx]->bio_bcount);
 		}
 #endif
+		if (operation == BIO_READ) {
+			SDT_PROBE3(xbb, kernel, xbb_dispatch_dev, read,
+				   device_get_unit(xbb->dev),
+				   bios[bio_idx]->bio_offset,
+				   bios[bio_idx]->bio_length);
+		} else if (operation == BIO_WRITE) {
+			SDT_PROBE3(xbb, kernel, xbb_dispatch_dev, write,
+				   device_get_unit(xbb->dev),
+				   bios[bio_idx]->bio_offset,
+				   bios[bio_idx]->bio_length);
+		}
 		(*dev_data->csw->d_strategy)(bios[bio_idx]);
 	}
 
@@ -2193,6 +2218,12 @@ fail_free_bios:
 	return (error);
 }
 
+SDT_PROBE_DEFINE1(xbb, kernel, xbb_dispatch_file, flush, flush, "int");
+SDT_PROBE_DEFINE3(xbb, kernel, xbb_dispatch_file, read, read, "int", "uint64_t",
+		  "uint64_t");
+SDT_PROBE_DEFINE3(xbb, kernel, xbb_dispatch_file, write, write, "int",
+		  "uint64_t", "uint64_t");
+
 /**
  * Backend handler for file access.
  *
@@ -2237,6 +2268,9 @@ xbb_dispatch_file(struct xbb_softc *xbb,
 	case BIO_FLUSH: {
 		struct mount *mountpoint;
 
+		SDT_PROBE1(xbb, kernel, xbb_dispatch_file, flush,
+			   device_get_unit(xbb->dev));
+
 		vfs_is_locked = VFS_LOCK_GIANT(xbb->vn->v_mount);
 
 		(void) vn_start_write(xbb->vn, &mountpoint, V_WAIT);
@@ -2336,6 +2370,10 @@ xbb_dispatch_file(struct xbb_softc *xbb,
 	switch (operation) {
 	case BIO_READ:
 
+		SDT_PROBE3(xbb, kernel, xbb_dispatch_file, read,
+			   device_get_unit(xbb->dev), xuio.uio_offset,
+			   xuio.uio_resid);
+
 		vn_lock(xbb->vn, LK_EXCLUSIVE | LK_RETRY);
 
 		/*
@@ -2366,6 +2404,10 @@ xbb_dispatch_file(struct xbb_softc *xbb,
 	case BIO_WRITE: {
 		struct mount *mountpoint;
 
+		SDT_PROBE3(xbb, kernel, xbb_dispatch_file, write,
+			   device_get_unit(xbb->dev), xuio.uio_offset,
+			   xuio.uio_resid);
+
 		(void)vn_start_write(xbb->vn, &mountpoint, V_WAIT);
 
 		vn_lock(xbb->vn, LK_EXCLUSIVE | LK_RETRY);
@@ -3028,6 +3070,8 @@ xbb_collect_frontend_info(struct xbb_sof
 	const char *otherend_path;
 	int	    error;
 	u_int	    ring_idx;
+	u_int	    ring_page_order;
+	size_t	    ring_size;
 
 	otherend_path = xenbus_get_otherend_path(xbb->dev);
 
@@ -3035,23 +3079,19 @@ xbb_collect_frontend_info(struct xbb_sof
 	 * Protocol defaults valid even if all negotiation fails.
 	 */
 	xbb->ring_config.ring_pages = 1;
-	xbb->max_requests	    = BLKIF_MAX_RING_REQUESTS(PAGE_SIZE);
 	xbb->max_request_segments   = BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK;
 	xbb->max_request_size	    = xbb->max_request_segments * PAGE_SIZE;
 
 	/*
 	 * Mandatory data (used in all versions of the protocol) first.
 	 */
-	error = xs_gather(XST_NIL, otherend_path,
-			  "ring-ref", "%" PRIu32,
-			  &xbb->ring_config.ring_ref[0],
-			  "event-channel", "%" PRIu32,
-			  &xbb->ring_config.evtchn,
-			  NULL);
+	error = xs_scanf(XST_NIL, otherend_path,
+			 "event-channel", NULL, "%" PRIu32,
+			 &xbb->ring_config.evtchn);
 	if (error != 0) {
 		xenbus_dev_fatal(xbb->dev, error,
-				 "Unable to retrieve ring information from "
-				 "frontend %s.  Unable to connect.",
+				 "Unable to retrieve event-channel information "
+				 "from frontend %s.  Unable to connect.",
 				 xenbus_get_otherend_path(xbb->dev));
 		return (error);
 	}
@@ -3065,10 +3105,20 @@ xbb_collect_frontend_info(struct xbb_sof
 	 *       we must use independant calls in order to guarantee
 	 *       we don't miss information in a sparsly populated front-end
 	 *       tree.
+	 *
+	 * \note xs_scanf() does not update variables for unmatched
+	 *       fields.
 	 */
+	ring_page_order = 0;
 	(void)xs_scanf(XST_NIL, otherend_path,
-		       "ring-pages", NULL, "%u",
+		       "ring-page-order", NULL, "%u",
+		       &ring_page_order);
+	xbb->ring_config.ring_pages = 1 << ring_page_order;
+	(void)xs_scanf(XST_NIL, otherend_path,
+		       "num-ring-pages", NULL, "%u",
 		       &xbb->ring_config.ring_pages);
+	ring_size = PAGE_SIZE * xbb->ring_config.ring_pages;
+	xbb->max_requests = BLKIF_MAX_RING_REQUESTS(ring_size);
 
 	(void)xs_scanf(XST_NIL, otherend_path,
 		       "max-requests", NULL, "%u",
@@ -3084,7 +3134,7 @@ xbb_collect_frontend_info(struct xbb_sof
 
 	if (xbb->ring_config.ring_pages	> XBB_MAX_RING_PAGES) {
 		xenbus_dev_fatal(xbb->dev, EINVAL,
-				 "Front-end specificed ring-pages of %u "
+				 "Front-end specified ring-pages of %u "
 				 "exceeds backend limit of %zu.  "
 				 "Unable to connect.",
 				 xbb->ring_config.ring_pages,
@@ -3092,7 +3142,7 @@ xbb_collect_frontend_info(struct xbb_sof
 		return (EINVAL);
 	} else if (xbb->max_requests > XBB_MAX_REQUESTS) {
 		xenbus_dev_fatal(xbb->dev, EINVAL,
-				 "Front-end specificed max_requests of %u "
+				 "Front-end specified max_requests of %u "
 				 "exceeds backend limit of %u.  "
 				 "Unable to connect.",
 				 xbb->max_requests,
@@ -3100,7 +3150,7 @@ xbb_collect_frontend_info(struct xbb_sof
 		return (EINVAL);
 	} else if (xbb->max_request_segments > XBB_MAX_SEGMENTS_PER_REQUEST) {
 		xenbus_dev_fatal(xbb->dev, EINVAL,
-				 "Front-end specificed max_requests_segments "
+				 "Front-end specified max_requests_segments "
 				 "of %u exceeds backend limit of %u.  "
 				 "Unable to connect.",
 				 xbb->max_request_segments,
@@ -3108,7 +3158,7 @@ xbb_collect_frontend_info(struct xbb_sof
 		return (EINVAL);
 	} else if (xbb->max_request_size > XBB_MAX_REQUEST_SIZE) {
 		xenbus_dev_fatal(xbb->dev, EINVAL,
-				 "Front-end specificed max_request_size "
+				 "Front-end specified max_request_size "
 				 "of %u exceeds backend limit of %u.  "
 				 "Unable to connect.",
 				 xbb->max_request_size,
@@ -3116,22 +3166,39 @@ xbb_collect_frontend_info(struct xbb_sof
 		return (EINVAL);
 	}
 
-	/* If using a multi-page ring, pull in the remaining references. */
-	for (ring_idx = 1; ring_idx < xbb->ring_config.ring_pages; ring_idx++) {
-		char ring_ref_name[]= "ring_refXX";
-
-		snprintf(ring_ref_name, sizeof(ring_ref_name),
-			 "ring-ref%u", ring_idx);
-		error = xs_scanf(XST_NIL, otherend_path,
-				 ring_ref_name, NULL, "%" PRIu32,
-			         &xbb->ring_config.ring_ref[ring_idx]);
+	if (xbb->ring_config.ring_pages	== 1) {
+		error = xs_gather(XST_NIL, otherend_path,
+				  "ring-ref", "%" PRIu32,
+				  &xbb->ring_config.ring_ref[0],
+				  NULL);
 		if (error != 0) {
 			xenbus_dev_fatal(xbb->dev, error,
-					 "Failed to retriev grant reference "
-					 "for page %u of shared ring.  Unable "
-					 "to connect.", ring_idx);
+					 "Unable to retrieve ring information "
+					 "from frontend %s.  Unable to "
+					 "connect.",
+					 xenbus_get_otherend_path(xbb->dev));
 			return (error);
 		}
+	} else {
+		/* Multi-page ring format. */
+		for (ring_idx = 0; ring_idx < xbb->ring_config.ring_pages;
+		     ring_idx++) {
+			char ring_ref_name[]= "ring_refXX";
+
+			snprintf(ring_ref_name, sizeof(ring_ref_name),
+				 "ring-ref%u", ring_idx);
+			error = xs_scanf(XST_NIL, otherend_path,
+					 ring_ref_name, NULL, "%" PRIu32,
+					 &xbb->ring_config.ring_ref[ring_idx]);
+			if (error != 0) {
+				xenbus_dev_fatal(xbb->dev, error,
+						 "Failed to retriev grant "
+						 "reference for page %u of "
+						 "shared ring.  Unable "
+						 "to connect.", ring_idx);
+				return (error);
+			}
+		}
 	}
 
 	error = xs_gather(XST_NIL, otherend_path,
@@ -3197,8 +3264,8 @@ xbb_alloc_requests(struct xbb_softc *xbb
 static int
 xbb_alloc_request_lists(struct xbb_softc *xbb)
 {
-	int i;
 	struct xbb_xen_reqlist *reqlist;
+	int			i;
 
 	/*
 	 * If no requests can be merged, we need 1 request list per
@@ -3318,7 +3385,7 @@ xbb_publish_backend_info(struct xbb_soft
 static void
 xbb_connect(struct xbb_softc *xbb)
 {
-	int		      error;
+	int error;
 
 	if (xenbus_get_state(xbb->dev) == XenbusStateConnected)
 		return;
@@ -3399,7 +3466,8 @@ xbb_connect(struct xbb_softc *xbb)
 static int
 xbb_shutdown(struct xbb_softc *xbb)
 {
-	int error;
+	XenbusState frontState;
+	int	    error;
 
 	DPRINTF("\n");
 
@@ -3413,6 +3481,20 @@ xbb_shutdown(struct xbb_softc *xbb)
 	if ((xbb->flags & XBBF_IN_SHUTDOWN) != 0)
 		return (EAGAIN);
 
+	xbb->flags |= XBBF_IN_SHUTDOWN;
+	mtx_unlock(&xbb->lock);
+
+	if (xenbus_get_state(xbb->dev) < XenbusStateClosing)
+		xenbus_set_state(xbb->dev, XenbusStateClosing);
+
+	frontState = xenbus_get_otherend_state(xbb->dev);
+	mtx_lock(&xbb->lock);
+	xbb->flags &= ~XBBF_IN_SHUTDOWN;
+
+	/* The front can submit I/O until entering the closed state. */
+	if (frontState < XenbusStateClosed)
+		return (EAGAIN);
+
 	DPRINTF("\n");
 
 	/* Indicate shutdown is in progress. */
@@ -3434,19 +3516,6 @@ xbb_shutdown(struct xbb_softc *xbb)
 
 	DPRINTF("\n");
 
-	/*
-	 * Before unlocking mutex, set this flag to prevent other threads from
-	 * getting into this function
-	 */
-	xbb->flags |= XBBF_IN_SHUTDOWN;
-	mtx_unlock(&xbb->lock);
-
-	if (xenbus_get_state(xbb->dev) < XenbusStateClosing)
-		xenbus_set_state(xbb->dev, XenbusStateClosing);
-
-	mtx_lock(&xbb->lock);
-	xbb->flags &= ~XBBF_IN_SHUTDOWN;
-
 	/* Indicate to xbb_detach() that is it safe to proceed. */
 	wakeup(xbb);
 
@@ -3573,6 +3642,16 @@ xbb_setup_sysctl(struct xbb_softc *xbb)
 		        "max_request_segments", CTLFLAG_RD,
 		        &xbb->max_request_segments, 0,
 		        "maximum number of pages per requests (negotiated)");
+
+	SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO,
+		        "max_request_size", CTLFLAG_RD,
+		        &xbb->max_request_size, 0,
+		        "maximum size in bytes of a request (negotiated)");
+
+	SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO,
+		        "ring_pages", CTLFLAG_RD,
+		        &xbb->ring_config.ring_pages, 0,
+		        "communication channel pages (negotiated)");
 }
 
 /**
@@ -3587,6 +3666,7 @@ xbb_attach(device_t dev)
 {
 	struct xbb_softc	*xbb;
 	int			 error;
+	u_int			 max_ring_page_order;
 
 	DPRINTF("Attaching to %s\n", xenbus_get_node(dev));
 
@@ -3621,6 +3701,10 @@ xbb_attach(device_t dev)
 		return (error);
 	}
 
+	/*
+	 * Amazon EC2 client compatility.  They refer to max-ring-pages
+	 * instead of to max-ring-page-order.
+	 */
 	error = xs_printf(XST_NIL, xenbus_get_node(xbb->dev),
 			  "max-ring-pages", "%zu", XBB_MAX_RING_PAGES);
 	if (error) {
@@ -3629,6 +3713,15 @@ xbb_attach(device_t dev)
 		return (error);
 	}
 
+	max_ring_page_order = flsl(XBB_MAX_RING_PAGES) - 1;
+	error = xs_printf(XST_NIL, xenbus_get_node(xbb->dev),
+			  "max-ring-page-order", "%u", max_ring_page_order);
+	if (error) {
+		xbb_attach_failed(xbb, error, "writing %s/max-ring-page-order",
+				  xenbus_get_node(xbb->dev));
+		return (error);
+	}
+
 	error = xs_printf(XST_NIL, xenbus_get_node(xbb->dev),
 			  "max-requests", "%u", XBB_MAX_REQUESTS);
 	if (error) {
@@ -3862,12 +3955,16 @@ xbb_frontend_changed(device_t dev, Xenbu
 		xbb_connect(xbb);
 		break;
 	case XenbusStateClosing:
+		/*
+		 * Frontend has acknowledged Closing request.
+		 * Wait for Closed state.
+		 */
+		break;
 	case XenbusStateClosed:
 		mtx_lock(&xbb->lock);
 		xbb_shutdown(xbb);
 		mtx_unlock(&xbb->lock);
-		if (frontend_state == XenbusStateClosed)
-			xenbus_set_state(xbb->dev, XenbusStateClosed);
+		xenbus_set_state(xbb->dev, XenbusStateClosed);
 		break;
 	default:
 		xenbus_dev_fatal(xbb->dev, EINVAL, "saw state %d at frontend",

Modified: stable/9/sys/dev/xen/blkfront/blkfront.c
==============================================================================
--- stable/9/sys/dev/xen/blkfront/blkfront.c	Fri Feb 17 01:23:58 2012	(r231850)
+++ stable/9/sys/dev/xen/blkfront/blkfront.c	Fri Feb 17 01:45:49 2012	(r231851)
@@ -41,6 +41,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/bus.h>
 #include <sys/conf.h>
 #include <sys/module.h>
+#include <sys/sysctl.h>
 
 #include <machine/bus.h>
 #include <sys/rman.h>
@@ -139,7 +140,7 @@ static	int	xb_dump(void *, void *, vm_of
  * with blkfront as the emulated drives, easing transition slightly.
  */
 static void
-blkfront_vdevice_to_unit(int vdevice, int *unit, const char **name)
+blkfront_vdevice_to_unit(uint32_t vdevice, int *unit, const char **name)
 {
 	static struct vdev_info {
 		int major;
@@ -186,6 +187,7 @@ blkfront_vdevice_to_unit(int vdevice, in
 	if (vdevice & (1 << 28)) {
 		*unit = (vdevice & ((1 << 28) - 1)) >> 8;
 		*name = "xbd";
+		return;
 	}
 
 	for (i = 0; info[i].major; i++) {
@@ -407,6 +409,40 @@ blkfront_probe(device_t dev)
 	return (ENXIO);
 }
 
+static void
+xb_setup_sysctl(struct xb_softc *xb)
+{
+	struct sysctl_ctx_list *sysctl_ctx = NULL;
+	struct sysctl_oid      *sysctl_tree = NULL;
+	
+	sysctl_ctx = device_get_sysctl_ctx(xb->xb_dev);
+	if (sysctl_ctx == NULL)
+		return;
+
+	sysctl_tree = device_get_sysctl_tree(xb->xb_dev);
+	if (sysctl_tree == NULL)
+		return;
+
+	SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO,
+		        "max_requests", CTLFLAG_RD, &xb->max_requests, -1,
+		        "maximum outstanding requests (negotiated)");
+
+	SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO,
+		        "max_request_segments", CTLFLAG_RD,
+		        &xb->max_request_segments, 0,
+		        "maximum number of pages per requests (negotiated)");
+
+	SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO,
+		        "max_request_size", CTLFLAG_RD,
+		        &xb->max_request_size, 0,
+		        "maximum size in bytes of a request (negotiated)");
+
+	SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO,
+		        "ring_pages", CTLFLAG_RD,
+		        &xb->ring_pages, 0,
+		        "communication channel pages (negotiated)");
+}
+
 /*
  * Setup supplies the backend dir, virtual device.  We place an event
  * channel and shared frame entries.  We watch backend to wait if it's
@@ -417,14 +453,14 @@ blkfront_attach(device_t dev)
 {
 	struct xb_softc *sc;
 	const char *name;
+	uint32_t vdevice;
 	int error;
-	int vdevice;
 	int i;
 	int unit;
 
 	/* FIXME: Use dynamic device id if this is not set. */
 	error = xs_scanf(XST_NIL, xenbus_get_node(dev),
-	    "virtual-device", NULL, "%i", &vdevice);
+	    "virtual-device", NULL, "%" PRIu32, &vdevice);
 	if (error) {
 		xenbus_dev_fatal(dev, error, "reading virtual-device");
 		device_printf(dev, "Couldn't determine virtual device.\n");
@@ -449,6 +485,8 @@ blkfront_attach(device_t dev)
 	sc->vdevice = vdevice;
 	sc->connected = BLKIF_STATE_DISCONNECTED;
 
+	xb_setup_sysctl(sc);
+
 	/* Wait for backend device to publish its protocol capabilities. */
 	xenbus_set_state(dev, XenbusStateInitialising);
 
@@ -501,6 +539,7 @@ blkfront_initialize(struct xb_softc *sc)
 {
 	const char *otherend_path;
 	const char *node_path;
+	uint32_t max_ring_page_order;
 	int error;
 	int i;
 
@@ -513,10 +552,10 @@ blkfront_initialize(struct xb_softc *sc)
 	 * Protocol defaults valid even if negotiation for a
 	 * setting fails.
 	 */
+	max_ring_page_order = 0;
 	sc->ring_pages = 1;
-	sc->max_requests = BLKIF_MAX_RING_REQUESTS(PAGE_SIZE);
 	sc->max_request_segments = BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK;
-	sc->max_request_size = (sc->max_request_segments - 1) * PAGE_SIZE;
+	sc->max_request_size = XBF_SEGS_TO_SIZE(sc->max_request_segments);
 	sc->max_request_blocks = BLKIF_SEGS_TO_BLOCKS(sc->max_request_segments);
 
 	/*
@@ -526,13 +565,25 @@ blkfront_initialize(struct xb_softc *sc)
 	 *       we must use independant calls in order to guarantee
 	 *       we don't miss information in a sparsly populated back-end
 	 *       tree.
+	 *
+	 * \note xs_scanf() does not update variables for unmatched
+	 *	 fields.
 	 */
 	otherend_path = xenbus_get_otherend_path(sc->xb_dev);
 	node_path = xenbus_get_node(sc->xb_dev);
+
+	/* Support both backend schemes for relaying ring page limits. */
+	(void)xs_scanf(XST_NIL, otherend_path,
+		       "max-ring-page-order", NULL, "%" PRIu32,
+		       &max_ring_page_order);
+	sc->ring_pages = 1 << max_ring_page_order;
 	(void)xs_scanf(XST_NIL, otherend_path,
 		       "max-ring-pages", NULL, "%" PRIu32,
 		       &sc->ring_pages);
+	if (sc->ring_pages < 1)
+		sc->ring_pages = 1;
 
+	sc->max_requests = BLKIF_MAX_RING_REQUESTS(sc->ring_pages * PAGE_SIZE);
 	(void)xs_scanf(XST_NIL, otherend_path,
 		       "max-requests", NULL, "%" PRIu32,
 		       &sc->max_requests);
@@ -552,6 +603,16 @@ blkfront_initialize(struct xb_softc *sc)
 		sc->ring_pages = XBF_MAX_RING_PAGES;
 	}
 
+	if (powerof2(sc->ring_pages) == 0) {
+		uint32_t new_page_limit;
+
+		new_page_limit = 0x01 << (fls(sc->ring_pages) - 1);
+		device_printf(sc->xb_dev, "Back-end specified ring-pages of "
+			      "%u is not a power of 2. Limited to %u.\n",
+			      sc->ring_pages, new_page_limit);
+		sc->ring_pages = new_page_limit;
+	}
+
 	if (sc->max_requests > XBF_MAX_REQUESTS) {
 		device_printf(sc->xb_dev, "Back-end specified max_requests of "
 			      "%u limited to front-end limit of %u.\n",
@@ -560,8 +621,8 @@ blkfront_initialize(struct xb_softc *sc)
 	}
 
 	if (sc->max_request_segments > XBF_MAX_SEGMENTS_PER_REQUEST) {
-		device_printf(sc->xb_dev, "Back-end specificed "
-			      "max_requests_segments of %u limited to "
+		device_printf(sc->xb_dev, "Back-end specified "
+			      "max_request_segments of %u limited to "
 			      "front-end limit of %u.\n",
 			      sc->max_request_segments,
 			      XBF_MAX_SEGMENTS_PER_REQUEST);
@@ -569,12 +630,23 @@ blkfront_initialize(struct xb_softc *sc)
 	}
 
 	if (sc->max_request_size > XBF_MAX_REQUEST_SIZE) {
-		device_printf(sc->xb_dev, "Back-end specificed "
+		device_printf(sc->xb_dev, "Back-end specified "
 			      "max_request_size of %u limited to front-end "
 			      "limit of %u.\n", sc->max_request_size,
 			      XBF_MAX_REQUEST_SIZE);
 		sc->max_request_size = XBF_MAX_REQUEST_SIZE;
 	}
+ 
+ 	if (sc->max_request_size > XBF_SEGS_TO_SIZE(sc->max_request_segments)) {
+ 		device_printf(sc->xb_dev, "Back-end specified "
+ 			      "max_request_size of %u limited to front-end "
+ 			      "limit of %u.  (Too few segments.)\n",
+ 			      sc->max_request_size,
+ 			      XBF_SEGS_TO_SIZE(sc->max_request_segments));
+ 		sc->max_request_size =
+ 		    XBF_SEGS_TO_SIZE(sc->max_request_segments);
+ 	}
+
 	sc->max_request_blocks = BLKIF_SEGS_TO_BLOCKS(sc->max_request_segments);
 
 	/* Allocate datastructures based on negotiated values. */
@@ -625,11 +697,20 @@ blkfront_initialize(struct xb_softc *sc)
 	if (setup_blkring(sc) != 0)
 		return;
 
+	/* Support both backend schemes for relaying ring page limits. */
 	error = xs_printf(XST_NIL, node_path,
-			 "ring-pages","%u", sc->ring_pages);
+			 "num-ring-pages","%u", sc->ring_pages);
 	if (error) {
 		xenbus_dev_fatal(sc->xb_dev, error,
-				 "writing %s/ring-pages",
+				 "writing %s/num-ring-pages",
+				 node_path);
+		return;
+	}
+	error = xs_printf(XST_NIL, node_path,
+			 "ring-page-order","%u", fls(sc->ring_pages) - 1);
+	if (error) {
+		xenbus_dev_fatal(sc->xb_dev, error,
+				 "writing %s/ring-page-order",
 				 node_path);
 		return;
 	}
@@ -711,25 +792,31 @@ setup_blkring(struct xb_softc *sc)
 			return (error);
 		}
 	}
-	error = xs_printf(XST_NIL, xenbus_get_node(sc->xb_dev),
-			  "ring-ref","%u", sc->ring_ref[0]);
-	if (error) {
-		xenbus_dev_fatal(sc->xb_dev, error, "writing %s/ring-ref",
-				 xenbus_get_node(sc->xb_dev));
-		return (error);
-	}
-	for (i = 1; i < sc->ring_pages; i++) {
-		char ring_ref_name[]= "ring_refXX";
-
-		snprintf(ring_ref_name, sizeof(ring_ref_name), "ring-ref%u", i);
+	if (sc->ring_pages == 1) {
 		error = xs_printf(XST_NIL, xenbus_get_node(sc->xb_dev),
-				 ring_ref_name, "%u", sc->ring_ref[i]);
+				  "ring-ref", "%u", sc->ring_ref[0]);
 		if (error) {
-			xenbus_dev_fatal(sc->xb_dev, error, "writing %s/%s",
-					 xenbus_get_node(sc->xb_dev),
-					 ring_ref_name);
+			xenbus_dev_fatal(sc->xb_dev, error,
+					 "writing %s/ring-ref",
+					 xenbus_get_node(sc->xb_dev));
 			return (error);
 		}
+	} else {
+		for (i = 0; i < sc->ring_pages; i++) {
+			char ring_ref_name[]= "ring_refXX";
+
+			snprintf(ring_ref_name, sizeof(ring_ref_name),
+				 "ring-ref%u", i);
+			error = xs_printf(XST_NIL, xenbus_get_node(sc->xb_dev),
+					 ring_ref_name, "%u", sc->ring_ref[i]);
+			if (error) {
+				xenbus_dev_fatal(sc->xb_dev, error,
+						 "writing %s/%s",
+						 xenbus_get_node(sc->xb_dev),
+						 ring_ref_name);
+				return (error);
+			}
+		}
 	}
 
 	error = bind_listening_port_to_irqhandler(
@@ -795,7 +882,7 @@ blkfront_connect(struct xb_softc *sc)
 	unsigned int binfo;
 	int err, feature_barrier;
 
-        if( (sc->connected == BLKIF_STATE_CONNECTED) || 
+	if( (sc->connected == BLKIF_STATE_CONNECTED) || 
 	    (sc->connected == BLKIF_STATE_SUSPENDED) )
 		return;
 
@@ -923,15 +1010,13 @@ blkif_close(struct disk *dp)
 		return (ENXIO);
 	sc->xb_flags &= ~XB_OPEN;
 	if (--(sc->users) == 0) {
-		/* Check whether we have been instructed to close.  We will
-		   have ignored this request initially, as the device was
-		   still mounted. */
-		device_t dev = sc->xb_dev;
-		XenbusState state =
-			xenbus_read_driver_state(xenbus_get_otherend_path(dev));
-
-		if (state == XenbusStateClosing)
-			blkfront_closing(dev);
+		/*
+		 * Check whether we have been instructed to close.  We will
+		 * have ignored this request initially, as the device was
+		 * still mounted.
+		 */
+		if (xenbus_get_otherend_state(sc->xb_dev) == XenbusStateClosing)
+			blkfront_closing(sc->xb_dev);
 	}
 	return (0);
 }
@@ -1033,7 +1118,7 @@ blkif_queue_cb(void *arg, bus_dma_segmen
 	struct xb_command *cm;
 	blkif_request_t	*ring_req;
 	struct blkif_request_segment *sg;
-        struct blkif_request_segment *last_block_sg;
+	struct blkif_request_segment *last_block_sg;
 	grant_ref_t *sg_ref;
 	vm_paddr_t buffer_ma;
 	uint64_t fsect, lsect;
@@ -1104,12 +1189,12 @@ blkif_queue_cb(void *arg, bus_dma_segmen
 			nsegs--;
 		}
 		block_segs = MIN(nsegs, BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK);
-                if (block_segs == 0)
-                        break;
+		if (block_segs == 0)
+			break;
 
-                sg = BLKRING_GET_SG_REQUEST(&sc->ring, sc->ring.req_prod_pvt);
+		sg = BLKRING_GET_SEG_BLOCK(&sc->ring, sc->ring.req_prod_pvt);
 		sc->ring.req_prod_pvt++;
-                last_block_sg = sg + block_segs;
+		last_block_sg = sg + block_segs;
 	}
 
 	if (cm->operation == BLKIF_OP_READ)

Modified: stable/9/sys/dev/xen/blkfront/block.h
==============================================================================
--- stable/9/sys/dev/xen/blkfront/block.h	Fri Feb 17 01:23:58 2012	(r231850)
+++ stable/9/sys/dev/xen/blkfront/block.h	Fri Feb 17 01:45:49 2012	(r231851)
@@ -35,6 +35,32 @@
 #include <xen/blkif.h>
 
 /**
+ * Given a number of blkif segments, compute the maximum I/O size supported.
+ *
+ * \note This calculation assumes that all but the first and last segments 
+ *       of the I/O are fully utilized.
+ *
+ * \note We reserve a segement from the maximum supported by the transport to
+ *       guarantee we can handle an unaligned transfer without the need to
+ *       use a bounce buffer.
+ */
+#define	XBF_SEGS_TO_SIZE(segs)						\
+	(((segs) - 1) * PAGE_SIZE)
+
+/**
+ * Compute the maximum number of blkif segments requried to represent
+ * an I/O of the given size.
+ *
+ * \note This calculation assumes that all but the first and last segments
+ *       of the I/O are fully utilized.
+ *
+ * \note We reserve a segement to guarantee we can handle an unaligned
+ *       transfer without the need to use a bounce buffer.
+ */
+#define	XBF_SIZE_TO_SEGS(size)						\
+	((size / PAGE_SIZE) + 1)
+
+/**
  * The maximum number of outstanding requests blocks (request headers plus
  * additional segment blocks) we will allow in a negotiated block-front/back
  * communication channel.
@@ -44,22 +70,18 @@
 /**
  * The maximum mapped region size per request we will allow in a negotiated
  * block-front/back communication channel.
- *
- * \note We reserve a segement from the maximum supported by the transport to
- *       guarantee we can handle an unaligned transfer without the need to
- *       use a bounce buffer..
  */
-#define	XBF_MAX_REQUEST_SIZE		\
-	MIN(MAXPHYS, (BLKIF_MAX_SEGMENTS_PER_REQUEST - 1) * PAGE_SIZE)
+#define	XBF_MAX_REQUEST_SIZE						\
+	MIN(MAXPHYS, XBF_SEGS_TO_SIZE(BLKIF_MAX_SEGMENTS_PER_REQUEST))
 
 /**
  * The maximum number of segments (within a request header and accompanying
  * segment blocks) per request we will allow in a negotiated block-front/back
  * communication channel.
  */
-#define	XBF_MAX_SEGMENTS_PER_REQUEST		\
-	(MIN(BLKIF_MAX_SEGMENTS_PER_REQUEST,	\
-	     (XBF_MAX_REQUEST_SIZE / PAGE_SIZE) + 1))
+#define	XBF_MAX_SEGMENTS_PER_REQUEST					\
+	(MIN(BLKIF_MAX_SEGMENTS_PER_REQUEST,				\
+	     XBF_SIZE_TO_SEGS(XBF_MAX_REQUEST_SIZE)))
 
 /**
  * The maximum number of shared memory ring pages we will allow in a

Modified: stable/9/sys/xen/interface/io/blkif.h
==============================================================================
--- stable/9/sys/xen/interface/io/blkif.h	Fri Feb 17 01:23:58 2012	(r231850)
+++ stable/9/sys/xen/interface/io/blkif.h	Fri Feb 17 01:45:49 2012	(r231851)
@@ -1,8 +1,8 @@
 /******************************************************************************
  * blkif.h
- * 
+ *
  * Unified block-device I/O interface for Xen guest OSes.
- * 
+ *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to
  * deal in the Software without restriction, including without limitation the
@@ -22,6 +22,7 @@
  * DEALINGS IN THE SOFTWARE.
  *
  * Copyright (c) 2003-2004, Keir Fraser
+ * Copyright (c) 2012, Spectra Logic Corporation
  */
 
 #ifndef __XEN_PUBLIC_IO_BLKIF_H__
@@ -35,7 +36,7 @@
  * notification can be made conditional on req_event (i.e., the generic
  * hold-off mechanism provided by the ring macros). Backends must set
  * req_event appropriately (e.g., using RING_FINAL_CHECK_FOR_REQUESTS()).
- * 
+ *
  * Back->front notifications: When enqueuing a new response, sending a
  * notification can be made conditional on rsp_event (i.e., the generic
  * hold-off mechanism provided by the ring macros). Frontends must set
@@ -48,37 +49,413 @@
 #define blkif_sector_t uint64_t
 
 /*
+ * Feature and Parameter Negotiation
+ * =================================
+ * The two halves of a Xen block driver utilize nodes within the XenStore to
+ * communicate capabilities and to negotiate operating parameters.  This
+ * section enumerates these nodes which reside in the respective front and
+ * backend portions of the XenStore, following the XenBus convention.
+ *
+ * All data in the XenStore is stored as strings.  Nodes specifying numeric
+ * values are encoded in decimal.  Integer value ranges listed below are
+ * expressed as fixed sized integer types capable of storing the conversion
+ * of a properly formated node string, without loss of information.
+ *
+ * Any specified default value is in effect if the corresponding XenBus node
+ * is not present in the XenStore.
+ *
+ * XenStore nodes in sections marked "PRIVATE" are solely for use by the
+ * driver side whose XenBus tree contains them.
+ *
+ * See the XenBus state transition diagram below for details on when XenBus
+ * nodes must be published and when they can be queried.
+ *
+ *****************************************************************************
+ *                            Backend XenBus Nodes
+ *****************************************************************************
+ *
+ *------------------ Backend Device Identification (PRIVATE) ------------------
+ *
+ * mode
+ *      Values:         "r" (read only), "w" (writable)
+ *
+ *      The read or write access permissions to the backing store to be
+ *      granted to the frontend.
+ *
+ * params
+ *      Values:         string
+ *
+ *      A free formatted string providing sufficient information for the
+ *      backend driver to open the backing device.  (e.g. the path to the
+ *      file or block device representing the backing store.)
+ *
+ * type
+ *      Values:         "file", "phy", "tap"
+ *
+ *      The type of the backing device/object.
+ *
+ *--------------------------------- Features ---------------------------------
+ *
+ * feature-barrier
+ *      Values:         0/1 (boolean)
+ *      Default Value:  0
+ *
+ *      A value of "1" indicates that the backend can process requests
+ *      containing the BLKIF_OP_WRITE_BARRIER request opcode.  Requests
+ *      of this type may still be returned at any time with the
+ *      BLKIF_RSP_EOPNOTSUPP result code.
+ *
+ * feature-flush-cache
+ *      Values:         0/1 (boolean)
+ *      Default Value:  0
+ *
+ *      A value of "1" indicates that the backend can process requests
+ *      containing the BLKIF_OP_FLUSH_DISKCACHE request opcode.  Requests
+ *      of this type may still be returned at any time with the
+ *      BLKIF_RSP_EOPNOTSUPP result code.
+ *
+ * feature-discard
+ *      Values:         0/1 (boolean)
+ *      Default Value:  0
+ *
+ *      A value of "1" indicates that the backend can process requests
+ *      containing the BLKIF_OP_DISCARD request opcode.  Requests
+ *      of this type may still be returned at any time with the
+ *      BLKIF_RSP_EOPNOTSUPP result code.
+ *
+ *----------------------- Request Transport Parameters ------------------------
+ *
+ * max-ring-page-order
+ *      Values:         <uint32_t>
+ *      Default Value:  0
+ *      Notes:          1, 3
+ *
+ *      The maximum supported size of the request ring buffer in units of
+ *      lb(machine pages). (e.g. 0 == 1 page,  1 = 2 pages, 2 == 4 pages,
+ *      etc.).
+ *
+ * max-ring-pages
+ *      Values:         <uint32_t>
+ *      Default Value:  1
+ *      Notes:          2, 3
+ *
+ *      The maximum supported size of the request ring buffer in units of
+ *      machine pages.  The value must be a power of 2.
+ *
+ * max-requests         <uint32_t>
+ *      Default Value:  BLKIF_MAX_RING_REQUESTS(PAGE_SIZE)
+ *      Maximum Value:  BLKIF_MAX_RING_REQUESTS(PAGE_SIZE * max-ring-pages)
+ *
+ *      The maximum number of concurrent, logical requests that will be
+ *      issued by the backend.
+ *
+ *      Note: A logical request may span multiple ring entries.
+ *
+ * max-request-segments
+ *      Values:         <uint8_t>
+ *      Default Value:  BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201202170145.q1H1jnQ3044170>