Date: Fri, 17 Feb 2012 01:45:49 +0000 (UTC) From: "Justin T. Gibbs" <gibbs@FreeBSD.org> To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-9@freebsd.org Subject: svn commit: r231851 - in stable/9/sys: dev/xen/blkback dev/xen/blkfront xen/interface/io xen/xenbus Message-ID: <201202170145.q1H1jnQ3044170@svn.freebsd.org>
next in thread | raw e-mail | index | archive | help
Author: gibbs Date: Fri Feb 17 01:45:49 2012 New Revision: 231851 URL: http://svn.freebsd.org/changeset/base/231851 Log: MFC r231743,r231837,r231839: Xen PV block interface enhancements r231743 ======= Enhance documentation, improve interoperability, and fix defects in FreeBSD's front and back Xen blkif interface drivers. sys/dev/xen/blkfront/block.h: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkback/blkback.c: Replace FreeBSD specific multi-page ring impelementation with support for both the Citrix and Amazon/RedHat versions of this extension. sys/dev/xen/blkfront/blkfront.c: o Add a per-instance sysctl tree that exposes all negotiated transport parameters (ring pages, max number of requests, max request size, max number of segments). o In blkfront_vdevice_to_unit() add a missing return statement so that we properly identify the unit number for high numbered xvd devices. sys/dev/xen/blkback/blkback.c: o Add static dtrace probes for several events in this driver. o Defer connection shutdown processing until the front-end enters the closed state. This avoids prematurely tearing down the connection when buggy front-ends transition to the closing state, even though the device is open and they veto the close request from the tool stack. o Add nodes for maximum request size and the number of active ring pages to the exising, per-instance, sysctl tree. o Miscelaneous style cleanup. sys/xen/interface/io/blkif.h: o Add extensive documentation of the XenStore nodes used to implement the blkif interface. o Document the startup sequence between a front and back driver. o Add structures and documenatation for the "discard" feature (AKA Trim). o Cleanup some definitions related to FreeBSD's request number/size/segment-limit extension. sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkback/blkback.c: sys/xen/xenbus/xenbusvar.h: Add the convenience function xenbus_get_otherend_state() and use it to simplify some logic in both block-front and block-back. r231837 ======= sys/dev/xen/blkback/blkback.c: Fix typo in a printf string: "specificed" -> "specified". r231839 ======= Fix a bug in the calculation of the maximum I/O request size. The previous code did not limit the I/O request size based on the maximum number of segments supported by the back-end. In current practice, since the only back-end supporting chained requests is the FreeBSD implementation, this limit was never exceeded. sys/dev/xen/blkfront/block.h: Add two macros, XBF_SEGS_TO_SIZE() and XBF_SIZE_TO_SEGS(), to centralize the logic of reserving a segment to deal with non-page-aligned I/Os. sys/dev/xen/blkfront/blkfront.c: o When negotiating transfer parameters, limit the max_request_size we use and publish, if it is greater than the maximum, unaligned, I/O we can support with the number of segments advertised by the backend. o Don't unilaterally reduce the I/O size published to the disk layer by a single page. max_request_size is already properly limited in the transfer parameter negotiation code. o Fix typos in printf strings: "max_requests_segments" -> "max_request_segments" "specificed" -> "specified" Modified: stable/9/sys/dev/xen/blkback/blkback.c stable/9/sys/dev/xen/blkfront/blkfront.c stable/9/sys/dev/xen/blkfront/block.h stable/9/sys/xen/interface/io/blkif.h stable/9/sys/xen/xenbus/xenbusvar.h Directory Properties: stable/9/sys/ (props changed) Modified: stable/9/sys/dev/xen/blkback/blkback.c ============================================================================== --- stable/9/sys/dev/xen/blkback/blkback.c Fri Feb 17 01:23:58 2012 (r231850) +++ stable/9/sys/dev/xen/blkback/blkback.c Fri Feb 17 01:45:49 2012 (r231851) @@ -40,6 +40,8 @@ __FBSDID("$FreeBSD$"); * a FreeBSD domain to other domains. */ +#include "opt_kdtrace.h" + #include <sys/param.h> #include <sys/systm.h> #include <sys/kernel.h> @@ -63,6 +65,7 @@ __FBSDID("$FreeBSD$"); #include <sys/mount.h> #include <sys/sysctl.h> #include <sys/bitstring.h> +#include <sys/sdt.h> #include <geom/geom.h> @@ -124,7 +127,7 @@ __FBSDID("$FreeBSD$"); MALLOC_DEFINE(M_XENBLOCKBACK, "xbbd", "Xen Block Back Driver Data"); #ifdef XBB_DEBUG -#define DPRINTF(fmt, args...) \ +#define DPRINTF(fmt, args...) \ printf("xbb(%s:%d): " fmt, __FUNCTION__, __LINE__, ##args) #else #define DPRINTF(fmt, args...) do {} while(0) @@ -134,7 +137,7 @@ MALLOC_DEFINE(M_XENBLOCKBACK, "xbbd", "X * The maximum mapped region size per request we will allow in a negotiated * block-front/back communication channel. */ -#define XBB_MAX_REQUEST_SIZE \ +#define XBB_MAX_REQUEST_SIZE \ MIN(MAXPHYS, BLKIF_MAX_SEGMENTS_PER_REQUEST * PAGE_SIZE) /** @@ -142,9 +145,9 @@ MALLOC_DEFINE(M_XENBLOCKBACK, "xbbd", "X * segment blocks) per request we will allow in a negotiated block-front/back * communication channel. */ -#define XBB_MAX_SEGMENTS_PER_REQUEST \ - (MIN(UIO_MAXIOV, \ - MIN(BLKIF_MAX_SEGMENTS_PER_REQUEST, \ +#define XBB_MAX_SEGMENTS_PER_REQUEST \ + (MIN(UIO_MAXIOV, \ + MIN(BLKIF_MAX_SEGMENTS_PER_REQUEST, \ (XBB_MAX_REQUEST_SIZE / PAGE_SIZE) + 1))) /** @@ -980,9 +983,10 @@ xbb_get_gntaddr(struct xbb_xen_reqlist * static uint8_t * xbb_get_kva(struct xbb_softc *xbb, int nr_pages) { - intptr_t first_clear, num_clear; + intptr_t first_clear; + intptr_t num_clear; uint8_t *free_kva; - int i; + int i; KASSERT(nr_pages != 0, ("xbb_get_kva of zero length")); @@ -1681,19 +1685,19 @@ xbb_dispatch_io(struct xbb_softc *xbb, s req_ring_idx++; switch (xbb->abi) { case BLKIF_PROTOCOL_NATIVE: - sg = BLKRING_GET_SG_REQUEST(&xbb->rings.native, - req_ring_idx); + sg = BLKRING_GET_SEG_BLOCK(&xbb->rings.native, + req_ring_idx); break; case BLKIF_PROTOCOL_X86_32: { - sg = BLKRING_GET_SG_REQUEST(&xbb->rings.x86_32, - req_ring_idx); + sg = BLKRING_GET_SEG_BLOCK(&xbb->rings.x86_32, + req_ring_idx); break; } case BLKIF_PROTOCOL_X86_64: { - sg = BLKRING_GET_SG_REQUEST(&xbb->rings.x86_64, - req_ring_idx); + sg = BLKRING_GET_SEG_BLOCK(&xbb->rings.x86_64, + req_ring_idx); break; } default: @@ -1817,8 +1821,8 @@ xbb_run_queue(void *context, int pending struct xbb_xen_reqlist *reqlist; - xbb = (struct xbb_softc *)context; - rings = &xbb->rings; + xbb = (struct xbb_softc *)context; + rings = &xbb->rings; /* * Work gather and dispatch loop. Note that we have a bias here @@ -2032,6 +2036,13 @@ xbb_intr(void *arg) taskqueue_enqueue(xbb->io_taskqueue, &xbb->io_task); } +SDT_PROVIDER_DEFINE(xbb); +SDT_PROBE_DEFINE1(xbb, kernel, xbb_dispatch_dev, flush, flush, "int"); +SDT_PROBE_DEFINE3(xbb, kernel, xbb_dispatch_dev, read, read, "int", "uint64_t", + "uint64_t"); +SDT_PROBE_DEFINE3(xbb, kernel, xbb_dispatch_dev, write, write, "int", + "uint64_t", "uint64_t"); + /*----------------------------- Backend Handlers -----------------------------*/ /** * Backend handler for character device access. @@ -2087,6 +2098,9 @@ xbb_dispatch_dev(struct xbb_softc *xbb, nreq->pendcnt = 1; + SDT_PROBE1(xbb, kernel, xbb_dispatch_dev, flush, + device_get_unit(xbb->dev)); + (*dev_data->csw->d_strategy)(bio); return (0); @@ -2181,6 +2195,17 @@ xbb_dispatch_dev(struct xbb_softc *xbb, bios[bio_idx]->bio_bcount); } #endif + if (operation == BIO_READ) { + SDT_PROBE3(xbb, kernel, xbb_dispatch_dev, read, + device_get_unit(xbb->dev), + bios[bio_idx]->bio_offset, + bios[bio_idx]->bio_length); + } else if (operation == BIO_WRITE) { + SDT_PROBE3(xbb, kernel, xbb_dispatch_dev, write, + device_get_unit(xbb->dev), + bios[bio_idx]->bio_offset, + bios[bio_idx]->bio_length); + } (*dev_data->csw->d_strategy)(bios[bio_idx]); } @@ -2193,6 +2218,12 @@ fail_free_bios: return (error); } +SDT_PROBE_DEFINE1(xbb, kernel, xbb_dispatch_file, flush, flush, "int"); +SDT_PROBE_DEFINE3(xbb, kernel, xbb_dispatch_file, read, read, "int", "uint64_t", + "uint64_t"); +SDT_PROBE_DEFINE3(xbb, kernel, xbb_dispatch_file, write, write, "int", + "uint64_t", "uint64_t"); + /** * Backend handler for file access. * @@ -2237,6 +2268,9 @@ xbb_dispatch_file(struct xbb_softc *xbb, case BIO_FLUSH: { struct mount *mountpoint; + SDT_PROBE1(xbb, kernel, xbb_dispatch_file, flush, + device_get_unit(xbb->dev)); + vfs_is_locked = VFS_LOCK_GIANT(xbb->vn->v_mount); (void) vn_start_write(xbb->vn, &mountpoint, V_WAIT); @@ -2336,6 +2370,10 @@ xbb_dispatch_file(struct xbb_softc *xbb, switch (operation) { case BIO_READ: + SDT_PROBE3(xbb, kernel, xbb_dispatch_file, read, + device_get_unit(xbb->dev), xuio.uio_offset, + xuio.uio_resid); + vn_lock(xbb->vn, LK_EXCLUSIVE | LK_RETRY); /* @@ -2366,6 +2404,10 @@ xbb_dispatch_file(struct xbb_softc *xbb, case BIO_WRITE: { struct mount *mountpoint; + SDT_PROBE3(xbb, kernel, xbb_dispatch_file, write, + device_get_unit(xbb->dev), xuio.uio_offset, + xuio.uio_resid); + (void)vn_start_write(xbb->vn, &mountpoint, V_WAIT); vn_lock(xbb->vn, LK_EXCLUSIVE | LK_RETRY); @@ -3028,6 +3070,8 @@ xbb_collect_frontend_info(struct xbb_sof const char *otherend_path; int error; u_int ring_idx; + u_int ring_page_order; + size_t ring_size; otherend_path = xenbus_get_otherend_path(xbb->dev); @@ -3035,23 +3079,19 @@ xbb_collect_frontend_info(struct xbb_sof * Protocol defaults valid even if all negotiation fails. */ xbb->ring_config.ring_pages = 1; - xbb->max_requests = BLKIF_MAX_RING_REQUESTS(PAGE_SIZE); xbb->max_request_segments = BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK; xbb->max_request_size = xbb->max_request_segments * PAGE_SIZE; /* * Mandatory data (used in all versions of the protocol) first. */ - error = xs_gather(XST_NIL, otherend_path, - "ring-ref", "%" PRIu32, - &xbb->ring_config.ring_ref[0], - "event-channel", "%" PRIu32, - &xbb->ring_config.evtchn, - NULL); + error = xs_scanf(XST_NIL, otherend_path, + "event-channel", NULL, "%" PRIu32, + &xbb->ring_config.evtchn); if (error != 0) { xenbus_dev_fatal(xbb->dev, error, - "Unable to retrieve ring information from " - "frontend %s. Unable to connect.", + "Unable to retrieve event-channel information " + "from frontend %s. Unable to connect.", xenbus_get_otherend_path(xbb->dev)); return (error); } @@ -3065,10 +3105,20 @@ xbb_collect_frontend_info(struct xbb_sof * we must use independant calls in order to guarantee * we don't miss information in a sparsly populated front-end * tree. + * + * \note xs_scanf() does not update variables for unmatched + * fields. */ + ring_page_order = 0; (void)xs_scanf(XST_NIL, otherend_path, - "ring-pages", NULL, "%u", + "ring-page-order", NULL, "%u", + &ring_page_order); + xbb->ring_config.ring_pages = 1 << ring_page_order; + (void)xs_scanf(XST_NIL, otherend_path, + "num-ring-pages", NULL, "%u", &xbb->ring_config.ring_pages); + ring_size = PAGE_SIZE * xbb->ring_config.ring_pages; + xbb->max_requests = BLKIF_MAX_RING_REQUESTS(ring_size); (void)xs_scanf(XST_NIL, otherend_path, "max-requests", NULL, "%u", @@ -3084,7 +3134,7 @@ xbb_collect_frontend_info(struct xbb_sof if (xbb->ring_config.ring_pages > XBB_MAX_RING_PAGES) { xenbus_dev_fatal(xbb->dev, EINVAL, - "Front-end specificed ring-pages of %u " + "Front-end specified ring-pages of %u " "exceeds backend limit of %zu. " "Unable to connect.", xbb->ring_config.ring_pages, @@ -3092,7 +3142,7 @@ xbb_collect_frontend_info(struct xbb_sof return (EINVAL); } else if (xbb->max_requests > XBB_MAX_REQUESTS) { xenbus_dev_fatal(xbb->dev, EINVAL, - "Front-end specificed max_requests of %u " + "Front-end specified max_requests of %u " "exceeds backend limit of %u. " "Unable to connect.", xbb->max_requests, @@ -3100,7 +3150,7 @@ xbb_collect_frontend_info(struct xbb_sof return (EINVAL); } else if (xbb->max_request_segments > XBB_MAX_SEGMENTS_PER_REQUEST) { xenbus_dev_fatal(xbb->dev, EINVAL, - "Front-end specificed max_requests_segments " + "Front-end specified max_requests_segments " "of %u exceeds backend limit of %u. " "Unable to connect.", xbb->max_request_segments, @@ -3108,7 +3158,7 @@ xbb_collect_frontend_info(struct xbb_sof return (EINVAL); } else if (xbb->max_request_size > XBB_MAX_REQUEST_SIZE) { xenbus_dev_fatal(xbb->dev, EINVAL, - "Front-end specificed max_request_size " + "Front-end specified max_request_size " "of %u exceeds backend limit of %u. " "Unable to connect.", xbb->max_request_size, @@ -3116,22 +3166,39 @@ xbb_collect_frontend_info(struct xbb_sof return (EINVAL); } - /* If using a multi-page ring, pull in the remaining references. */ - for (ring_idx = 1; ring_idx < xbb->ring_config.ring_pages; ring_idx++) { - char ring_ref_name[]= "ring_refXX"; - - snprintf(ring_ref_name, sizeof(ring_ref_name), - "ring-ref%u", ring_idx); - error = xs_scanf(XST_NIL, otherend_path, - ring_ref_name, NULL, "%" PRIu32, - &xbb->ring_config.ring_ref[ring_idx]); + if (xbb->ring_config.ring_pages == 1) { + error = xs_gather(XST_NIL, otherend_path, + "ring-ref", "%" PRIu32, + &xbb->ring_config.ring_ref[0], + NULL); if (error != 0) { xenbus_dev_fatal(xbb->dev, error, - "Failed to retriev grant reference " - "for page %u of shared ring. Unable " - "to connect.", ring_idx); + "Unable to retrieve ring information " + "from frontend %s. Unable to " + "connect.", + xenbus_get_otherend_path(xbb->dev)); return (error); } + } else { + /* Multi-page ring format. */ + for (ring_idx = 0; ring_idx < xbb->ring_config.ring_pages; + ring_idx++) { + char ring_ref_name[]= "ring_refXX"; + + snprintf(ring_ref_name, sizeof(ring_ref_name), + "ring-ref%u", ring_idx); + error = xs_scanf(XST_NIL, otherend_path, + ring_ref_name, NULL, "%" PRIu32, + &xbb->ring_config.ring_ref[ring_idx]); + if (error != 0) { + xenbus_dev_fatal(xbb->dev, error, + "Failed to retriev grant " + "reference for page %u of " + "shared ring. Unable " + "to connect.", ring_idx); + return (error); + } + } } error = xs_gather(XST_NIL, otherend_path, @@ -3197,8 +3264,8 @@ xbb_alloc_requests(struct xbb_softc *xbb static int xbb_alloc_request_lists(struct xbb_softc *xbb) { - int i; struct xbb_xen_reqlist *reqlist; + int i; /* * If no requests can be merged, we need 1 request list per @@ -3318,7 +3385,7 @@ xbb_publish_backend_info(struct xbb_soft static void xbb_connect(struct xbb_softc *xbb) { - int error; + int error; if (xenbus_get_state(xbb->dev) == XenbusStateConnected) return; @@ -3399,7 +3466,8 @@ xbb_connect(struct xbb_softc *xbb) static int xbb_shutdown(struct xbb_softc *xbb) { - int error; + XenbusState frontState; + int error; DPRINTF("\n"); @@ -3413,6 +3481,20 @@ xbb_shutdown(struct xbb_softc *xbb) if ((xbb->flags & XBBF_IN_SHUTDOWN) != 0) return (EAGAIN); + xbb->flags |= XBBF_IN_SHUTDOWN; + mtx_unlock(&xbb->lock); + + if (xenbus_get_state(xbb->dev) < XenbusStateClosing) + xenbus_set_state(xbb->dev, XenbusStateClosing); + + frontState = xenbus_get_otherend_state(xbb->dev); + mtx_lock(&xbb->lock); + xbb->flags &= ~XBBF_IN_SHUTDOWN; + + /* The front can submit I/O until entering the closed state. */ + if (frontState < XenbusStateClosed) + return (EAGAIN); + DPRINTF("\n"); /* Indicate shutdown is in progress. */ @@ -3434,19 +3516,6 @@ xbb_shutdown(struct xbb_softc *xbb) DPRINTF("\n"); - /* - * Before unlocking mutex, set this flag to prevent other threads from - * getting into this function - */ - xbb->flags |= XBBF_IN_SHUTDOWN; - mtx_unlock(&xbb->lock); - - if (xenbus_get_state(xbb->dev) < XenbusStateClosing) - xenbus_set_state(xbb->dev, XenbusStateClosing); - - mtx_lock(&xbb->lock); - xbb->flags &= ~XBBF_IN_SHUTDOWN; - /* Indicate to xbb_detach() that is it safe to proceed. */ wakeup(xbb); @@ -3573,6 +3642,16 @@ xbb_setup_sysctl(struct xbb_softc *xbb) "max_request_segments", CTLFLAG_RD, &xbb->max_request_segments, 0, "maximum number of pages per requests (negotiated)"); + + SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO, + "max_request_size", CTLFLAG_RD, + &xbb->max_request_size, 0, + "maximum size in bytes of a request (negotiated)"); + + SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO, + "ring_pages", CTLFLAG_RD, + &xbb->ring_config.ring_pages, 0, + "communication channel pages (negotiated)"); } /** @@ -3587,6 +3666,7 @@ xbb_attach(device_t dev) { struct xbb_softc *xbb; int error; + u_int max_ring_page_order; DPRINTF("Attaching to %s\n", xenbus_get_node(dev)); @@ -3621,6 +3701,10 @@ xbb_attach(device_t dev) return (error); } + /* + * Amazon EC2 client compatility. They refer to max-ring-pages + * instead of to max-ring-page-order. + */ error = xs_printf(XST_NIL, xenbus_get_node(xbb->dev), "max-ring-pages", "%zu", XBB_MAX_RING_PAGES); if (error) { @@ -3629,6 +3713,15 @@ xbb_attach(device_t dev) return (error); } + max_ring_page_order = flsl(XBB_MAX_RING_PAGES) - 1; + error = xs_printf(XST_NIL, xenbus_get_node(xbb->dev), + "max-ring-page-order", "%u", max_ring_page_order); + if (error) { + xbb_attach_failed(xbb, error, "writing %s/max-ring-page-order", + xenbus_get_node(xbb->dev)); + return (error); + } + error = xs_printf(XST_NIL, xenbus_get_node(xbb->dev), "max-requests", "%u", XBB_MAX_REQUESTS); if (error) { @@ -3862,12 +3955,16 @@ xbb_frontend_changed(device_t dev, Xenbu xbb_connect(xbb); break; case XenbusStateClosing: + /* + * Frontend has acknowledged Closing request. + * Wait for Closed state. + */ + break; case XenbusStateClosed: mtx_lock(&xbb->lock); xbb_shutdown(xbb); mtx_unlock(&xbb->lock); - if (frontend_state == XenbusStateClosed) - xenbus_set_state(xbb->dev, XenbusStateClosed); + xenbus_set_state(xbb->dev, XenbusStateClosed); break; default: xenbus_dev_fatal(xbb->dev, EINVAL, "saw state %d at frontend", Modified: stable/9/sys/dev/xen/blkfront/blkfront.c ============================================================================== --- stable/9/sys/dev/xen/blkfront/blkfront.c Fri Feb 17 01:23:58 2012 (r231850) +++ stable/9/sys/dev/xen/blkfront/blkfront.c Fri Feb 17 01:45:49 2012 (r231851) @@ -41,6 +41,7 @@ __FBSDID("$FreeBSD$"); #include <sys/bus.h> #include <sys/conf.h> #include <sys/module.h> +#include <sys/sysctl.h> #include <machine/bus.h> #include <sys/rman.h> @@ -139,7 +140,7 @@ static int xb_dump(void *, void *, vm_of * with blkfront as the emulated drives, easing transition slightly. */ static void -blkfront_vdevice_to_unit(int vdevice, int *unit, const char **name) +blkfront_vdevice_to_unit(uint32_t vdevice, int *unit, const char **name) { static struct vdev_info { int major; @@ -186,6 +187,7 @@ blkfront_vdevice_to_unit(int vdevice, in if (vdevice & (1 << 28)) { *unit = (vdevice & ((1 << 28) - 1)) >> 8; *name = "xbd"; + return; } for (i = 0; info[i].major; i++) { @@ -407,6 +409,40 @@ blkfront_probe(device_t dev) return (ENXIO); } +static void +xb_setup_sysctl(struct xb_softc *xb) +{ + struct sysctl_ctx_list *sysctl_ctx = NULL; + struct sysctl_oid *sysctl_tree = NULL; + + sysctl_ctx = device_get_sysctl_ctx(xb->xb_dev); + if (sysctl_ctx == NULL) + return; + + sysctl_tree = device_get_sysctl_tree(xb->xb_dev); + if (sysctl_tree == NULL) + return; + + SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO, + "max_requests", CTLFLAG_RD, &xb->max_requests, -1, + "maximum outstanding requests (negotiated)"); + + SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO, + "max_request_segments", CTLFLAG_RD, + &xb->max_request_segments, 0, + "maximum number of pages per requests (negotiated)"); + + SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO, + "max_request_size", CTLFLAG_RD, + &xb->max_request_size, 0, + "maximum size in bytes of a request (negotiated)"); + + SYSCTL_ADD_UINT(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO, + "ring_pages", CTLFLAG_RD, + &xb->ring_pages, 0, + "communication channel pages (negotiated)"); +} + /* * Setup supplies the backend dir, virtual device. We place an event * channel and shared frame entries. We watch backend to wait if it's @@ -417,14 +453,14 @@ blkfront_attach(device_t dev) { struct xb_softc *sc; const char *name; + uint32_t vdevice; int error; - int vdevice; int i; int unit; /* FIXME: Use dynamic device id if this is not set. */ error = xs_scanf(XST_NIL, xenbus_get_node(dev), - "virtual-device", NULL, "%i", &vdevice); + "virtual-device", NULL, "%" PRIu32, &vdevice); if (error) { xenbus_dev_fatal(dev, error, "reading virtual-device"); device_printf(dev, "Couldn't determine virtual device.\n"); @@ -449,6 +485,8 @@ blkfront_attach(device_t dev) sc->vdevice = vdevice; sc->connected = BLKIF_STATE_DISCONNECTED; + xb_setup_sysctl(sc); + /* Wait for backend device to publish its protocol capabilities. */ xenbus_set_state(dev, XenbusStateInitialising); @@ -501,6 +539,7 @@ blkfront_initialize(struct xb_softc *sc) { const char *otherend_path; const char *node_path; + uint32_t max_ring_page_order; int error; int i; @@ -513,10 +552,10 @@ blkfront_initialize(struct xb_softc *sc) * Protocol defaults valid even if negotiation for a * setting fails. */ + max_ring_page_order = 0; sc->ring_pages = 1; - sc->max_requests = BLKIF_MAX_RING_REQUESTS(PAGE_SIZE); sc->max_request_segments = BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK; - sc->max_request_size = (sc->max_request_segments - 1) * PAGE_SIZE; + sc->max_request_size = XBF_SEGS_TO_SIZE(sc->max_request_segments); sc->max_request_blocks = BLKIF_SEGS_TO_BLOCKS(sc->max_request_segments); /* @@ -526,13 +565,25 @@ blkfront_initialize(struct xb_softc *sc) * we must use independant calls in order to guarantee * we don't miss information in a sparsly populated back-end * tree. + * + * \note xs_scanf() does not update variables for unmatched + * fields. */ otherend_path = xenbus_get_otherend_path(sc->xb_dev); node_path = xenbus_get_node(sc->xb_dev); + + /* Support both backend schemes for relaying ring page limits. */ + (void)xs_scanf(XST_NIL, otherend_path, + "max-ring-page-order", NULL, "%" PRIu32, + &max_ring_page_order); + sc->ring_pages = 1 << max_ring_page_order; (void)xs_scanf(XST_NIL, otherend_path, "max-ring-pages", NULL, "%" PRIu32, &sc->ring_pages); + if (sc->ring_pages < 1) + sc->ring_pages = 1; + sc->max_requests = BLKIF_MAX_RING_REQUESTS(sc->ring_pages * PAGE_SIZE); (void)xs_scanf(XST_NIL, otherend_path, "max-requests", NULL, "%" PRIu32, &sc->max_requests); @@ -552,6 +603,16 @@ blkfront_initialize(struct xb_softc *sc) sc->ring_pages = XBF_MAX_RING_PAGES; } + if (powerof2(sc->ring_pages) == 0) { + uint32_t new_page_limit; + + new_page_limit = 0x01 << (fls(sc->ring_pages) - 1); + device_printf(sc->xb_dev, "Back-end specified ring-pages of " + "%u is not a power of 2. Limited to %u.\n", + sc->ring_pages, new_page_limit); + sc->ring_pages = new_page_limit; + } + if (sc->max_requests > XBF_MAX_REQUESTS) { device_printf(sc->xb_dev, "Back-end specified max_requests of " "%u limited to front-end limit of %u.\n", @@ -560,8 +621,8 @@ blkfront_initialize(struct xb_softc *sc) } if (sc->max_request_segments > XBF_MAX_SEGMENTS_PER_REQUEST) { - device_printf(sc->xb_dev, "Back-end specificed " - "max_requests_segments of %u limited to " + device_printf(sc->xb_dev, "Back-end specified " + "max_request_segments of %u limited to " "front-end limit of %u.\n", sc->max_request_segments, XBF_MAX_SEGMENTS_PER_REQUEST); @@ -569,12 +630,23 @@ blkfront_initialize(struct xb_softc *sc) } if (sc->max_request_size > XBF_MAX_REQUEST_SIZE) { - device_printf(sc->xb_dev, "Back-end specificed " + device_printf(sc->xb_dev, "Back-end specified " "max_request_size of %u limited to front-end " "limit of %u.\n", sc->max_request_size, XBF_MAX_REQUEST_SIZE); sc->max_request_size = XBF_MAX_REQUEST_SIZE; } + + if (sc->max_request_size > XBF_SEGS_TO_SIZE(sc->max_request_segments)) { + device_printf(sc->xb_dev, "Back-end specified " + "max_request_size of %u limited to front-end " + "limit of %u. (Too few segments.)\n", + sc->max_request_size, + XBF_SEGS_TO_SIZE(sc->max_request_segments)); + sc->max_request_size = + XBF_SEGS_TO_SIZE(sc->max_request_segments); + } + sc->max_request_blocks = BLKIF_SEGS_TO_BLOCKS(sc->max_request_segments); /* Allocate datastructures based on negotiated values. */ @@ -625,11 +697,20 @@ blkfront_initialize(struct xb_softc *sc) if (setup_blkring(sc) != 0) return; + /* Support both backend schemes for relaying ring page limits. */ error = xs_printf(XST_NIL, node_path, - "ring-pages","%u", sc->ring_pages); + "num-ring-pages","%u", sc->ring_pages); if (error) { xenbus_dev_fatal(sc->xb_dev, error, - "writing %s/ring-pages", + "writing %s/num-ring-pages", + node_path); + return; + } + error = xs_printf(XST_NIL, node_path, + "ring-page-order","%u", fls(sc->ring_pages) - 1); + if (error) { + xenbus_dev_fatal(sc->xb_dev, error, + "writing %s/ring-page-order", node_path); return; } @@ -711,25 +792,31 @@ setup_blkring(struct xb_softc *sc) return (error); } } - error = xs_printf(XST_NIL, xenbus_get_node(sc->xb_dev), - "ring-ref","%u", sc->ring_ref[0]); - if (error) { - xenbus_dev_fatal(sc->xb_dev, error, "writing %s/ring-ref", - xenbus_get_node(sc->xb_dev)); - return (error); - } - for (i = 1; i < sc->ring_pages; i++) { - char ring_ref_name[]= "ring_refXX"; - - snprintf(ring_ref_name, sizeof(ring_ref_name), "ring-ref%u", i); + if (sc->ring_pages == 1) { error = xs_printf(XST_NIL, xenbus_get_node(sc->xb_dev), - ring_ref_name, "%u", sc->ring_ref[i]); + "ring-ref", "%u", sc->ring_ref[0]); if (error) { - xenbus_dev_fatal(sc->xb_dev, error, "writing %s/%s", - xenbus_get_node(sc->xb_dev), - ring_ref_name); + xenbus_dev_fatal(sc->xb_dev, error, + "writing %s/ring-ref", + xenbus_get_node(sc->xb_dev)); return (error); } + } else { + for (i = 0; i < sc->ring_pages; i++) { + char ring_ref_name[]= "ring_refXX"; + + snprintf(ring_ref_name, sizeof(ring_ref_name), + "ring-ref%u", i); + error = xs_printf(XST_NIL, xenbus_get_node(sc->xb_dev), + ring_ref_name, "%u", sc->ring_ref[i]); + if (error) { + xenbus_dev_fatal(sc->xb_dev, error, + "writing %s/%s", + xenbus_get_node(sc->xb_dev), + ring_ref_name); + return (error); + } + } } error = bind_listening_port_to_irqhandler( @@ -795,7 +882,7 @@ blkfront_connect(struct xb_softc *sc) unsigned int binfo; int err, feature_barrier; - if( (sc->connected == BLKIF_STATE_CONNECTED) || + if( (sc->connected == BLKIF_STATE_CONNECTED) || (sc->connected == BLKIF_STATE_SUSPENDED) ) return; @@ -923,15 +1010,13 @@ blkif_close(struct disk *dp) return (ENXIO); sc->xb_flags &= ~XB_OPEN; if (--(sc->users) == 0) { - /* Check whether we have been instructed to close. We will - have ignored this request initially, as the device was - still mounted. */ - device_t dev = sc->xb_dev; - XenbusState state = - xenbus_read_driver_state(xenbus_get_otherend_path(dev)); - - if (state == XenbusStateClosing) - blkfront_closing(dev); + /* + * Check whether we have been instructed to close. We will + * have ignored this request initially, as the device was + * still mounted. + */ + if (xenbus_get_otherend_state(sc->xb_dev) == XenbusStateClosing) + blkfront_closing(sc->xb_dev); } return (0); } @@ -1033,7 +1118,7 @@ blkif_queue_cb(void *arg, bus_dma_segmen struct xb_command *cm; blkif_request_t *ring_req; struct blkif_request_segment *sg; - struct blkif_request_segment *last_block_sg; + struct blkif_request_segment *last_block_sg; grant_ref_t *sg_ref; vm_paddr_t buffer_ma; uint64_t fsect, lsect; @@ -1104,12 +1189,12 @@ blkif_queue_cb(void *arg, bus_dma_segmen nsegs--; } block_segs = MIN(nsegs, BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK); - if (block_segs == 0) - break; + if (block_segs == 0) + break; - sg = BLKRING_GET_SG_REQUEST(&sc->ring, sc->ring.req_prod_pvt); + sg = BLKRING_GET_SEG_BLOCK(&sc->ring, sc->ring.req_prod_pvt); sc->ring.req_prod_pvt++; - last_block_sg = sg + block_segs; + last_block_sg = sg + block_segs; } if (cm->operation == BLKIF_OP_READ) Modified: stable/9/sys/dev/xen/blkfront/block.h ============================================================================== --- stable/9/sys/dev/xen/blkfront/block.h Fri Feb 17 01:23:58 2012 (r231850) +++ stable/9/sys/dev/xen/blkfront/block.h Fri Feb 17 01:45:49 2012 (r231851) @@ -35,6 +35,32 @@ #include <xen/blkif.h> /** + * Given a number of blkif segments, compute the maximum I/O size supported. + * + * \note This calculation assumes that all but the first and last segments + * of the I/O are fully utilized. + * + * \note We reserve a segement from the maximum supported by the transport to + * guarantee we can handle an unaligned transfer without the need to + * use a bounce buffer. + */ +#define XBF_SEGS_TO_SIZE(segs) \ + (((segs) - 1) * PAGE_SIZE) + +/** + * Compute the maximum number of blkif segments requried to represent + * an I/O of the given size. + * + * \note This calculation assumes that all but the first and last segments + * of the I/O are fully utilized. + * + * \note We reserve a segement to guarantee we can handle an unaligned + * transfer without the need to use a bounce buffer. + */ +#define XBF_SIZE_TO_SEGS(size) \ + ((size / PAGE_SIZE) + 1) + +/** * The maximum number of outstanding requests blocks (request headers plus * additional segment blocks) we will allow in a negotiated block-front/back * communication channel. @@ -44,22 +70,18 @@ /** * The maximum mapped region size per request we will allow in a negotiated * block-front/back communication channel. - * - * \note We reserve a segement from the maximum supported by the transport to - * guarantee we can handle an unaligned transfer without the need to - * use a bounce buffer.. */ -#define XBF_MAX_REQUEST_SIZE \ - MIN(MAXPHYS, (BLKIF_MAX_SEGMENTS_PER_REQUEST - 1) * PAGE_SIZE) +#define XBF_MAX_REQUEST_SIZE \ + MIN(MAXPHYS, XBF_SEGS_TO_SIZE(BLKIF_MAX_SEGMENTS_PER_REQUEST)) /** * The maximum number of segments (within a request header and accompanying * segment blocks) per request we will allow in a negotiated block-front/back * communication channel. */ -#define XBF_MAX_SEGMENTS_PER_REQUEST \ - (MIN(BLKIF_MAX_SEGMENTS_PER_REQUEST, \ - (XBF_MAX_REQUEST_SIZE / PAGE_SIZE) + 1)) +#define XBF_MAX_SEGMENTS_PER_REQUEST \ + (MIN(BLKIF_MAX_SEGMENTS_PER_REQUEST, \ + XBF_SIZE_TO_SEGS(XBF_MAX_REQUEST_SIZE))) /** * The maximum number of shared memory ring pages we will allow in a Modified: stable/9/sys/xen/interface/io/blkif.h ============================================================================== --- stable/9/sys/xen/interface/io/blkif.h Fri Feb 17 01:23:58 2012 (r231850) +++ stable/9/sys/xen/interface/io/blkif.h Fri Feb 17 01:45:49 2012 (r231851) @@ -1,8 +1,8 @@ /****************************************************************************** * blkif.h - * + * * Unified block-device I/O interface for Xen guest OSes. - * + * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to * deal in the Software without restriction, including without limitation the @@ -22,6 +22,7 @@ * DEALINGS IN THE SOFTWARE. * * Copyright (c) 2003-2004, Keir Fraser + * Copyright (c) 2012, Spectra Logic Corporation */ #ifndef __XEN_PUBLIC_IO_BLKIF_H__ @@ -35,7 +36,7 @@ * notification can be made conditional on req_event (i.e., the generic * hold-off mechanism provided by the ring macros). Backends must set * req_event appropriately (e.g., using RING_FINAL_CHECK_FOR_REQUESTS()). - * + * * Back->front notifications: When enqueuing a new response, sending a * notification can be made conditional on rsp_event (i.e., the generic * hold-off mechanism provided by the ring macros). Frontends must set @@ -48,37 +49,413 @@ #define blkif_sector_t uint64_t /* + * Feature and Parameter Negotiation + * ================================= + * The two halves of a Xen block driver utilize nodes within the XenStore to + * communicate capabilities and to negotiate operating parameters. This + * section enumerates these nodes which reside in the respective front and + * backend portions of the XenStore, following the XenBus convention. + * + * All data in the XenStore is stored as strings. Nodes specifying numeric + * values are encoded in decimal. Integer value ranges listed below are + * expressed as fixed sized integer types capable of storing the conversion + * of a properly formated node string, without loss of information. + * + * Any specified default value is in effect if the corresponding XenBus node + * is not present in the XenStore. + * + * XenStore nodes in sections marked "PRIVATE" are solely for use by the + * driver side whose XenBus tree contains them. + * + * See the XenBus state transition diagram below for details on when XenBus + * nodes must be published and when they can be queried. + * + ***************************************************************************** + * Backend XenBus Nodes + ***************************************************************************** + * + *------------------ Backend Device Identification (PRIVATE) ------------------ + * + * mode + * Values: "r" (read only), "w" (writable) + * + * The read or write access permissions to the backing store to be + * granted to the frontend. + * + * params + * Values: string + * + * A free formatted string providing sufficient information for the + * backend driver to open the backing device. (e.g. the path to the + * file or block device representing the backing store.) + * + * type + * Values: "file", "phy", "tap" + * + * The type of the backing device/object. + * + *--------------------------------- Features --------------------------------- + * + * feature-barrier + * Values: 0/1 (boolean) + * Default Value: 0 + * + * A value of "1" indicates that the backend can process requests + * containing the BLKIF_OP_WRITE_BARRIER request opcode. Requests + * of this type may still be returned at any time with the + * BLKIF_RSP_EOPNOTSUPP result code. + * + * feature-flush-cache + * Values: 0/1 (boolean) + * Default Value: 0 + * + * A value of "1" indicates that the backend can process requests + * containing the BLKIF_OP_FLUSH_DISKCACHE request opcode. Requests + * of this type may still be returned at any time with the + * BLKIF_RSP_EOPNOTSUPP result code. + * + * feature-discard + * Values: 0/1 (boolean) + * Default Value: 0 + * + * A value of "1" indicates that the backend can process requests + * containing the BLKIF_OP_DISCARD request opcode. Requests + * of this type may still be returned at any time with the + * BLKIF_RSP_EOPNOTSUPP result code. + * + *----------------------- Request Transport Parameters ------------------------ + * + * max-ring-page-order + * Values: <uint32_t> + * Default Value: 0 + * Notes: 1, 3 + * + * The maximum supported size of the request ring buffer in units of + * lb(machine pages). (e.g. 0 == 1 page, 1 = 2 pages, 2 == 4 pages, + * etc.). + * + * max-ring-pages + * Values: <uint32_t> + * Default Value: 1 + * Notes: 2, 3 + * + * The maximum supported size of the request ring buffer in units of + * machine pages. The value must be a power of 2. + * + * max-requests <uint32_t> + * Default Value: BLKIF_MAX_RING_REQUESTS(PAGE_SIZE) + * Maximum Value: BLKIF_MAX_RING_REQUESTS(PAGE_SIZE * max-ring-pages) + * + * The maximum number of concurrent, logical requests that will be + * issued by the backend. + * + * Note: A logical request may span multiple ring entries. + * + * max-request-segments + * Values: <uint8_t> + * Default Value: BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201202170145.q1H1jnQ3044170>