From owner-svn-src-stable@FreeBSD.ORG Wed Feb 15 14:23:02 2012
Return-Path:
Delivered-To: svn-src-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 866291065670;
    Wed, 15 Feb 2012 14:23:02 +0000 (UTC)
    (envelope-from ken@FreeBSD.org)
Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c])
    by mx1.freebsd.org (Postfix) with ESMTP id 6B7398FC08;
    Wed, 15 Feb 2012 14:23:02 +0000 (UTC)
Received: from svn.freebsd.org (localhost [127.0.0.1])
    by svn.freebsd.org (8.14.4/8.14.4) with ESMTP id q1FEN21C065707;
    Wed, 15 Feb 2012 14:23:02 GMT
    (envelope-from ken@svn.freebsd.org)
Received: (from ken@localhost)
    by svn.freebsd.org (8.14.4/8.14.4/Submit) id q1FEN2Gj065696;
    Wed, 15 Feb 2012 14:23:02 GMT
    (envelope-from ken@svn.freebsd.org)
Message-Id: <201202151423.q1FEN2Gj065696@svn.freebsd.org>
From: "Kenneth D. Merry"
Date: Wed, 15 Feb 2012 14:23:02 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
    svn-src-stable@freebsd.org, svn-src-stable-8@freebsd.org
X-SVN-Group: stable-8
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cc:
Subject: svn commit: r231759 - in stable/8: share/man/man4 sys/amd64/conf
    sys/conf sys/dev/acpica sys/dev/esp sys/dev/twa sys/dev/xen/balloon
    sys/dev/xen/blkback sys/dev/xen/blkfront sys/dev/xen/console sys...
X-BeenThere: svn-src-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SVN commit messages for all the -stable branches of the src tree
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 15 Feb 2012 14:23:02 -0000

Author: ken
Date: Wed Feb 15 14:23:01 2012
New Revision: 231759
URL: http://svn.freebsd.org/changeset/base/231759

Log:
  MFC r215818, r216405, r216437, r216448, r216956, r221827, r222975, r223059,
  r225343, r225704, r225705, r225706, r225707, r225709, r226029, r220647,
  r230183, r230587, r230916, r228526, r230879:

  Bring Xen support in stable/8 up to parity with head. Almost all
  outstanding Xen support differences between head and stable/8 are
  included, except for the just added r231743.

  r215818 | cperciva | 2010-11-25 08:05:21 -0700 (Thu, 25 Nov 2010) | 5 lines

  Rename HYPERVISOR_multicall (which performs the multicall hypercall) to
  _HYPERVISOR_multicall, and create a new HYPERVISOR_multicall function which
  invokes _HYPERVISOR_multicall and checks that the individual hypercalls all
  succeeded.

  r216405 | rwatson | 2010-12-13 05:15:46 -0700 (Mon, 13 Dec 2010) | 7 lines

  Add options NO_ADAPTIVE_SX to the XENHVM kernel configuration, matching its
  similar disabling of adaptive mutexes and rwlocks. The existing comment on
  why this is the case also applies to sx locks.

  MFC after: 3 days
  Discussed with: attilio

  r216437 | gibbs | 2010-12-14 10:23:49 -0700 (Tue, 14 Dec 2010) | 2 lines

  Remove spurious printf left over from debugging our XenStore support.

  r216448 | gibbs | 2010-12-14 13:57:40 -0700 (Tue, 14 Dec 2010) | 4 lines

  Fix a typo in a comment.

  Noticed by: Attila Nagy

  r216956 | rwatson | 2011-01-04 07:49:54 -0700 (Tue, 04 Jan 2011) | 8 lines

  Make "options XENHVM" compile for i386, not just amd64 -- a largely
  mechanical change. This opens the door for using PV device drivers under
  Xen HVM on i386, as well as more general harmonisation of i386 and amd64
  Xen support in FreeBSD.

  Reviewed by: cperciva
  MFC after: 3 weeks

  r221827 | mav | 2011-05-12 21:40:16 -0600 (Thu, 12 May 2011) | 2 lines

  Fix msleep() usage in Xen balloon driver to not wake up on every HZ tick.

  r222975 | gibbs | 2011-06-10 22:59:01 -0600 (Fri, 10 Jun 2011) | 63 lines

  Monitor and emit events for XenStore changes to XenBus trees of the devices
  we manage. These changes can be due to writes we make ourselves or due to
  changes made by the control domain. The goal of these changes is to ensure
  that all state transitions can be detected regardless of their source and
  to allow common device policies (e.g. "onlined" backend devices) to be
  centralized in the XenBus bus code.

  sys/xen/xenbus/xenbusvar.h:
  sys/xen/xenbus/xenbus.c:
  sys/xen/xenbus/xenbus_if.m:
      Add a new method for XenBus drivers "localend_changed". This method
      is invoked whenever a write is detected to a device's XenBus tree.
      The default implementation of this method is a no-op.

  sys/xen/xenbus/xenbus_if.m:
  sys/dev/xen/netfront/netfront.c:
  sys/dev/xen/blkfront/blkfront.c:
  sys/dev/xen/blkback/blkback.c:
      Change the signature of the "otherend_changed" method. This
      notification cannot fail, so it should return void.

  sys/xen/xenbus/xenbusb_back.c:
      Add "online" device handling to the XenBus Back Bus support code. An
      online backend device remains active after a front-end detaches as a
      reconnect is expected to occur in the near future.

  sys/xen/interface/io/xenbus.h:
      Add comment block further explaining the meaning and driver
      responsibilities associated with the XenBus Closed state.

  sys/xen/xenbus/xenbusb.c:
  sys/xen/xenbus/xenbusb.h:
  sys/xen/xenbus/xenbusb_back.c:
  sys/xen/xenbus/xenbusb_front.c:
  sys/xen/xenbus/xenbusb_if.m:
      o Register a XenStore watch against the local XenBus tree for all
        devices.
      o Cache the string length of the path to our local tree.
      o Allow the xenbus front and back drivers to hook/filter both local
        and otherend watch processing.
      o Update the device ivar version of "state" when we detect a XenStore
        update of that node.

  sys/dev/xen/control/control.c:
  sys/xen/xenbus/xenbus.c:
  sys/xen/xenbus/xenbusb.c:
  sys/xen/xenbus/xenbusb.h:
  sys/xen/xenbus/xenbusvar.h:
  sys/xen/xenstore/xenstorevar.h:
      Allow clients of the XenStore watch mechanism to attach a single
      uintptr_t worth of client data to the watch. This removes the need to
      carefully place client watch data within enclosing objects so that a
      cast or offsetof calculation can be used to convert from watch to
      enclosing object.

  Sponsored by: Spectra Logic Corporation
  MFC after: 1 week

  r223059 | gibbs | 2011-06-13 14:36:29 -0600 (Mon, 13 Jun 2011) | 36 lines

  Several enhancements to the Xen block back driver.

  sys/dev/xen/blkback/blkback.c:
      o Implement front-end request coalescing. This greatly improves the
        performance of front-end clients that are unaware of the dynamic
        request-size/number of requests negotiation available in the
        FreeBSD backend driver. This required a large restructuring in how
        this driver records in-flight transactions and how those
        transactions are mapped into kernel KVA. For example, the driver
        now includes a mini "KVA manager" that allocates ranges of
        contiguous KVA to batches of requests that are physically
        contiguous in the backing store so that a single bio or UIO segment
        can be used to represent the I/O.
      o Refuse to open any backend files or devices if the system has yet
        to mount root. This avoids a panic.
      o Properly handle "onlined" devices. An "onlined" backend device
        stays attached to its backing store across front-end
        disconnections. This feature is intended to reduce latency when a
        front-end does a hand-off to another driver (e.g. PV aware
        bootloader to OS kernel) or during a VM reboot.
      o Harden the driver against a pathological/buggy front-end by
        carefully vetting front-end XenStore data such as the front-end
        state.
      o Add sysctls that report the negotiated number of segments
        per-request and the number of requests that can be concurrently in
        flight.

  Submitted by: kdm
  Reviewed by: gibbs
  Sponsored by: Spectra Logic Corporation
  MFC after: 1 week

  r225343 | rwatson | 2011-09-02 11:36:01 -0600 (Fri, 02 Sep 2011) | 7 lines

  Add alternative break-to-debugger support on the Xen console. This should
  help debug boot-time hangs experienced in 9.0-BETA.

  MFC after: 3 weeks
  Tested by: sbruno
  Approved by: re (kib)

  r225704 | gibbs | 2011-09-20 17:44:34 -0600 (Tue, 20 Sep 2011) | 29 lines

  Properly handle suspend/resume events in the Xen device framework.

  Sponsored by: BQ Internet

  sys/xen/xenbus/xenbusb.c:
      o In xenbusb_resume(), publish the state transition of the resuming
        device into XenbusStateInitialising so that the remote peer can see
        it. Recording the state locally is not sufficient to trigger a
        re-connect sequence.
      o In xenbusb_resume(), defer new-bus resume processing until after
        the remote peer's XenStore address has been updated. The drivers
        may need to refer to this information during resume processing.

  sys/xen/xenbus/xenbusb_back.c:
  sys/xen/xenbus/xenbusb_front.c:
      Register xenbusb_resume() rather than bus_generic_resume() as the
      handler for device_resume events.

  sys/xen/xenstore/xenstore.c:
      o Fix grammar in a comment.
      o In xs_suspend(), pass suspend events on to the child devices (e.g.
        xenbusb_front/back) that are attached to the XenStore.

  Approved by: re
  MFC after: 1 week

  r225705 | gibbs | 2011-09-20 18:02:44 -0600 (Tue, 20 Sep 2011) | 35 lines

  Add suspend/resume support to the Xen blkfront driver.

  Sponsored by: BQ Internet

  sys/dev/xen/blkfront/block.h:
  sys/dev/xen/blkfront/blkfront.c:
      Remove now unused blkif_vdev_t from the blkfront softc.

  sys/dev/xen/blkfront/blkfront.c:
      o In blkfront_suspend(), indicate the desire to suspend by changing
        the softc connected state to SUSPENDED, and then wait for any I/O
        pending on the remote peer to drain.
        Cancel suspend processing if I/O does not drain within 30 seconds.
      o Enable and update blkfront_resume(). Since I/O is drained prior to
        the suspension of the VM, the complicated recovery process
        performed by other Xen blkfront implementations is avoided. We
        simply tear down the connection to our old peer, and then
        re-connect.
      o In blkif_initialize(), fix a resource leak and botched return if we
        cannot allocate shadow memory for our requests.
      o In blkfront_backend_changed(), correct our response to the
        XenbusStateInitialised state. This state indicates that our backend
        peer has published sufficient data for blkfront to publish ring
        information and other XenStore data, not that a connection can
        occur. Blkfront now will only perform connection processing in
        response to the XenbusStateConnected state. This corrects an issue
        where blkfront connected before the backend was ready during resume
        processing.

  Approved by: re
  MFC after: 1 week

  r225706 | gibbs | 2011-09-20 18:06:02 -0600 (Tue, 20 Sep 2011) | 11 lines

  [ Forced commit. Actual changes accidentally included in r225704 ]

  sys/dev/xen/control/control.c:
      Fix locking violations in Xen HVM suspend processing and have it
      perform similar actions to those performed during an ACPI triggered
      suspend.

  Sponsored by: BQ Internet
  Approved by: re
  MFC after: 1 week

  r225707 | gibbs | 2011-09-20 18:08:25 -0600 (Tue, 20 Sep 2011) | 21 lines

  Correct suspend/resume support in the Netfront driver.

  Sponsored by: BQ Internet

  sys/dev/xen/netfront/netfront.c:
      o Implement netfront_suspend(), a specialized suspend handler for the
        netfront driver. This routine simply disables the carrier so the
        driver is idle during system suspend processing.
      o Fix a leak when re-initializing LRO during a link reset.
      o In netif_release_tx_bufs(), when cleaning up the grant references
        for our TX ring, use gnttab_end_foreign_access_ref instead of
        attempting to grant the page again.
      o In netif_release_tx_bufs(), we do not track mbufs associated with
        mbuf chains, but instead just free each mbuf directly. Use
        m_free(), not m_freem(), to avoid double frees of mbufs.
      o Refactor some code to enhance clarity.

  Approved by: re
  MFC after: 1 week

  r225709 | gibbs | 2011-09-20 18:15:29 -0600 (Tue, 20 Sep 2011) | 19 lines

  Update netfront so that it queries and honors published back-end features.

  sys/dev/xen/netfront/netfront.c:
      o Add xn_query_features() which reads the XenStore and records the
        TSO, LRO, and chained ring-request support of the backend.
      o Rename xn_configure_lro() to xn_configure_features() and use this
        routine to manage the setup of TSO, LRO, and checksum offload.
      o In create_netdev(), initialize if_capabilities and if_hwassist to
        the capabilities found on all backends. Delegate configuration of
        if_capenable and the TSO flag of if_hwassist to
        xn_configure_features().

  Reported by: Hugo Silva (fix inspired by patch provided)
  Approved by: re
  MFC after: 1 week

  r226029 | jkim | 2011-10-04 17:53:47 -0600 (Tue, 04 Oct 2011) | 2 lines

  Add strnlen() to libkern.

  r220647 | jkim | 2011-04-14 16:17:39 -0600 (Thu, 14 Apr 2011) | 4 lines

  Add event handlers for (ACPI) suspend/resume events. Suspend event handlers
  are invoked right before device drivers go into sleep state and resume
  event handlers are invoked right after all device drivers are woken up.

  r230183 | cperciva | 2012-01-15 19:38:45 -0700 (Sun, 15 Jan 2012) | 3 lines

  Make XENHVM work on i386. The __ffs() function counts bits starting from
  zero, unlike ffs(3), which starts counting from 1.

  r230587 | ken | 2012-01-26 09:35:09 -0700 (Thu, 26 Jan 2012) | 38 lines

  Xen netback driver rewrite.

  share/man/man4/Makefile,
  share/man/man4/xnb.4,
  sys/dev/xen/netback/netback.c,
  sys/dev/xen/netback/netback_unit_tests.c:
      Rewrote the netback driver for Xen to attach properly via newbus and
      work properly in both HVM and PVM mode (only HVM is tested).
      Works with the in-tree FreeBSD netfront driver or the Windows
      netfront driver from SuSE. Has not been extensively tested with a
      Linux netfront driver. Does not implement LRO, TSO, or polling.
      Includes unit tests that may be run through sysctl after compiling
      with XNB_DEBUG defined.

  sys/dev/xen/blkback/blkback.c,
  sys/xen/interface/io/netif.h:
      Comment elaboration.

  sys/kern/uipc_mbuf.c:
      Fix page fault in kernel mode when calling m_print() on a null mbuf.
      Since m_print() is only used for debugging, there are no performance
      concerns for extra error checking code.

  sys/kern/subr_scanf.c:
      Add the "hh" and "ll" width specifiers from C99 to scanf(). A few
      callers were already using "ll" even though scanf() was handling it
      as "l".

  Submitted by: Alan Somers
  Submitted by: John Suykerbuyk
  Sponsored by: Spectra Logic
  MFC after: 1 week
  Reviewed by: ken

  r230916 | ken | 2012-02-02 10:54:35 -0700 (Thu, 02 Feb 2012) | 13 lines

  Fix the netback driver build for i386.

  netback.c:
      Add missing VM includes.

  xen/xenvar.h,
  xen/xenpmap.h:
      Move some XENHVM macros from to on i386 to match the amd64 headers.

  conf/files:
      Add netback to the build.

  Submitted by: jhb
  MFC after: 3 days

  r228526 | kevlo | 2011-12-14 23:29:13 -0700 (Wed, 14 Dec 2011) | 2 lines

  s/timout/timeout

  r230879 | ken | 2012-02-01 13:19:33 -0700 (Wed, 01 Feb 2012) | 4 lines

  Add the GSO prefix descriptor define.

  MFC after: 3 days

Added:
  stable/8/share/man/man4/xnb.4
     - copied unchanged from r230587, head/share/man/man4/xnb.4
  stable/8/sys/dev/xen/netback/netback_unit_tests.c
     - copied unchanged from r230587, head/sys/dev/xen/netback/netback_unit_tests.c
  stable/8/sys/libkern/strnlen.c
     - copied unchanged from r226029, head/sys/libkern/strnlen.c
Modified:
  stable/8/share/man/man4/Makefile
  stable/8/sys/amd64/conf/XENHVM
  stable/8/sys/conf/files
  stable/8/sys/dev/acpica/acpi.c
  stable/8/sys/dev/esp/ncr53c9x.c
  stable/8/sys/dev/twa/tw_osl.h
  stable/8/sys/dev/xen/balloon/balloon.c
  stable/8/sys/dev/xen/blkback/blkback.c
  stable/8/sys/dev/xen/blkfront/blkfront.c
  stable/8/sys/dev/xen/blkfront/block.h
  stable/8/sys/dev/xen/console/console.c
  stable/8/sys/dev/xen/control/control.c
  stable/8/sys/dev/xen/netback/netback.c
  stable/8/sys/dev/xen/netfront/netfront.c
  stable/8/sys/dev/xen/xenpci/evtchn.c
  stable/8/sys/i386/include/pcpu.h
  stable/8/sys/i386/include/pmap.h
  stable/8/sys/i386/include/xen/hypercall.h
  stable/8/sys/i386/include/xen/xen-os.h
  stable/8/sys/i386/include/xen/xenpmap.h
  stable/8/sys/i386/include/xen/xenvar.h
  stable/8/sys/i386/xen/xen_machdep.c
  stable/8/sys/kern/subr_scanf.c
  stable/8/sys/kern/uipc_mbuf.c
  stable/8/sys/sys/eventhandler.h
  stable/8/sys/sys/libkern.h
  stable/8/sys/xen/interface/io/netif.h
  stable/8/sys/xen/interface/io/xenbus.h
  stable/8/sys/xen/xenbus/xenbus.c
  stable/8/sys/xen/xenbus/xenbus_if.m
  stable/8/sys/xen/xenbus/xenbusb.c
  stable/8/sys/xen/xenbus/xenbusb.h
  stable/8/sys/xen/xenbus/xenbusb_back.c
  stable/8/sys/xen/xenbus/xenbusb_front.c
  stable/8/sys/xen/xenbus/xenbusb_if.m
  stable/8/sys/xen/xenbus/xenbusvar.h
  stable/8/sys/xen/xenstore/xenstore.c
  stable/8/sys/xen/xenstore/xenstorevar.h
Directory Properties:
  stable/8/ (props changed)
  stable/8/share/ (props changed)
  stable/8/share/man/ (props changed)
  stable/8/share/man/man4/ (props changed)
  stable/8/sys/ (props changed)

Modified: stable/8/share/man/man4/Makefile
============================================================================== --- stable/8/share/man/man4/Makefile Wed Feb 15 13:40:10 2012 (r231758) +++ stable/8/share/man/man4/Makefile Wed Feb 15 14:23:01 2012 (r231759) @@ -508,6 +508,7 @@ MAN= aac.4 \ ${_xen.4} \ xhci.4 \ xl.4 \ + ${_xnb.4} \ xpt.4 \ zero.4 \ zyd.4 @@ -696,6 +697,7 @@ _urtw.4= urtw.4 _viawd.4= viawd.4 _wpi.4= wpi.4 _xen.4= xen.4 +_xnb.4= xnb.4 MLINKS+=lindev.4 full.4 .endif Copied: stable/8/share/man/man4/xnb.4 (from r230587, head/share/man/man4/xnb.4) ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ stable/8/share/man/man4/xnb.4 Wed Feb 15 14:23:01 2012 (r231759, copy of r230587, head/share/man/man4/xnb.4) @@ -0,0 +1,134 @@ +.\" Copyright (c) 2012 Spectra Logic Corporation +.\" All rights reserved. +.\" +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions, and the following disclaimer, +.\" without modification. +.\" 2. Redistributions in binary form must reproduce at minimum a disclaimer +.\" substantially similar to the "NO WARRANTY" disclaimer below +.\" ("Disclaimer") and any redistribution must be conditioned upon +.\" including a substantially similar Disclaimer requirement for further +.\" binary redistribution. +.\" +.\" NO WARRANTY +.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +.\" "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR +.\" A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +.\" HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, +.\" STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING +.\" IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +.\" POSSIBILITY OF SUCH DAMAGES. +.\" +.\" Authors: Alan Somers (Spectra Logic Corporation) +.\" +.\" $FreeBSD$ +.\" + +.Dd January 6, 2012 +.Dt XNB 4 +.Os +.Sh NAME +.Nm xnb +.Nd "Xen Paravirtualized Backend Ethernet Driver" +.Sh SYNOPSIS +To compile this driver into the kernel, place the following lines in your +kernel configuration file: +.Bd -ragged -offset indent +.Cd "options XENHVM" +.Cd "device xenpci" +.Ed +.Sh DESCRIPTION +The +.Nm +driver provides the back half of a paravirtualized +.Xr xen 4 +network connection. The netback and netfront drivers appear to their +respective operating systems as Ethernet devices linked by a crossover cable. +Typically, +.Nm +will run on Domain 0 and the netfront driver will run on a guest domain. +However, it is also possible to run +.Nm +on a guest domain. It may be bridged or routed to provide the netfront's +domain access to other guest domains or to a physical network. +.Pp +In most respects, the +.Nm +device appears to the OS as an other Ethernet device. It can be configured at +runtime entirely with +.Xr ifconfig 8 +\&. In particular, it supports MAC changing, arbitrary MTU sizes, checksum +offload for IP, UDP, and TCP for both receive and transmit, and TSO. However, +see +.Sx CAVEATS +before enabling txcsum, rxcsum, or tso. 
+.Sh SYSCTL VARIABLES +The following read-only variables are available via +.Xr sysctl 8 : +.Bl -tag -width indent +.It Va dev.xnb.%d.dump_rings +Displays information about the ring buffers used to pass requests between the +netfront and netback. Mostly useful for debugging, but can also be used to +get traffic statistics. +.It Va dev.xnb.%d.unit_test_results +Runs a builtin suite of unit tests and displays the results. Does not affect +the operation of the driver in any way. Note that the test suite simulates +error conditions; this will result in error messages being printed to the +system system log. +.Sh CAVEATS +Packets sent through Xennet pass over shared memory, so the protocol includes +no form of link-layer checksum or CRC. Furthermore, Xennet drivers always +report to their hosts that they support receive and transmit checksum +offloading. They "offload" the checksum calculation by simply skipping it. +That works fine for packets that are exchanged between two domains on the same +machine. However, when a Xennet interface is bridged to a physical interface, +a correct checksum must be attached to any packets bound for that physical +interface. Currently, FreeBSD lacks any mechanism for an ethernet device to +inform the OS that newly received packets are valid even though their checksums +are not. So if the netfront driver is configured to offload checksum +calculations, it will pass non-checksumed packets to +.Nm +, which must then calculate the checksum in software before passing the packet +to the OS. +.Pp +For this reason, it is recommended that if +.Nm +is bridged to a physcal interface, then transmit checksum offloading should be +disabled on the netfront. The Xennet protocol does not have any mechanism for +the netback to request the netfront to do this; the operator must do it +manually. +.Sh SEE ALSO +.Xr arp 4 , +.Xr netintro 4 , +.Xr ng_ether 4 , +.Xr ifconfig 8 , +.Xr xen 4 +.Sh HISTORY +The +.Nm +device driver first appeared in +.Fx 10.0 +. 
+.Sh AUTHORS +The +.Nm +driver was written by +.An Alan Somers +.Aq alans@spectralogic.com +and +.An John Suykerbuyk +.Aq johns@spectralogic.com +.Sh BUGS +The +.Nm +driver does not properly checksum UDP datagrams that span more than one +Ethernet frame. Nor does it correctly checksum IPv6 packets. To workaround +that bug, disable transmit checksum offloading on the netfront driver. Modified: stable/8/sys/amd64/conf/XENHVM ============================================================================== --- stable/8/sys/amd64/conf/XENHVM Wed Feb 15 13:40:10 2012 (r231758) +++ stable/8/sys/amd64/conf/XENHVM Wed Feb 15 14:23:01 2012 (r231759) @@ -17,6 +17,7 @@ makeoptions MODULES_OVERRIDE="" # options NO_ADAPTIVE_MUTEXES options NO_ADAPTIVE_RWLOCKS +options NO_ADAPTIVE_SX # Xen HVM support options XENHVM Modified: stable/8/sys/conf/files ============================================================================== --- stable/8/sys/conf/files Wed Feb 15 13:40:10 2012 (r231758) +++ stable/8/sys/conf/files Wed Feb 15 14:23:01 2012 (r231759) @@ -2377,6 +2377,7 @@ libkern/strlcpy.c standard libkern/strlen.c standard libkern/strncmp.c standard libkern/strncpy.c standard +libkern/strnlen.c standard libkern/strsep.c standard libkern/strspn.c standard libkern/strstr.c standard @@ -3040,6 +3041,7 @@ dev/xen/blkback/blkback.c optional xen | dev/xen/console/console.c optional xen dev/xen/console/xencons_ring.c optional xen dev/xen/control/control.c optional xen | xenhvm +dev/xen/netback/netback.c optional xen | xenhvm dev/xen/netfront/netfront.c optional xen | xenhvm dev/xen/xenpci/xenpci.c optional xenpci dev/xen/xenpci/evtchn.c optional xenpci Modified: stable/8/sys/dev/acpica/acpi.c ============================================================================== --- stable/8/sys/dev/acpica/acpi.c Wed Feb 15 13:40:10 2012 (r231758) +++ stable/8/sys/dev/acpica/acpi.c Wed Feb 15 14:23:01 2012 (r231759) @@ -2538,6 +2538,8 @@ acpi_EnterSleepState(struct acpi_softc * return_ACPI_STATUS 
(AE_OK); } + EVENTHANDLER_INVOKE(power_suspend); + if (smp_started) { thread_lock(curthread); sched_bind(curthread, 0); @@ -2629,6 +2631,8 @@ backout: thread_unlock(curthread); } + EVENTHANDLER_INVOKE(power_resume); + /* Allow another sleep request after a while. */ timeout(acpi_sleep_enable, sc, hz * ACPI_MINIMUM_AWAKETIME); Modified: stable/8/sys/dev/esp/ncr53c9x.c ============================================================================== --- stable/8/sys/dev/esp/ncr53c9x.c Wed Feb 15 13:40:10 2012 (r231758) +++ stable/8/sys/dev/esp/ncr53c9x.c Wed Feb 15 14:23:01 2012 (r231759) @@ -316,7 +316,7 @@ ncr53c9x_attach(struct ncr53c9x_softc *s * The recommended timeout is 250ms. This register is loaded * with a value calculated as follows, from the docs: * - * (timout period) x (CLK frequency) + * (timeout period) x (CLK frequency) * reg = ------------------------------------- * 8192 x (Clock Conversion Factor) * Modified: stable/8/sys/dev/twa/tw_osl.h ============================================================================== --- stable/8/sys/dev/twa/tw_osl.h Wed Feb 15 13:40:10 2012 (r231758) +++ stable/8/sys/dev/twa/tw_osl.h Wed Feb 15 14:23:01 2012 (r231759) @@ -153,7 +153,7 @@ struct twa_softc { struct mtx sim_lock_handle;/* sim lock shared with cam */ struct mtx *sim_lock;/* ptr to sim lock */ - struct callout watchdog_callout[2]; /* For command timout */ + struct callout watchdog_callout[2]; /* For command timeout */ TW_UINT32 watchdog_index; #ifdef TW_OSL_DEBUG Modified: stable/8/sys/dev/xen/balloon/balloon.c ============================================================================== --- stable/8/sys/dev/xen/balloon/balloon.c Wed Feb 15 13:40:10 2012 (r231758) +++ stable/8/sys/dev/xen/balloon/balloon.c Wed Feb 15 14:23:01 2012 (r231759) @@ -41,8 +41,8 @@ __FBSDID("$FreeBSD$"); #include #include -#include #include +#include #include #include @@ -147,12 +147,6 @@ balloon_retrieve(void) return page; } -static void -balloon_alarm(void *unused) -{ - 
wakeup(balloon_process); -} - static unsigned long current_target(void) { @@ -378,6 +372,8 @@ balloon_process(void *unused) mtx_lock(&balloon_mutex); for (;;) { + int sleep_time; + do { credit = current_target() - bs.current_pages; if (credit > 0) @@ -389,9 +385,12 @@ balloon_process(void *unused) /* Schedule more work if there is some still to be done. */ if (current_target() != bs.current_pages) - timeout(balloon_alarm, NULL, ticks + hz); + sleep_time = hz; + else + sleep_time = 0; - msleep(balloon_process, &balloon_mutex, 0, "balloon", -1); + msleep(balloon_process, &balloon_mutex, 0, "balloon", + sleep_time); } mtx_unlock(&balloon_mutex); } @@ -474,9 +473,6 @@ balloon_init(void *arg) bs.hard_limit = ~0UL; kproc_create(balloon_process, NULL, NULL, 0, 0, "balloon"); -// init_timer(&balloon_timer); -// balloon_timer.data = 0; -// balloon_timer.function = balloon_alarm; #ifndef XENHVM /* Initialise the balloon with excess memory space. */ Modified: stable/8/sys/dev/xen/blkback/blkback.c ============================================================================== --- stable/8/sys/dev/xen/blkback/blkback.c Wed Feb 15 13:40:10 2012 (r231758) +++ stable/8/sys/dev/xen/blkback/blkback.c Wed Feb 15 14:23:01 2012 (r231759) @@ -1,5 +1,5 @@ /*- - * Copyright (c) 2009-2010 Spectra Logic Corporation + * Copyright (c) 2009-2011 Spectra Logic Corporation * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -61,6 +61,8 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include +#include #include @@ -153,9 +155,19 @@ MALLOC_DEFINE(M_XENBLOCKBACK, "xbbd", "X #define XBB_MAX_RING_PAGES \ BLKIF_RING_PAGES(BLKIF_SEGS_TO_BLOCKS(XBB_MAX_SEGMENTS_PER_REQUEST) \ * XBB_MAX_REQUESTS) +/** + * The maximum number of ring pages that we can allow per request list. + * We limit this to the maximum number of segments per request, because + * that is already a reasonable number of segments to aggregate. 
This + * number should never be smaller than XBB_MAX_SEGMENTS_PER_REQUEST, + * because that would leave situations where we can't dispatch even one + * large request. + */ +#define XBB_MAX_SEGMENTS_PER_REQLIST XBB_MAX_SEGMENTS_PER_REQUEST /*--------------------------- Forward Declarations ---------------------------*/ struct xbb_softc; +struct xbb_xen_req; static void xbb_attach_failed(struct xbb_softc *xbb, int err, const char *fmt, ...) __attribute__((format(printf, 3, 4))); @@ -163,16 +175,15 @@ static int xbb_shutdown(struct xbb_soft static int xbb_detach(device_t dev); /*------------------------------ Data Structures -----------------------------*/ -/** - * \brief Object tracking an in-flight I/O from a Xen VBD consumer. - */ -struct xbb_xen_req { - /** - * Linked list links used to aggregate idle request in the - * request free pool (xbb->request_free_slist). - */ - SLIST_ENTRY(xbb_xen_req) links; +STAILQ_HEAD(xbb_xen_req_list, xbb_xen_req); + +typedef enum { + XBB_REQLIST_NONE = 0x00, + XBB_REQLIST_MAPPED = 0x01 +} xbb_reqlist_flags; + +struct xbb_xen_reqlist { /** * Back reference to the parent block back instance for this * request. Used during bio_done handling. @@ -180,17 +191,71 @@ struct xbb_xen_req { struct xbb_softc *xbb; /** - * The remote domain's identifier for this I/O request. + * BLKIF_OP code for this request. */ - uint64_t id; + int operation; + + /** + * Set to BLKIF_RSP_* to indicate request status. + * + * This field allows an error status to be recorded even if the + * delivery of this status must be deferred. Deferred reporting + * is necessary, for example, when an error is detected during + * completion processing of one bio when other bios for this + * request are still outstanding. + */ + int status; + + /** + * Number of 512 byte sectors not transferred. + */ + int residual_512b_sectors; + + /** + * Starting sector number of the first request in the list. 
+ */ + off_t starting_sector_number; + + /** + * If we're going to coalesce, the next contiguous sector would be + * this one. + */ + off_t next_contig_sector; + + /** + * Number of child requests in the list. + */ + int num_children; + + /** + * Number of I/O requests dispatched to the backend. + */ + int pendcnt; + + /** + * Total number of segments for requests in the list. + */ + int nr_segments; + + /** + * Flags for this particular request list. + */ + xbb_reqlist_flags flags; /** * Kernel virtual address space reserved for this request - * structure and used to map the remote domain's pages for + * list structure and used to map the remote domain's pages for * this I/O, into our domain's address space. */ uint8_t *kva; + /** + * Base, psuedo-physical address, corresponding to the start + * of this request's kva region. + */ + uint64_t gnt_base; + + #ifdef XBB_USE_BOUNCE_BUFFERS /** * Pre-allocated domain local memory used to proxy remote @@ -200,53 +265,91 @@ struct xbb_xen_req { #endif /** - * Base, psuedo-physical address, corresponding to the start - * of this request's kva region. + * Array of grant handles (one per page) used to map this request. */ - uint64_t gnt_base; + grant_handle_t *gnt_handles; + + /** + * Device statistics request ordering type (ordered or simple). + */ + devstat_tag_type ds_tag_type; + + /** + * Device statistics request type (read, write, no_data). + */ + devstat_trans_flags ds_trans_type; + + /** + * The start time for this request. + */ + struct bintime ds_t0; + + /** + * Linked list of contiguous requests with the same operation type. + */ + struct xbb_xen_req_list contig_req_list; + + /** + * Linked list links used to aggregate idle requests in the + * request list free pool (xbb->reqlist_free_stailq) and pending + * requests waiting for execution (xbb->reqlist_pending_stailq). 
+ */ + STAILQ_ENTRY(xbb_xen_reqlist) links; +}; + +STAILQ_HEAD(xbb_xen_reqlist_list, xbb_xen_reqlist); + +/** + * \brief Object tracking an in-flight I/O from a Xen VBD consumer. + */ +struct xbb_xen_req { + /** + * Linked list links used to aggregate requests into a reqlist + * and to store them in the request free pool. + */ + STAILQ_ENTRY(xbb_xen_req) links; + + /** + * The remote domain's identifier for this I/O request. + */ + uint64_t id; /** * The number of pages currently mapped for this request. */ - int nr_pages; + int nr_pages; /** * The number of 512 byte sectors comprising this requests. */ - int nr_512b_sectors; + int nr_512b_sectors; /** * The number of struct bio requests still outstanding for this * request on the backend device. This field is only used for * device (rather than file) backed I/O. */ - int pendcnt; + int pendcnt; /** * BLKIF_OP code for this request. */ - int operation; + int operation; /** - * BLKIF_RSP status code for this request. - * - * This field allows an error status to be recorded even if the - * delivery of this status must be deferred. Deferred reporting - * is necessary, for example, when an error is detected during - * completion processing of one bio when other bios for this - * request are still outstanding. + * Storage used for non-native ring requests. */ - int status; + blkif_request_t ring_req_storage; /** - * Device statistics request ordering type (ordered or simple). + * Pointer to the Xen request in the ring. */ - devstat_tag_type ds_tag_type; + blkif_request_t *ring_req; /** - * Device statistics request type (read, write, no_data). + * Consumer index for this request. */ - devstat_trans_flags ds_trans_type; + RING_IDX req_ring_idx; /** * The start time for this request. @@ -254,9 +357,9 @@ struct xbb_xen_req { struct bintime ds_t0; /** - * Array of grant handles (one per page) used to map this request. + * Pointer back to our parent request list. 
*/ - grant_handle_t *gnt_handles; + struct xbb_xen_reqlist *reqlist; }; SLIST_HEAD(xbb_xen_req_slist, xbb_xen_req); @@ -321,7 +424,10 @@ typedef enum XBBF_RESOURCE_SHORTAGE = 0x04, /** Connection teardown in progress. */ - XBBF_SHUTDOWN = 0x08 + XBBF_SHUTDOWN = 0x08, + + /** A thread is already performing shutdown processing. */ + XBBF_IN_SHUTDOWN = 0x10 } xbb_flag_t; /** Backend device type. */ @@ -399,7 +505,7 @@ struct xbb_file_data { * Only a single file based request is outstanding per-xbb instance, * so we only need one of these. */ - struct iovec xiovecs[XBB_MAX_SEGMENTS_PER_REQUEST]; + struct iovec xiovecs[XBB_MAX_SEGMENTS_PER_REQLIST]; #ifdef XBB_USE_BOUNCE_BUFFERS /** @@ -411,7 +517,7 @@ struct xbb_file_data { * bounce-out the read data. This array serves as the temporary * storage for this saved data. */ - struct iovec saved_xiovecs[XBB_MAX_SEGMENTS_PER_REQUEST]; + struct iovec saved_xiovecs[XBB_MAX_SEGMENTS_PER_REQLIST]; /** * \brief Array of memoized bounce buffer kva offsets used @@ -422,7 +528,7 @@ struct xbb_file_data { * the request sg elements is unavoidable. We memoize the computed * bounce address here to reduce the cost of the second walk. */ - void *xiovecs_vaddr[XBB_MAX_SEGMENTS_PER_REQUEST]; + void *xiovecs_vaddr[XBB_MAX_SEGMENTS_PER_REQLIST]; #endif /* XBB_USE_BOUNCE_BUFFERS */ }; @@ -437,9 +543,9 @@ union xbb_backend_data { /** * Function signature of backend specific I/O handlers. */ -typedef int (*xbb_dispatch_t)(struct xbb_softc *xbb, blkif_request_t *ring_req, - struct xbb_xen_req *req, int nseg, - int operation, int flags); +typedef int (*xbb_dispatch_t)(struct xbb_softc *xbb, + struct xbb_xen_reqlist *reqlist, int operation, + int flags); /** * Per-instance configuration data. @@ -467,14 +573,23 @@ struct xbb_softc { xbb_dispatch_t dispatch_io; /** The number of requests outstanding on the backend device/file. */ - u_int active_request_count; + int active_request_count; /** Free pool of request tracking structures. 
*/ - struct xbb_xen_req_slist request_free_slist; + struct xbb_xen_req_list request_free_stailq; /** Array, sized at connection time, of request tracking structures. */ struct xbb_xen_req *requests; + /** Free pool of request list structures. */ + struct xbb_xen_reqlist_list reqlist_free_stailq; + + /** List of pending request lists awaiting execution. */ + struct xbb_xen_reqlist_list reqlist_pending_stailq; + + /** Array, sized at connection time, of request list structures. */ + struct xbb_xen_reqlist *request_lists; + /** * Global pool of kva used for mapping remote domain ring * and I/O transaction data. @@ -487,6 +602,15 @@ struct xbb_softc { /** The size of the global kva pool. */ int kva_size; + /** The size of the KVA area used for request lists. */ + int reqlist_kva_size; + + /** The number of pages of KVA used for request lists */ + int reqlist_kva_pages; + + /** Bitmap of free KVA pages */ + bitstr_t *kva_free; + /** * \brief Cached value of the front-end's domain id. * @@ -508,12 +632,12 @@ struct xbb_softc { int abi; /** - * \brief The maximum number of requests allowed to be in - * flight at a time. + * \brief The maximum number of requests and request lists allowed + * to be in flight at a time. * * This value is negotiated via the XenStore. */ - uint32_t max_requests; + u_int max_requests; /** * \brief The maximum number of segments (1 page per segment) @@ -521,7 +645,15 @@ struct xbb_softc { * * This value is negotiated via the XenStore. */ - uint32_t max_request_segments; + u_int max_request_segments; + + /** + * \brief Maximum number of segments per request list. + * + * This value is derived from and will generally be larger than + * max_request_segments. + */ + u_int max_reqlist_segments; /** * The maximum size of any request to this back-end @@ -529,7 +661,13 @@ struct xbb_softc { * * This value is negotiated via the XenStore. */ - uint32_t max_request_size; + u_int max_request_size; + + /** + * The maximum size of any request list. 
This is derived directly + * from max_reqlist_segments. + */ + u_int max_reqlist_size; /** Various configuration and state bit flags. */ xbb_flag_t flags; @@ -574,6 +712,7 @@ struct xbb_softc { struct vnode *vn; union xbb_backend_data backend; + /** The native sector size of the backend. */ u_int sector_size; @@ -598,7 +737,14 @@ struct xbb_softc { * * Ring processing is serialized so we only need one of these. */ - struct xbb_sg xbb_sgs[XBB_MAX_SEGMENTS_PER_REQUEST]; + struct xbb_sg xbb_sgs[XBB_MAX_SEGMENTS_PER_REQLIST]; + + /** + * Temporary grant table map used in xbb_dispatch_io(). When + * XBB_MAX_SEGMENTS_PER_REQLIST gets large, keeping this on the + * stack could cause a stack overflow. + */ + struct gnttab_map_grant_ref maps[XBB_MAX_SEGMENTS_PER_REQLIST]; /** Mutex protecting per-instance data. */ struct mtx lock; @@ -614,8 +760,51 @@ struct xbb_softc { int pseudo_phys_res_id; #endif - /** I/O statistics. */ + /** + * I/O statistics from BlockBack dispatch down. These are + * coalesced requests, and we start them right before execution. + */ struct devstat *xbb_stats; + + /** + * I/O statistics coming into BlockBack. These are the requests as + * we get them from BlockFront. They are started as soon as we + * receive a request, and completed when the I/O is complete. + */ + struct devstat *xbb_stats_in; + + /** Disable sending flush to the backend */ + int disable_flush; + + /** Send a real flush for every N flush requests */ + int flush_interval; + + /** Count of flush requests in the interval */ + int flush_count; + + /** Don't coalesce requests if this is set */ + int no_coalesce_reqs; + + /** Number of requests we have received */ + uint64_t reqs_received; + + /** Number of requests we have completed*/ + uint64_t reqs_completed; + + /** How many forced dispatches (i.e. 
without coalescing) have happend */ + uint64_t forced_dispatch; + + /** How many normal dispatches have happend */ + uint64_t normal_dispatch; + + /** How many total dispatches have happend */ + uint64_t total_dispatch; + + /** How many times we have run out of KVA */ + uint64_t kva_shortages; + + /** How many times we have run out of request structures */ + uint64_t request_shortages; }; /*---------------------------- Request Processing ----------------------------*/ @@ -633,21 +822,14 @@ xbb_get_req(struct xbb_softc *xbb) struct xbb_xen_req *req; req = NULL; - mtx_lock(&xbb->lock); - /* - * Do not allow new requests to be allocated while we - * are shutting down. - */ - if ((xbb->flags & XBBF_SHUTDOWN) == 0) { - if ((req = SLIST_FIRST(&xbb->request_free_slist)) != NULL) { - SLIST_REMOVE_HEAD(&xbb->request_free_slist, links); - xbb->active_request_count++; - } else { - xbb->flags |= XBBF_RESOURCE_SHORTAGE; - } + mtx_assert(&xbb->lock, MA_OWNED); + + if ((req = STAILQ_FIRST(&xbb->request_free_stailq)) != NULL) { + STAILQ_REMOVE_HEAD(&xbb->request_free_stailq, links); + xbb->active_request_count++; } - mtx_unlock(&xbb->lock); + return (req); } @@ -660,34 +842,40 @@ xbb_get_req(struct xbb_softc *xbb) static inline void xbb_release_req(struct xbb_softc *xbb, struct xbb_xen_req *req) { - int wake_thread; + mtx_assert(&xbb->lock, MA_OWNED); - mtx_lock(&xbb->lock); - wake_thread = xbb->flags & XBBF_RESOURCE_SHORTAGE; - xbb->flags &= ~XBBF_RESOURCE_SHORTAGE; - SLIST_INSERT_HEAD(&xbb->request_free_slist, req, links); + STAILQ_INSERT_HEAD(&xbb->request_free_stailq, req, links); xbb->active_request_count--; - if ((xbb->flags & XBBF_SHUTDOWN) != 0) { - /* - * Shutdown is in progress. See if we can - * progress further now that one more request - * has completed and been returned to the - * free pool. 
- */ - xbb_shutdown(xbb); - } - mtx_unlock(&xbb->lock); + KASSERT(xbb->active_request_count >= 0, + ("xbb_release_req: negative active count")); +} - if (wake_thread != 0) - taskqueue_enqueue(xbb->io_taskqueue, &xbb->io_task); +/** + * Return an xbb_xen_req_list of allocated xbb_xen_reqs to the free pool. + * + * \param xbb Per-instance xbb configuration structure. + * \param req_list The list of requests to free. + * \param nreqs The number of items in the list. + */ +static inline void +xbb_release_reqs(struct xbb_softc *xbb, struct xbb_xen_req_list *req_list, + int nreqs) +{ + mtx_assert(&xbb->lock, MA_OWNED); + + STAILQ_CONCAT(&xbb->request_free_stailq, req_list); + xbb->active_request_count -= nreqs; + + KASSERT(xbb->active_request_count >= 0, + ("xbb_release_reqs: negative active count")); } /** * Given a page index and 512b sector offset within that page, * calculate an offset into a request's kva region. * - * \param req The request structure whose kva region will be accessed. + * \param reqlist The request structure whose kva region will be accessed. * \param pagenr The page index used to compute the kva offset. * \param sector The 512b sector index used to compute the page relative * kva offset. @@ -695,9 +883,9 @@ xbb_release_req(struct xbb_softc *xbb, s * \return The computed global KVA offset. */ static inline uint8_t * -xbb_req_vaddr(struct xbb_xen_req *req, int pagenr, int sector) +xbb_reqlist_vaddr(struct xbb_xen_reqlist *reqlist, int pagenr, int sector) { - return (req->kva + (PAGE_SIZE * pagenr) + (sector << 9)); + return (reqlist->kva + (PAGE_SIZE * pagenr) + (sector << 9)); } #ifdef XBB_USE_BOUNCE_BUFFERS @@ -705,7 +893,7 @@ xbb_req_vaddr(struct xbb_xen_req *req, i * Given a page index and 512b sector offset within that page, * calculate an offset into a request's local bounce memory region. * - * \param req The request structure whose bounce region will be accessed. + * \param reqlist The request structure whose bounce region will be accessed. 
* \param pagenr The page index used to compute the bounce offset. * \param sector The 512b sector index used to compute the page relative * bounce offset. @@ -713,9 +901,9 @@ xbb_req_vaddr(struct xbb_xen_req *req, i * \return The computed global bounce buffer address. */ static inline uint8_t * -xbb_req_bounce_addr(struct xbb_xen_req *req, int pagenr, int sector) +xbb_reqlist_bounce_addr(struct xbb_xen_reqlist *reqlist, int pagenr, int sector) { - return (req->bounce + (PAGE_SIZE * pagenr) + (sector << 9)); + return (reqlist->bounce + (PAGE_SIZE * pagenr) + (sector << 9)); } #endif @@ -724,7 +912,7 @@ xbb_req_bounce_addr(struct xbb_xen_req * * calculate an offset into the request's memory region that the * underlying backend device/file should use for I/O. * - * \param req The request structure whose I/O region will be accessed. + * \param reqlist The request structure whose I/O region will be accessed. * \param pagenr The page index used to compute the I/O offset. * \param sector The 512b sector index used to compute the page relative * I/O offset. @@ -736,12 +924,12 @@ xbb_req_bounce_addr(struct xbb_xen_req * * this request. */ static inline uint8_t * -xbb_req_ioaddr(struct xbb_xen_req *req, int pagenr, int sector) +xbb_reqlist_ioaddr(struct xbb_xen_reqlist *reqlist, int pagenr, int sector) { #ifdef XBB_USE_BOUNCE_BUFFERS - return (xbb_req_bounce_addr(req, pagenr, sector)); + return (xbb_reqlist_bounce_addr(reqlist, pagenr, sector)); #else - return (xbb_req_vaddr(req, pagenr, sector)); + return (xbb_reqlist_vaddr(reqlist, pagenr, sector)); #endif } @@ -750,7 +938,7 @@ xbb_req_ioaddr(struct xbb_xen_req *req, * an offset into the local psuedo-physical address space used to map a * front-end's request data into a request. * - * \param req The request structure whose pseudo-physical region + * \param reqlist The request list structure whose pseudo-physical region * will be accessed. * \param pagenr The page index used to compute the pseudo-physical offset. 
* \param sector The 512b sector index used to compute the page relative @@ -763,10 +951,126 @@ xbb_req_ioaddr(struct xbb_xen_req *req, * this request. */ static inline uintptr_t -xbb_req_gntaddr(struct xbb_xen_req *req, int pagenr, int sector) +xbb_get_gntaddr(struct xbb_xen_reqlist *reqlist, int pagenr, int sector) { - return ((uintptr_t)(req->gnt_base - + (PAGE_SIZE * pagenr) + (sector << 9))); + struct xbb_softc *xbb; + + xbb = reqlist->xbb; + + return ((uintptr_t)(xbb->gnt_base_addr + + (uintptr_t)(reqlist->kva - xbb->kva) + + (PAGE_SIZE * pagenr) + (sector << 9))); +} + +/** + * Get Kernel Virtual Address space for mapping requests. + * + * \param xbb Per-instance xbb configuration structure. + * \param nr_pages Number of pages needed. + * \param check_only If set, check for free KVA but don't allocate it. + * \param have_lock If set, xbb lock is already held. + * + * \return On success, a pointer to the allocated KVA region. Otherwise NULL. + * + * Note: This should be unnecessary once we have either chaining or + * scatter/gather support for struct bio. At that point we'll be able to + * put multiple addresses and lengths in one bio/bio chain and won't need + * to map everything into one virtual segment. + */ +static uint8_t * +xbb_get_kva(struct xbb_softc *xbb, int nr_pages) +{ + intptr_t first_clear, num_clear; + uint8_t *free_kva; + int i; + + KASSERT(nr_pages != 0, ("xbb_get_kva of zero length")); + + first_clear = 0; + free_kva = NULL; + + mtx_lock(&xbb->lock); + + /* + * Look for the first available page. If there are none, we're done. + */ + bit_ffc(xbb->kva_free, xbb->reqlist_kva_pages, &first_clear); + + if (first_clear == -1) + goto bailout; + + /* + * Starting at the first available page, look for consecutive free + * pages that will satisfy the user's request. 
+ */ + for (i = first_clear, num_clear = 0; i < xbb->reqlist_kva_pages; i++) { + /* + * If this is true, the page is used, so we have to reset + * the number of clear pages and the first clear page + * (since it pointed to a region with an insufficient number + * of clear pages). + */ + if (bit_test(xbb->kva_free, i)) { + num_clear = 0; + first_clear = -1; + continue; + } + + if (first_clear == -1) + first_clear = i; + + /* + * If this is true, we've found a large enough free region + * to satisfy the request. + */ + if (++num_clear == nr_pages) { + + bit_nset(xbb->kva_free, first_clear, *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
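The diff is cut off partway through xbb_get_kva(), so the following is only a minimal, stand-alone sketch of the first-fit search that the visible comments describe: walk the page bitmap, reset the run whenever a used page is seen, and claim the run once it reaches nr_pages. Everything here is an assumption made for illustration. struct kva_pool, page_is_used(), page_mark(), kva_alloc_pages() and kva_free_pages() are hypothetical names, a plain byte array stands in for the bitstr_t manipulated with bit_ffc(), bit_test() and bit_nset() in the driver, and the xbb->lock handling and shortage accounting are omitted. It is not the committed implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical pool size for the sketch; the driver sizes this at connect time. */
#define SKETCH_KVA_PAGES 16

/* Hypothetical stand-in for the kva_free bitmap and reqlist_kva_pages. */
struct kva_pool {
	unsigned char used[(SKETCH_KVA_PAGES + CHAR_BIT - 1) / CHAR_BIT];
	int npages;
};

/* Return non-zero if page i is currently allocated. */
static int
page_is_used(const struct kva_pool *p, int i)
{
	return ((p->used[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1);
}

/* Mark count pages starting at first as used (used != 0) or free. */
static void
page_mark(struct kva_pool *p, int first, int count, int used)
{
	for (int i = first; i < first + count; i++) {
		unsigned char mask = (unsigned char)(1u << (i % CHAR_BIT));

		if (used)
			p->used[i / CHAR_BIT] |= mask;
		else
			p->used[i / CHAR_BIT] &= (unsigned char)~mask;
	}
}

/*
 * First-fit search for nr_pages consecutive free pages.
 * Returns the index of the first page of the run, or -1 if none exists.
 */
static int
kva_alloc_pages(struct kva_pool *p, int nr_pages)
{
	int first_clear = -1;
	int num_clear = 0;

	assert(nr_pages > 0);

	for (int i = 0; i < p->npages; i++) {
		if (page_is_used(p, i)) {
			/* The current run is too short; start over. */
			num_clear = 0;
			first_clear = -1;
			continue;
		}
		if (first_clear == -1)
			first_clear = i;
		if (++num_clear == nr_pages) {
			/* Found a large enough run; claim it. */
			page_mark(p, first_clear, nr_pages, 1);
			return (first_clear);
		}
	}
	return (-1);
}

/* Return a previously allocated run of pages to the pool. */
static void
kva_free_pages(struct kva_pool *p, int first, int nr_pages)
{
	page_mark(p, first, nr_pages, 0);
}
```

In the driver, the returned page index would presumably be turned into an address by adding index * PAGE_SIZE to the base of the request-list KVA area. The sketch returns only the index so that it stays independent of kernel types.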