From: Julian Elischer
Date: Fri, 15 Aug 2008 10:48:31 -0700
To: Marko Zec
Cc: Perforce Change Reviews
Subject: Re: PERFORCE change 147425 for review
Message-ID: <48A5C16F.2070306@elischer.org>
In-Reply-To: <200808150806.m7F86mA0039023@repoman.freebsd.org>

Marko Zec wrote:
> http://perforce.freebsd.org/chv.cgi?CH=147425
>
> Change 147425 by zec@zec_tpx32 on 2008/08/15 08:06:14
>
> Add an intro section to the document, clarify a few issues,
> randomly s/virtual machine/virtual environment/ or vimage or
> vnet where
> appropriate.

THANKYOU!

>
> Affected files ...
>
> .. //depot/projects/vimage/porting_to_vimage.txt#6 edit
>
> Differences ...
>
> ==== //depot/projects/vimage/porting_to_vimage.txt#6 (text+ko) ====
>
> @@ -6,21 +6,94 @@
>  ===================
>
>  Vimage is a framework in the BSD kernel which allows a co-operating module
> -to present multiple instances of itself so that it can participate
> -in a virtual machine scenario.
> +to operate on multiple independent instances of its state so that it can
> +participate in a virtual machine / virtual environment scenario.
> +
> +The implementation approach taken by the vimage framework is a replacement
> +of selected global state variables with constructs that allow for the
> +virtualized state to be stored and resolved in appropriate instances of
> +module-specific container structures. The code operating on virtualized state
> +has to conform to a set of rules described further below, among other things
> +in order to allow for all the changes to be conditionally compilable, i.e.
> +permitting the virtualized code to fall back to operation on global state.
> +
> +The most visible change throughout the existing code is typically replacement
> +of direct references to global variables with macros; foo_bar thus becomes
> +V_foo_bar. V_foo_bar macros will resolve back to foo_bar global in default
> +kernel builds, and alternatively to some_base_pointer->_foo_bar for "options
> +VIMAGE" kernel configs. Prepending of "V_" prefixes to variable references
> +helps in visual discrimination between global and virtualized state. The
> +framework extends the sysctl infrastructure to support access to virtualized
> +state through introduction of the SYSCTL_V family of macros; those also
> +automatically fall back to their standard SYSCTL counterparts in default
> +kernel builds.
> +Transparent kldsym(2) lookups are provided to virtualized
> +variables explicitly marked for visibility to the kldsym interface, which permits
> +userland binaries such as netstat to operate unmodified on "options VIMAGE"
> +kernels, though this may have wide security implications.
> +
> +The vimage struct is currently primarily a placeholder for pointers to
> +module-specific struct instances; currently V_NET (networking), V_CPU
> +(CPU scheduling), and V_PROCG (jail-style interprocess protection) major
> +module classes are defined. Each vimage module may or may not be further
> +split into minor or submodules; the networking subsystem (vimage id V_NET;
> +struct vnet) in particular is organized in submodules such as VNET_MOD_NET
> +(mandatory shared infrastructure: routing tables, interface lists etc.);
> +VNET_MOD_INET (IPv4 state including transport protocols); VNET_MOD_INET6,
> +VNET_MOD_IPSEC, VNET_MOD_IPFW, VNET_MOD_NETGRAPH etc. The speciality of
> +VNET submodules is in that they not only provide storage for virtualized
> +data, but also enforce ordering of initialization and cleanup. Hence, not
> +all submodules must necessarily allocate private storage for their specific
> +data; they may be defined solely to support proper initialization
> +ordering.
> +
> +Each process is associated with a vimage, and vimages currently hang off of
> +ucred-s. This relationship defines a process's administrative affinity
> +to a vimage and thus indirectly to all of its modules (NET, CPU, PROCG)
> +as well as to any submodules. All network interfaces and sockets hold
> +pointers back to their parent vnets; this relationship is obviously entirely
> +independent from proc->ucred->vimage bindings. Hence, when a process
> +opens a socket, the socket will get bound to a vnet instance hanging off of
> +proc->ucred->vimage->vnet, but once such a socket->vnet binding gets
> +established, it cannot be changed for the entire socket lifetime.
> +Certain
> +classes of network interfaces (Ethernet in particular) can be assigned
> +from one vnet to another at any time. By definition all vnets
> +are independent and can communicate only if they are explicitly provided
> +with communication paths; currently only netgraph can be used to establish
> +inter-vnet datapaths.
> +
> +In network traffic processing the vnet affinity is defined either by the
> +inbound interface or by the socket / pcb -> vnet binding. However, there
> +are many functions in the network stack that cannot implicitly fetch
> +the vnet context from their standard arguments. Instead of explicitly
> +extending argument lists of such functions with a struct vnet *,
> +a per-thread variable td_vnet was introduced, which can be fetched via
> +the curvnet macro (#define curvnet curthread->td_vnet). The curvnet
> +context has to be set on entry to the network stack (socket operations,
> +packet reception, or timer-driven functions) and cleared on exit. This
> +must be done via the provided CURVNET_SET() / CURVNET_RESTORE() family of
> +macros, which allow for "stacking" of curvnet context setting and provide
> +additional debugging info in INVARIANTS kernel configs. In most cases
> +however a developer writing virtualized code will not have to set /
> +restore the curvnet context unless the code would include timer-driven
> +events, given that those are inherently vnet-contextless on entry.
> +
> +
> +Converting / virtualizing existing code
> +=======================================
>
>  There are several steps needed in virtualisation.
> +
>  1/ decide whether the module needs to be virtualised.
>
>   if the module is a driver for specific hardware, it makes sense that
>   there be only one instance of the driver as there is only one piece of
>   physical hardware. There are changes in the networking code to allow
> - physical (or virtual) interfaces to be moved between virtual machines.
> - This generally requires NO changes to the network drivers of the classes
> + physical (or virtual) interfaces to be moved between vnets. This
> + generally requires NO changes to the network drivers of the classes
>   covered (e.g. ethernet).
>
>  2/ decide if your module is part of one of the major module groups.
> - These are V_GLOBAL V_NET V_PROCG V_CPU.
> + These are currently V_NET V_PROCG V_CPU.
>
>   The reader will note that the descriptions below use the acronym VNET
>   a lot. The vimage system has been at this time broken into a number of
> @@ -32,11 +105,6 @@
>   processors to it, but keep the same filesystem and network setup, or
>   alternatively to share processors but to have virtualised networking.
>
> - The current code has a "vnet" pointer in the thread. It could be argued
> - that it should actually be a vimage.
> -
> - [comments from Marko here]
> -
>  3/ If the module is to be virtualised, decide which attributes of the
>   module should be virtualised.
>
> @@ -51,26 +119,28 @@
>   achieve the behaviour required for part #2.
>
>  5/ Work out for all the code paths through the module, how the path entering
> - the module can divine which virtual machine it is on.
> + the module can divine which virtual environment it is on.
>
>   Some examples:
> - * Since interfaces are all assigned to one virtual machine or
> - another, an incoming packet has a pointer to the receive interface,
> - which in turn has a pointer to the virtual machine instance.
> + * Since interfaces are all assigned to one vnet or another, an incoming
> + packet has a pointer to the receive interface, which in turn has a
> + pointer back to the vnet.
>   * Similarly, on any request from outside the kernel, (direct or indirect)
> - the current thread has a way to get to the current virtual machine
> - instance (easily referable as the "curvnet" macro).
> + the current thread has a way to get to the current virtual environment
> + instance via td->ucred->vimage.
> + For existing sockets the vnet context
> + must be used via so->so_vnet since td->ucred->vimage might change after
> + socket creation.
>   * Timer initiated actions usually have a (void *) argument which points to
>   some private structure for the module. It should be possible to add
> - a pointer to the appropriate virtual machine instance into whatever
> - structure that points to.
> - * Sometimes an action (timer initialted or initialted by module load or
> - unload simply has to chack all the virtual machine instances.
> - There is a macro (pair) for this which will iterate through all the
> - virtual machine instances.
> + a pointer to the appropriate module instance into whatever structure
> + that points to.
> + * Sometimes an action (timer triggered or triggered by module load or
> + unload) simply has to check all the vimage or module instances.
> + There are macro (pairs) for this which will iterate through all the
> + VNET or VPROCG instances.
>
>   This covers most of the cases, however in some cases it may still be
> - required for the module to stash away the virtual machine instance
> + required for the module to stash away the virtual environment instance
>   somewhere, and make associated changes in the code.
>
>  6/ Add the code described below to the files that make up the module
> @@ -80,7 +150,7 @@
>  temp. note: for module FOO add a definition for VNET_MOD_FOO in sys/vimage.h.
>  This will eventually be dynamically assigned.
>
> -For now these instructions refer mainly to VNET and not VCPU etc.
> +For now these instructions refer mainly to VNET and not VCPU, VPROCG etc.
>
>  Symbols defined in other modules that have been virtualised will have been
>  moved to a module-specific virtualisation structure. It will be defined in a
> @@ -103,18 +173,19 @@
>  When VIMAGE is compiled in, the macro will evaluate to an access to an
>  element in a structure pointed to by a local variable.
>  For this reason, it is necessary to also add, at the beginning of
> -these functions another MACRO that will instanciate this local variable
> +these functions another MACRO that will instantiate this local variable
>  and point it at the correct place.
> -As an example, prior to using the "V_ifnet" structure, we must
> -add the following MACRO at the head of a code block enclosing the references.
> - INIT_VNET_NET(initial_value);
> +As an example, prior to using the "V_ifnet" structure in a program block,
> +we must add the following MACRO at the head of a code block enclosing the
> +references to set up the module-specific base pointer variable:
> + INIT_VNET_NET(initial_value);
>
>  When VIMAGE is not defined, this will evaluate to nothing but when it
>  IS defined, it will evaluate to:
>   struct vnet_net *vnet_net = (initial_value);
>
>  The initial value is usually something like "curvnet" which in turn
> -is a macro that derives the virtual machine reference from the current thread.
> +is a macro that derives the vnet affinity from the current thread.
>  It could also be (m->m_ifp->if_vnet) if we were receiving an mbuf.
>
>  In the case where it is just one function in a module calling
> @@ -125,17 +196,17 @@
>  marked as "unused").
>
>  Usually, when a packet enters the system it is carried through the processing
> -path via a single thread, and that thread will set its virtual machine
> +path via a single thread, and that thread will set its virtual environment
>  reference to that indicated by the packet on picking up that new packet.
>  This means that in the normal inbound processing path as well as the
>  outgoing process path the current thread can be used to indicate the
> -current virtual machine.
> -In the case of timer initiated events, best practice
> -would also be to set the current virtual machine reference to that indicated
> -calculated by whatever way that would be done, so that any functions called
> -could rely on the current thread being a good reference for the correct
> -virtual machine.
> +current virtual environment. In the case of timer initiated events, best
> +practice would also be to set the current virtual module reference to that
> +calculated by whatever way that would be done, so that any functions
> +called could rely on the current thread being a good reference for the correct
> +virtual module.
>
> -When a new module is defined for virtualisation. The following
> +When a new VNET submodule is defined for virtualisation, the following
>  structure defining macro is used to define it to the framework.
>
>
> @@ -150,17 +221,18 @@
>  .vmi_struct_size = \
>  sizeof(struct vnet_##m_name_lc), \
>  .vmi_symmap = m_symmap \
> +
>  The ID we allocated in the temporary first step in "Details" is
> -the first entry here. Eventually this should be automatically done
> +the first entry here; eventually this should be automatically done
>  by module name. The DEPENDSON field tells us the order that modules
> -should be initialised in a new virtual machine. This may later need
> +should be initialised in a new virtual environment. This may later need
>  to be changed to a list of text module names for dynamic calculation.
> -The rest of the fields are self explanatory..
> +The rest of the fields are self explanatory.
>  With the exception of the symmap entry.
>  The symmap allows us to intercept calls by libkvm to the
>  linker when it is looking up symbols and to redirect it
>  dynamically. This allows for example "netstat -r" to find the
> -routing tables for THIS virtual machine. (cute eh?)
> +routing tables for THIS virtual environment.
>  (of course that won't work for core dumps).
>  (XXX *needs thought *)
>
>  As an example of virtualising a dummy module named the FOO module
> @@ -194,11 +266,13 @@
>  #endif /* !_FOO_VFOO_H_ */
>  =========================================================
>
> -For each time the foo module is initiated for a new virtual machine,
> +For each time the foo module is initiated for a new virtual environment,
>  the foo_bar structure must be initiated, so a new foo_creator and destructor
>  functions are defined for the module. The Module will call these when a new
> -virtual machine is created or destroyed. The constructor must be called once
> -for the base machine when the system is booted, even when VIMAGE is not defined.
> +virtual environment is created or destroyed. The constructor must be called
> +once for the base machine when the system is booted, even when options VIMAGE
> +is not defined.
> +
>  ==================== in module foo.c ======
>  #include "opt_vimage.h"
>  [...]
> @@ -229,7 +303,7 @@
>
>  #ifdef VIMAGE
>  /* If we have symbols we need to divert for libkvm
> - * then put them in here. We may net need to do anything if
> + * then put them in here. We may not need to do anything if
>   * the symbols are not used by libkvm.
>   */
>  static struct vnet_symmap vnet_net_symmap[] = {
> @@ -239,7 +313,7 @@
>  };
>  /*
>   * Declare our module and state that we want to be done after the
> - * loopback interface is initialised for the virtual machine.
> + * loopback interface is initialised for the virtual environment.
>   */
>  VNET_MOD_DECLARE(FOO, foo, vnet_foo_iattach,
>  vnet_foo_idetach, LOIF, vnet_foo_symmap)
> @@ -295,7 +369,7 @@
>   /* Initialize everything. */
>   /* put your code here */
>  #ifdef VIMAGE
> - /* This will do the work for each vortual machine. */
> + /* This will do the work for each virtual environment. */
>   vnet_mod_register(&vnet_foo_modinfo);
>  #else /* !VIMAGE */
>  #ifdef FUTURE
> @@ -309,7 +383,7 @@
>  case MOD_UNLOAD:
>   /* You can't unload it because an interface may be using it.
>   */
>   /* this needs work */
> - /* Should refuse to unload if any virtual machines */
> + /* Should refuse to unload if any virtual environments */
>   /* are using this still. */
>   /* MARKO, fill in here */
>   error = EBUSY;
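
For readers following the quoted intro, here is a minimal user-space sketch of the V_ macro fallback it describes: V_foo_bar resolves to a plain global in a default build and to a field behind a local base pointer when VIMAGE is defined. It assumes a hypothetical module "foo" with a single variable foo_bar. The names struct vnet_foo, INIT_VNET_FOO, foo_bump and foo_selfcheck are made up for illustration and are not taken from the vimage tree; the real kernel macros may well differ (for instance, the document says the non-VIMAGE init macro evaluates to nothing, while this sketch uses a void cast to avoid an unused-parameter warning).

```c
#include <assert.h>

/* Comment this out to model a default (non-VIMAGE) build. */
#define VIMAGE

/* Hypothetical per-vnet container for the dummy "foo" module. */
struct vnet_foo {
	int _foo_bar;
};

#ifdef VIMAGE
/* Declare the local base pointer that the V_ macros resolve through. */
#define INIT_VNET_FOO(base)	struct vnet_foo *vnet_foo = (base)
#define V_foo_bar		(vnet_foo->_foo_bar)
#else
/* Default build: a single global; the init macro only consumes its argument. */
int foo_bar;
#define INIT_VNET_FOO(base)	(void)(base)
#define V_foo_bar		foo_bar
#endif

/* Increment the counter of the given instance and return the new value. */
int
foo_bump(struct vnet_foo *ctx)
{
	INIT_VNET_FOO(ctx);

	return (++V_foo_bar);
}

/* With VIMAGE defined, two instances keep independent state. */
void
foo_selfcheck(void)
{
	struct vnet_foo a = { 0 }, b = { 0 };

	foo_bump(&a);
	foo_bump(&a);
	foo_bump(&b);
#ifdef VIMAGE
	assert(a._foo_bar == 2 && b._foo_bar == 1);
#else
	assert(foo_bar == 3);
#endif
}
```

The body of foo_bump() is identical in both configurations, which is the point of the convention: only the macro definitions change. Calling foo_selfcheck() from a test program exercises whichever variant was compiled.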