From owner-p4-projects@FreeBSD.ORG Wed Aug 12 23:02:26 2009 Return-Path: Delivered-To: p4-projects@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 32767) id 8B2591065675; Wed, 12 Aug 2009 23:02:26 +0000 (UTC) Delivered-To: perforce@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3E12D106564A for ; Wed, 12 Aug 2009 23:02:26 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outU.internet-mail-service.net (outu.internet-mail-service.net [216.240.47.244]) by mx1.freebsd.org (Postfix) with ESMTP id 233BA8FC44 for ; Wed, 12 Aug 2009 23:02:25 +0000 (UTC) Received: from idiom.com (mx0.idiom.com [216.240.32.160]) by out.internet-mail-service.net (Postfix) with ESMTP id D025AD392; Wed, 12 Aug 2009 16:02:25 -0700 (PDT) X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38]) by idiom.com (Postfix) with ESMTP id 5427B2D600F; Wed, 12 Aug 2009 16:02:25 -0700 (PDT) Message-ID: <4A834A00.1000605@elischer.org> Date: Wed, 12 Aug 2009 16:02:24 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.22 (Macintosh/20090605) MIME-Version: 1.0 To: Marko Zec References: <200908122108.n7CL8uhJ058398@repoman.freebsd.org> <4A8345E1.1070301@elischer.org> <4A8347E5.8070409@elischer.org> <200908130058.51574.zec@freebsd.org> In-Reply-To: <200908130058.51574.zec@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Perforce Change Reviews , Robert Watson Subject: Re: PERFORCE change 167260 for review X-BeenThere: p4-projects@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: p4 projects tree changes List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Aug 2009 23:02:27 -0000 Marko Zec wrote: > On Thursday 13 August 2009 00:53:25 Julian Elischer wrote: >> Julian Elischer wrote: >>> Marko Zec wrote: >>>> On Wednesday 12 August 2009 23:58:46 Julian Elischer wrote: >>>>> Marko Zec wrote: >>>> ... >>>> >>>>>> @@ -710,22 +715,36 @@ >>>>>> .pr_input = div_input, >>>>>> .pr_ctlinput = div_ctlinput, >>>>>> .pr_ctloutput = ip_ctloutput, >>>>>> - .pr_init = NULL, >>>>>> + .pr_init = div_init, >>>>>> .pr_usrreqs = &div_usrreqs >>>>> If you are going to make pr_init() called for every vnet then >>>>> pr_destroy should be as well. But in fact that is not really safe. >>>>> (either of them) >>>>> >>>>> The trouble is that we can not guarantee that other protocols can >>>>> handle being called multiple times in their init and destroy methods. >>>>> Especially 3rd party protocols. >>>>> >>>>> We need to ensure only protocols that have been converted to run >>>>> with multiple vnets are ever called with multiple vnets. >>>>> >>>>> for this reason the only safe way to do this is via the VNET_SYSINIT >>>>> and VNET_SYSUNINIT calls. >>>> That would mean you would have to convert most if not all of the >>>> existing things that hang off of protosw-s in netinet, netinet6 etc. >>>> to use VNET_SYSINT / VNET_SYSUNIT instead of protosw->pr_init(). So >>>> the short answer is no. >>> robert has done just that. >>> >>>> I cannot recall that we ever discussed or planned to be able to mix >>>> virtualized with non-virtualized protocols in the same kernel. That >>>> would be a horrible mess, and I cannot even imagine having say a >>>> multi-instance INET with a single-instance INET6 kernel, shared among >>>> all the vnets. To start with, how would you decide that you're not >>>> allowed to process an IPv6 packet received on the wire in a >>>> non-default vnet in such an environment? Do we have the >>>> infrastructure in place necessary for preventing doing say a ifconfig >>>> lo0 ::1 in a non-default vnet in such an hypotetical setup? The >>>> answer is no. >>> I agree that it is horrible and we have not said that it will all work >>> >>>> VNET_SYSINIT is nice, but proper special-casing changes required to >>>> support single-instance protocols to work only with vnet0 and not with >>>> the other protocols are simply not there, and I hope will never be, >>>> because I fear they would be highly intrusive, difficult to verify and >>>> maintain, and probably also have an impact on performance. >>>> >>>> A proper solution for the issue you are raising could be something >>>> that would prevent modules assuming our stack is compiled as >>>> single-instance to be kldloaded if the kernel was actually built with >>>> multi-instance stack support. I think Robert (cc-ed) had some ideas >>>> on how to accomplish this by having such modules depend on a magic >>>> global variable (say __no_vnet_support) to be available. >>>> >>>> All the current "base" protocols are already using pr_init() in >>>> multi-instance mode in options VIMAGE case. So I see no reason for >>>> ip_divert not being allowed to leverage on the same mechanism. >>>> >>>> Re. pr_destroy(), you're right, patch already submitted to p4... >> But pr_destroy is not called from pf_proto_unregister() >> (a bug I think.) so I notice that you call it yourself, >> but only on one vnet. > > No, div_destro() is only called directly in nooptions VIMAGE case, while for > VIMAGE builds kldunload -f will not be permitted. > > pf_proto_unregister() should call pr_destroy, yes, but in this particular > situation it is non-trivial to decide whether / when it would be safe to > proceed with pf_proto_unregister(). That's why I opted not to do it at this > point in time, i.e. if we want to push this in 8.0 as a really smallish diff. > >> with the code I had you could load and unload divert when there were >> jails present or not. >> >> it would do the right thing. >> >> robert's code was specifically set up to avoid calling the proto_init >> function on each as far as I could see. >> >>>> Marko > trouble is I think we now have the init routines being called per vnet TWICE.. follow these links... vmware-current# find . -name "*.[ch]" | xargs grep vnet_domain_init ./kern/uipc_domain.c:vnet_domain_init(void *arg) ./sys/domain.h:void vnet_domain_init(void *); ./sys/domain.h: VNET_SYSINIT(vnet_domain_init_ ## name, SI_SUB_PROTO_DOMAIN, \ ./sys/domain.h: SI_ORDER_SECOND, vnet_domain_init, & name ## domain); \ vmware-current# find . -name "*.[ch]" | xargs grep VNET_DOMAIN_SET ./netinet/in_proto.c:VNET_DOMAIN_SET(inet); ./netgraph/ng_socket.c:VNET_DOMAIN_SET(ng); ./net/rtsock.c:VNET_DOMAIN_SET(route); ./netinet6/in6_proto.c:VNET_DOMAIN_SET(inet6); ./netipsec/keysock.c:VNET_DOMAIN_SET(key); ./sys/domain.h:#define VNET_DOMAIN_SET(name) \ ./sys/domain.h:#define VNET_DOMAIN_SET(name) DOMAIN_SET(name)