From owner-p4-projects@FreeBSD.ORG Wed Aug 12 22:59:02 2009 Return-Path: Delivered-To: p4-projects@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 32767) id 57608106568D; Wed, 12 Aug 2009 22:59:02 +0000 (UTC) Delivered-To: perforce@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 11846106568C; Wed, 12 Aug 2009 22:59:02 +0000 (UTC) (envelope-from zec@freebsd.org) Received: from labs3.cc.fer.hr (labs3.cc.fer.hr [161.53.72.21]) by mx1.freebsd.org (Postfix) with ESMTP id 69E028FC41; Wed, 12 Aug 2009 22:59:01 +0000 (UTC) Received: from sluga.fer.hr (sluga.cc.fer.hr [161.53.72.14]) by labs3.cc.fer.hr (8.13.8+Sun/8.12.10) with ESMTP id n7CMwxTD014295; Thu, 13 Aug 2009 00:58:59 +0200 (CEST) Received: from localhost ([161.53.19.8]) by sluga.fer.hr with Microsoft SMTPSVC(6.0.3790.3959); Thu, 13 Aug 2009 00:58:59 +0200 From: Marko Zec To: Julian Elischer Date: Thu, 13 Aug 2009 00:58:51 +0200 User-Agent: KMail/1.9.10 References: <200908122108.n7CL8uhJ058398@repoman.freebsd.org> <4A8345E1.1070301@elischer.org> <4A8347E5.8070409@elischer.org> In-Reply-To: <4A8347E5.8070409@elischer.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200908130058.51574.zec@freebsd.org> X-OriginalArrivalTime: 12 Aug 2009 22:58:59.0614 (UTC) FILETIME=[7A6CABE0:01CA1BA0] Cc: Perforce Change Reviews , Robert Watson Subject: Re: PERFORCE change 167260 for review X-BeenThere: p4-projects@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: p4 projects tree changes List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Aug 2009 22:59:03 -0000 On Thursday 13 August 2009 00:53:25 Julian Elischer wrote: > Julian Elischer wrote: > > Marko Zec wrote: > >> On Wednesday 12 August 2009 23:58:46 Julian Elischer wrote: > >>> Marko Zec wrote: > >> > >> ... > >> > >>>> @@ -710,22 +715,36 @@ > >>>> .pr_input = div_input, > >>>> .pr_ctlinput = div_ctlinput, > >>>> .pr_ctloutput = ip_ctloutput, > >>>> - .pr_init = NULL, > >>>> + .pr_init = div_init, > >>>> .pr_usrreqs = &div_usrreqs > >>> > >>> If you are going to make pr_init() called for every vnet then > >>> pr_destroy should be as well. But in fact that is not really safe. > >>> (either of them) > >>> > >>> The trouble is that we can not guarantee that other protocols can > >>> handle being called multiple times in their init and destroy methods. > >>> Especially 3rd party protocols. > >>> > >>> We need to ensure only protocols that have been converted to run > >>> with multiple vnets are ever called with multiple vnets. > >>> > >>> for this reason the only safe way to do this is via the VNET_SYSINIT > >>> and VNET_SYSUNINIT calls. > >> > >> That would mean you would have to convert most if not all of the > >> existing things that hang off of protosw-s in netinet, netinet6 etc. > >> to use VNET_SYSINT / VNET_SYSUNIT instead of protosw->pr_init(). So > >> the short answer is no. > > > > robert has done just that. > > > >> I cannot recall that we ever discussed or planned to be able to mix > >> virtualized with non-virtualized protocols in the same kernel. That > >> would be a horrible mess, and I cannot even imagine having say a > >> multi-instance INET with a single-instance INET6 kernel, shared among > >> all the vnets. To start with, how would you decide that you're not > >> allowed to process an IPv6 packet received on the wire in a > >> non-default vnet in such an environment? Do we have the > >> infrastructure in place necessary for preventing doing say a ifconfig > >> lo0 ::1 in a non-default vnet in such an hypotetical setup? The > >> answer is no. > > > > I agree that it is horrible and we have not said that it will all work > > > >> VNET_SYSINIT is nice, but proper special-casing changes required to > >> support single-instance protocols to work only with vnet0 and not with > >> the other protocols are simply not there, and I hope will never be, > >> because I fear they would be highly intrusive, difficult to verify and > >> maintain, and probably also have an impact on performance. > >> > >> A proper solution for the issue you are raising could be something > >> that would prevent modules assuming our stack is compiled as > >> single-instance to be kldloaded if the kernel was actually built with > >> multi-instance stack support. I think Robert (cc-ed) had some ideas > >> on how to accomplish this by having such modules depend on a magic > >> global variable (say __no_vnet_support) to be available. > >> > >> All the current "base" protocols are already using pr_init() in > >> multi-instance mode in options VIMAGE case. So I see no reason for > >> ip_divert not being allowed to leverage on the same mechanism. > >> > >> Re. pr_destroy(), you're right, patch already submitted to p4... > > But pr_destroy is not called from pf_proto_unregister() > (a bug I think.) so I notice that you call it yourself, > but only on one vnet. No, div_destro() is only called directly in nooptions VIMAGE case, while for VIMAGE builds kldunload -f will not be permitted. pf_proto_unregister() should call pr_destroy, yes, but in this particular situation it is non-trivial to decide whether / when it would be safe to proceed with pf_proto_unregister(). That's why I opted not to do it at this point in time, i.e. if we want to push this in 8.0 as a really smallish diff. > with the code I had you could load and unload divert when there were > jails present or not. > > it would do the right thing. > > robert's code was specifically set up to avoid calling the proto_init > function on each as far as I could see. > > >> Marko