From owner-p4-projects@FreeBSD.ORG  Wed Aug 12 22:59:02 2009
Return-Path: <owner-p4-projects@FreeBSD.ORG>
Delivered-To: p4-projects@freebsd.org
Received: by hub.freebsd.org (Postfix, from userid 32767)
	id 57608106568D; Wed, 12 Aug 2009 22:59:02 +0000 (UTC)
Delivered-To: perforce@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 11846106568C;
	Wed, 12 Aug 2009 22:59:02 +0000 (UTC) (envelope-from zec@freebsd.org)
Received: from labs3.cc.fer.hr (labs3.cc.fer.hr [161.53.72.21])
	by mx1.freebsd.org (Postfix) with ESMTP id 69E028FC41;
	Wed, 12 Aug 2009 22:59:01 +0000 (UTC)
Received: from sluga.fer.hr (sluga.cc.fer.hr [161.53.72.14])
	by labs3.cc.fer.hr (8.13.8+Sun/8.12.10) with ESMTP id n7CMwxTD014295;
	Thu, 13 Aug 2009 00:58:59 +0200 (CEST)
Received: from localhost ([161.53.19.8]) by sluga.fer.hr with Microsoft
	SMTPSVC(6.0.3790.3959); Thu, 13 Aug 2009 00:58:59 +0200
From: Marko Zec <zec@freebsd.org>
To: Julian Elischer <julian@elischer.org>
Date: Thu, 13 Aug 2009 00:58:51 +0200
User-Agent: KMail/1.9.10
References: <200908122108.n7CL8uhJ058398@repoman.freebsd.org>
	<4A8345E1.1070301@elischer.org> <4A8347E5.8070409@elischer.org>
In-Reply-To: <4A8347E5.8070409@elischer.org>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200908130058.51574.zec@freebsd.org>
X-OriginalArrivalTime: 12 Aug 2009 22:58:59.0614 (UTC)
	FILETIME=[7A6CABE0:01CA1BA0]
Cc: Perforce Change Reviews <perforce@freebsd.org>,
	Robert Watson <rwatson@freebsd.org>
Subject: Re: PERFORCE change 167260 for review
X-BeenThere: p4-projects@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: p4 projects tree changes <p4-projects.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/p4-projects>,
	<mailto:p4-projects-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/p4-projects>
List-Post: <mailto:p4-projects@freebsd.org>
List-Help: <mailto:p4-projects-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/p4-projects>,
	<mailto:p4-projects-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Aug 2009 22:59:03 -0000

On Thursday 13 August 2009 00:53:25 Julian Elischer wrote:
> Julian Elischer wrote:
> > Marko Zec wrote:
> >> On Wednesday 12 August 2009 23:58:46 Julian Elischer wrote:
> >>> Marko Zec wrote:
> >>
> >> ...
> >>
> >>>> @@ -710,22 +715,36 @@
> >>>>      .pr_input =        div_input,
> >>>>      .pr_ctlinput =        div_ctlinput,
> >>>>      .pr_ctloutput =        ip_ctloutput,
> >>>> -    .pr_init =        NULL,
> >>>> +    .pr_init =        div_init,
> >>>>      .pr_usrreqs =        &div_usrreqs
> >>>
> >>> If you are going to make pr_init() called for every vnet then
> >>> pr_destroy should be as well. But in fact that is not really safe.
> >>> (either of them)
> >>>
> >>> The trouble is that we can not guarantee that other protocols can
> >>> handle being called multiple times in their init and destroy methods.
> >>> Especially 3rd party protocols.
> >>>
> >>> We need to ensure only protocols that have been converted to run
> >>> with multiple vnets are ever called with multiple vnets.
> >>>
> >>> for this reason the only safe way to do this is via the VNET_SYSINIT
> >>> and VNET_SYSUNINIT calls.
> >>
> >> That would mean you would have to convert most if not all of the
> >> existing things that hang off of protosw-s in netinet, netinet6 etc.
> >> to use VNET_SYSINT / VNET_SYSUNIT instead of protosw->pr_init().  So
> >> the short answer is no.
> >
> > robert has done just that.
> >
> >> I cannot recall that we ever discussed or planned to be able to mix
> >> virtualized with non-virtualized protocols in the same kernel.  That
> >> would be a horrible mess, and I cannot even imagine having say a
> >> multi-instance INET with a single-instance INET6 kernel, shared among
> >> all the vnets.  To start with, how would you decide that you're not
> >> allowed to process an IPv6 packet received on the wire in a
> >> non-default vnet in such an environment?  Do we have the
> >> infrastructure in place necessary for preventing doing say a ifconfig
> >> lo0 ::1 in a non-default vnet in such an hypotetical setup?  The
> >> answer is no.
> >
> > I agree that it is horrible and we have not said that it will all work
> >
> >> VNET_SYSINIT is nice, but proper special-casing changes required to
> >> support single-instance protocols to work only with vnet0 and not with
> >> the other protocols are simply not there, and I hope will never be,
> >> because I fear they would be highly intrusive, difficult to verify and
> >> maintain, and probably also have an impact on performance.
> >>
> >> A proper solution for the issue you are raising could be something
> >> that would prevent modules assuming our stack is compiled as
> >> single-instance to be kldloaded if the kernel was actually built with
> >> multi-instance stack support.  I think Robert (cc-ed) had some ideas
> >> on how to accomplish this by having such modules depend on a magic
> >> global variable (say __no_vnet_support) to be available.
> >>
> >> All the current "base" protocols are already using pr_init() in
> >> multi-instance mode in options VIMAGE case.  So I see no reason for
> >> ip_divert not being allowed to leverage on the same mechanism.
> >>
> >> Re. pr_destroy(), you're right, patch already submitted to p4...
>
> But pr_destroy is not called from pf_proto_unregister()
> (a bug I think.) so I notice that you call it yourself,
>   but only on one vnet.

No, div_destro() is only called directly in nooptions VIMAGE case, while for 
VIMAGE builds kldunload -f will not be permitted.

pf_proto_unregister() should call pr_destroy, yes, but in this particular 
situation it is non-trivial to decide whether / when it would be safe to 
proceed with pf_proto_unregister().  That's why I opted not to do it at this 
point in time, i.e. if we want to push this in 8.0 as a really smallish diff.

> with the code I had you could load and unload divert when there were
> jails present or not.
>
> it would do the right thing.
>
> robert's code was specifically set up to avoid calling the proto_init
> function on each as far as I could see.
>
> >> Marko