From owner-freebsd-net@FreeBSD.ORG Wed Jul 5 06:47:50 2006
Date: Wed, 5 Jul 2006 08:47:47 +0200 (CEST)
From: Harti Brandt
To: Brooks Davis
In-Reply-To: <20060704195858.GB12928@odin.ac.hmc.edu>
Message-ID: <20060705084551.V78288@beagle.kn.op.dlr.de>
References: <44A40C25.904@elischer.org> <20060630115749.G3964@fledge.watson.org>
 <20060703202803.GA22556@odin.ac.hmc.edu> <20060704.102539.-494099438.imp@bsdimp.com>
 <20060704174208.GA1734@odin.ac.hmc.edu> <20060704195220.K74584@beagle.kn.op.dlr.de>
 <44AAB986.2070505@elischer.org> <20060704205916.X74584@beagle.kn.op.dlr.de>
 <20060704195858.GB12928@odin.ac.hmc.edu>
Cc: src-committers@freebsd.org, yar@comp.chem.msu.su, rwatson@freebsd.org,
 Julian Elischer, freebsd-net@freebsd.org, "M.
 Warner Losh"
Subject: Re: cvs commit: src/sys/net if_vlan.c

On Tue, 4 Jul 2006, Brooks Davis wrote:

BD>On Tue, Jul 04, 2006 at 09:02:32PM +0200, Harti Brandt wrote:
BD>> On Tue, 4 Jul 2006, Julian Elischer wrote:
BD>>
BD>> JE>Harti Brandt wrote:
BD>> JE>
BD>> JE>> On Tue, 4 Jul 2006, Brooks Davis wrote:
BD>> JE>>
BD>> JE>> BD>On Tue, Jul 04, 2006 at 10:25:39AM -0600, M. Warner Losh wrote:
BD>> JE>> BD>> In message: <20060703202803.GA22556@odin.ac.hmc.edu>
BD>> JE>> BD>> Brooks Davis writes:
BD>> JE>> BD>> : and act as though the interface is not there. We could then consider
BD>> JE>> BD>> : either holding the interface for a configurable or computed length
BD>> JE>> BD>> : of time or adding some sort of refcounting (probably impractical).
BD>> JE>> BD>>
BD>> JE>> BD>> Refcounting would be good for the 'macro' things (coming and going)
BD>> JE>> BD>> that are infrequent, but we might have multiple people doing. You are
BD>> JE>> BD>> right it likely is too inefficient to do with mbufs. One other option
BD>> JE>> BD>> might be to have a configurable time after the last time that it was
BD>> JE>> BD>> accessed via the 'safe' routines that were set up. This way we'd tie
BD>> JE>> BD>> the removal of the interface to a period of time after it was last
BD>> JE>> BD>> used, rather than after it was removed. I don't know if such a
BD>> JE>> BD>> difference would matter much in practice.
BD>> JE>> BD>
BD>> JE>> BD>We might get some mileage out of last used, but then we'd have to keep
BD>> JE>> BD>that timestamp updated.
BD>> JE>> BD>For normal applications, once we've torn down
BD>> JE>> BD>the sockets and drained their queues, I believe we should not have to
BD>> JE>> BD>wait more than a few seconds unless dummynet or some other mechanism
BD>> JE>> BD>that queues mbufs for a significant period of time is enabled. If
BD>> JE>> BD>dummynet is enabled we need to wait a bit longer, but it isn't outside
BD>> JE>> BD>the realm of possibility for dummynet to be modified to tell us how long
BD>> JE>> BD>it will be until the last mbuf it currently holds will be released. In
BD>> JE>> BD>practice, 120 seconds is probably a good default number since a 60
BD>> JE>> BD>second max RTT is assumed in TCP and thus delays longer than that
BD>> JE>> BD>would break everything anyway.
BD>> JE>> BD>
BD>> JE>> BD>> The only other 'issue' that I see with this approach is if I remove a
BD>> JE>> BD>> card, and then insert it again before the timeout happens. Does that
BD>> JE>> BD>> card get a new interface name? And would people care or not...
BD>> JE>> BD>
BD>> JE>> BD>The name is unregistered with the call to if_detach because if_detach
BD>> JE>> BD>removes the interface from the ifnet list. My guess is that
BD>> JE>> BD>we'll either zero the name field or set it to something like _zombie. The
BD>> JE>> BD>unit will remain reserved until later. We'll need to add an SNMP index
BD>> JE>> BD>managed in userland to satisfy some current if_index consumers.
BD>> JE>>
BD>> JE>> bsnmp does this anyway because of the rules for ifIndex. It has some
BD>> JE>> heuristic to guess whether an interface is a physical one or not and if it
BD>> JE>> is, it uses the same index again. The downside of this is that the
BD>> JE>> interface index you see via SNMP has nothing to do with the interface index
BD>> JE>> you see in the system and this does not work across reboots and daemon
BD>> JE>> restarts as required by the RFC.
BD>> JE>
BD>> JE>If we had a way to do this in the system (e.g.
BD>> JE>kept the MAC address
BD>> JE>with the ifnum in a hash) then we could just keep the ifnum in the mbuf
BD>> JE>instead of the ifp pointer, as that is only occasionally used, and an
BD>> JE>ifnum2ipf() macro could check the validity whenever it is used.
BD>>
BD>> This would be helpful for the SNMP daemon, because this would also allow
BD>> reusing the ifnum if the same interface is plugged back in. However, care
BD>> must be taken that non-physical interfaces always get a new ifnum (iftun
BD>> for example).
BD>
BD>This belongs in userland. It's too complex to handle in the kernel. We
BD>probably need to store the index in the kernel as a rendezvous point,
BD>but the actual decision about which interfaces are the "same" and
BD>which are "different" as defined by the SNMP spec is too difficult and
BD>has too many edge cases to try to do in the kernel. I think the right
BD>answer is a daemon (or devd triggered script) that sets the SNMP index
BD>based on a database containing previous lladdrs and units.

That's probably true and that's what bsnmp is doing, although a hint from
the kernel whether the interface is a physical one or not could be very
helpful (the driver should know).

harti
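
To make the userland approach discussed above a bit more concrete, here is a minimal sketch of a persistent index database keyed on lladdr and unit. It is only an illustration under stated assumptions: the class name IfIndexDB, the JSON file layout, and the `physical` flag are all hypothetical. The flag stands in for the kernel hint asked for above, and no existing kernel or bsnmp interface is implied.

```python
import json
import os
import tempfile


class IfIndexDB:
    """Hypothetical userland store mapping (lladdr, unit) to a stable SNMP index."""

    def __init__(self, path):
        self.path = path
        self.next_index = 1
        self.known = {}  # "lladdr/unit" -> previously assigned index
        if os.path.exists(path):
            with open(path) as fh:
                state = json.load(fh)
            self.next_index = state["next_index"]
            self.known = state["known"]

    def _save(self):
        # Write to a temporary file and rename it into place, so a crash
        # cannot leave a truncated database behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as fh:
            json.dump({"next_index": self.next_index, "known": self.known}, fh)
        os.replace(tmp, self.path)

    def _allocate(self):
        index = self.next_index
        self.next_index += 1
        return index

    def assign(self, lladdr, unit, physical):
        """Return the index for an arriving interface.

        `physical` is an assumed hint; how it would be obtained is left open.
        """
        if not physical or lladdr is None:
            # Non-physical interfaces (a tunnel, say) always get a new index.
            index = self._allocate()
        else:
            key = f"{lladdr.lower()}/{unit}"
            if key not in self.known:
                self.known[key] = self._allocate()
            index = self.known[key]
        # Persist the counter too, so indexes are never handed out twice
        # across daemon restarts.
        self._save()
        return index
```

A daemon or devd-triggered script could call assign() on each interface arrival and push the result into whatever rendezvous field the kernel ends up providing. Deciding when two interfaces count as the "same" is reduced here to an exact lladdr and unit match, which ignores the edge cases mentioned in the thread.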