Date: Wed, 14 Aug 2013 16:00:13 +0200
From: Luigi Rizzo
To: "Alexander V. Chernikov"
Cc: Lawrence Stewart, Lev Serebryakov, FreeBSD Net
Subject: Re: route/arp lifetime (Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux))

On Wed, Aug 14, 2013 at 05:01:05PM +0400, Alexander V. Chernikov wrote:
> On 14.08.2013 16:40, Luigi Rizzo wrote: ...
> >> You can save rte&arp, however doing this
> >> gives you a perfect chance to crash your kernel if the egress interface
> >> is destroyed (like vlan or ng or tun).
> > I hope I learned not to follow a stale ifp pointer :)
> Well, currently we have no locks (or other means) to ensure all other
> cores have the "current" pointer to ifp or its fields (or am I wrong?)

This I don't know, but if so, we should fix the race anyway
(on another timescale, but still dangerous).

> > Anyway, ARP is really just the MAC address, so there is no
> > dangling pointer issue.
> >
> > For the ifp associated with the route,
> > I do not see a huge problem in marking the route/ifp as a
> > zombie and destroying it when the last reference goes away.
> Yes, but references require some synchronization primitives. One

Again, we should protect against ifp destruction anyway.
Surely we should try to make the protection mechanism cheap
(in my proposal, going through the refcount once per millisecond
instead of on every single packet; there might be better ways, and
I am all ears on that); but surely we cannot dismiss something
because "we run without seatbelts now, so anything else is more expensive".

We had a related discussion regarding races in interfaces
between the datapath (if_transmit() and *_rxeof()) and the
control path (ioctls, watchdog, etc.).

The reason I am raising this issue is that I want to fix the
races that emerged when we moved to SMP, not that I want to
"make hacks" and cut corners in unsafe ways.

cheers
luigi

> possible solution is using pcpu counters, but it does not play well on
> !amd64.
> >
> > Not that the current way is any better: you need to lock/unlock
> > the rte while you do the lookup, and hold a refcount to the ifp
> > until the packet is queued. So how does my suggestion make
> > things worse?
> >
> > cheers
> > luigi
> >
> >
> >>>
> >>> Considering that each lookup takes between 100..300 ns if you are
> >>> lucky (not many misses, relatively empty table, etc.), one could
> >>> reasonably do the lookup at most once per millisecond or so (just
> >>> reading 'ticks', no need for a nanotime() if you have a slow clock),
> >>> or whenever we get an error related to the socket, either in the
> >>> forward path (e.g. ifp points to an interface that is down) or in
> >>> the reverse path (e.g. a dupack because we sent a packet to the
> >>> wrong place).
> >> This sounds like "Hey, the kernel lookup is slow (which is true), let's
> >> make a hack and not bother with lookups".
> >> This approach gives us mtx-locked rte refcounts which are used (misused)
> >> in many places, making things worse and decreasing the ability to fix
> >> things up.
> >>> cheers
> >>> luigi
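A minimal sketch of the rate-limited lookup described in the quoted proposal, written as standalone userspace C. It is not FreeBSD code: `struct route_cache`, `slow_lookup()`, the unsigned `ticks` counter and `HZ` are hypothetical stand-ins for the routing/ARP lookup and the kernel's `ticks`/`hz`. It only shows the control flow: reuse the cached next hop, and redo the full lookup when at least a millisecond's worth of ticks has passed or the caller reports an error.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tick counter, advanced HZ times per second like the
 * kernel's 'ticks'; unsigned here so wraparound is well defined. */
static unsigned int ticks;
#define HZ 1000                      /* assumed: 1 tick == 1 ms */
#define REVALIDATE_TICKS (HZ / 1000) /* at most one full lookup per ms */

struct nexthop {
    int ifindex;    /* stands in for the ifp */
    uint8_t mac[6]; /* ARP result: just a MAC address, no pointer */
};

struct route_cache {
    struct nexthop nh;
    unsigned int last_lookup; /* value of 'ticks' at the last full lookup */
    bool valid;
};

static int full_lookups; /* illustration only: counts slow-path lookups */

/* Hypothetical slow path: routing table walk plus ARP resolution. */
static void slow_lookup(struct nexthop *nh)
{
    static const uint8_t mac[6] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };

    full_lookups++;
    nh->ifindex = 1;
    memcpy(nh->mac, mac, sizeof(nh->mac));
}

/* Return the cached next hop, redoing the full lookup when the cache is
 * empty, older than REVALIDATE_TICKS, or the caller saw an error
 * (interface down, a dupack suggesting a wrong destination, ...). */
static const struct nexthop *
route_cache_get(struct route_cache *rc, bool had_error)
{
    if (!rc->valid || had_error ||
        ticks - rc->last_lookup >= REVALIDATE_TICKS) {
        slow_lookup(&rc->nh);
        rc->last_lookup = ticks;
        rc->valid = true;
    }
    return &rc->nh;
}
```

At one full lookup per millisecond, a 200 ns lookup costs about 0.02% of one core, instead of being paid on every packet.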
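The "mark the route/ifp as a zombie and destroy it when the last reference goes away" idea can be sketched with C11 atomics. This is a hypothetical userspace model, not FreeBSD's `struct ifnet` or its refcount(9) primitives: `struct iface`, `iface_ref()`, `iface_rele()`, `iface_destroy()` and `iface_usable()` are illustrative names. Destroying the interface only sets a flag and drops one reference, so a cached pointer stays safe to read until its holder releases it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical interface object (not FreeBSD's struct ifnet). */
struct iface {
    atomic_uint refcnt; /* the interface list itself holds one reference */
    atomic_bool zombie; /* set once the interface has been destroyed */
};

static int ifaces_freed; /* illustration only: counts actual frees */

static struct iface *iface_create(void)
{
    struct iface *ifp = malloc(sizeof(*ifp));
    if (ifp == NULL)
        return NULL;
    atomic_init(&ifp->refcnt, 1);
    atomic_init(&ifp->zombie, false);
    return ifp;
}

/* Take an extra reference, e.g. when a route caches this ifp. */
static void iface_ref(struct iface *ifp)
{
    atomic_fetch_add_explicit(&ifp->refcnt, 1, memory_order_relaxed);
}

/* Drop a reference; the last one frees the memory. */
static void iface_rele(struct iface *ifp)
{
    unsigned int old =
        atomic_fetch_sub_explicit(&ifp->refcnt, 1, memory_order_acq_rel);
    assert(old > 0);
    if (old == 1) {
        /* Only a destroyed (zombie) interface should ever reach zero. */
        assert(atomic_load(&ifp->zombie));
        free(ifp);
        ifaces_freed++;
    }
}

/* Destroy: mark as zombie and drop the list's reference. Cached users
 * keep the memory valid until they call iface_rele(). */
static void iface_destroy(struct iface *ifp)
{
    atomic_store(&ifp->zombie, true);
    iface_rele(ifp);
}

/* Datapath check: a zombie ifp is still safe to read, but must not be
 * used for transmit; the caller should redo the route lookup. */
static bool iface_usable(struct iface *ifp)
{
    return !atomic_load(&ifp->zombie);
}
```

Alexander's objection still applies to this model: taking and dropping an atomic reference on every packet bounces the counter's cache line between cores. That cost is what the once-per-millisecond revalidation in the proposal is meant to amortize.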