From owner-freebsd-net@FreeBSD.ORG  Wed Aug 14 13:55:38 2013
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 68EE4C52;
 Wed, 14 Aug 2013 13:55:38 +0000 (UTC)
 (envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238])
 by mx1.freebsd.org (Postfix) with ESMTP id DFBDA2C87;
 Wed, 14 Aug 2013 13:55:37 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275)
 id 82DA07300A; Wed, 14 Aug 2013 16:00:13 +0200 (CEST)
Date: Wed, 14 Aug 2013 16:00:13 +0200
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: "Alexander V. Chernikov" <melifaro@ipfw.ru>
Subject: Re: route/arp lifetime (Re: it's the output, not ack coalescing
 (Re: TSO and FreeBSD vs Linux))
Message-ID: <20130814140013.GA65049@onelab2.iet.unipi.it>
References: <520A6D07.5080106@freebsd.org> <520AFBE8.1090109@freebsd.org>
 <520B24A0.4000706@freebsd.org> <520B3056.1000804@freebsd.org>
 <20130814102109.GA63246@onelab2.iet.unipi.it>
 <587579055.20130814154713@serebryakov.spb.ru>
 <20130814120551.GA64260@onelab2.iet.unipi.it>
 <520B74DD.1060102@ipfw.ru>
 <20130814124024.GA64548@onelab2.iet.unipi.it>
 <520B7F91.2080209@ipfw.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <520B7F91.2080209@ipfw.ru>
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: Lawrence Stewart <lstewart@freebsd.org>, Lev Serebryakov <lev@FreeBSD.org>,
 FreeBSD Net <net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 14 Aug 2013 13:55:38 -0000

On Wed, Aug 14, 2013 at 05:01:05PM +0400, Alexander V. Chernikov wrote:
> On 14.08.2013 16:40, Luigi Rizzo wrote:
...
> >> You can save rte&arp, however doing this
> >> gives you perfect chance to crash your kernel if egress interface is
> >> destroyed (like vlan or ng or tun).
> > I hope I learned not to follow a stale ifp pointer :)
> Well, currently we have no locks (or other means) to ensure all other
> cores have a "current" pointer to ifp or its fields (or am I wrong?)

This I don't know -- but in any case, we should fix the race anyway
(it plays out on another timescale, but it is still dangerous).

> > anyway, ARP is really just the MAC address so there is no
> > dangling pointer issue.
> >
> > For the ifp associated with the route,
> > I do not see a huge problem in marking the route/ifp as a
> > zombie and destroying it when the last reference goes away.
> Yes, but references require some synchronization primitives. One 

Again, we should protect against ifp destruction anyway.  Surely
we should try to make the protection mechanism cheap (in my proposal,
going through the refcount once per millisecond instead of on every
single packet; there might be better ways, and I am all ears on
that); surely, we cannot dismiss something because "we run without
seatbelts now so anything else is more expensive".
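
To make the once-per-millisecond idea concrete, here is a minimal
user-space sketch, not kernel code. Everything in it is hypothetical:
struct fake_ifp, struct route_cache, slow_lookup() and cache_get() are
names made up for illustration, 'now' stands in for 'ticks' (assuming
a 1 ms tick), and it is single-threaded, so a real version would need
atomic refcounts or equivalent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an interface; not the real struct ifnet. */
struct fake_ifp {
	int  refs;	/* references held by caches */
	bool zombie;	/* destruction requested by the control path */
	bool freed;	/* last reference gone, object torn down */
};

/* Per-socket cache of the egress interface. */
struct route_cache {
	struct fake_ifp *ifp;	/* cached interface, NULL if none */
	int last_tick;		/* tick of the last revalidation */
};

static struct fake_ifp *table_entry;	/* what the "routing table" returns */
static int lookups;			/* number of slow lookups performed */

/* Stands for the expensive route lookup. */
static struct fake_ifp *
slow_lookup(void)
{
	lookups++;
	return table_entry;
}

/* Drop one reference; a zombie is torn down when the last user lets go. */
static void
ifp_release(struct fake_ifp *ifp)
{
	assert(ifp->refs > 0);
	if (--ifp->refs == 0 && ifp->zombie)
		ifp->freed = true;	/* a kernel would free memory here */
}

/* Control path: mark as zombie, tear down at once only if unused. */
static void
ifp_destroy(struct fake_ifp *ifp)
{
	ifp->zombie = true;
	if (ifp->refs == 0)
		ifp->freed = true;
}

/* Data path: touch the refcount and do a lookup at most once per tick. */
static struct fake_ifp *
cache_get(struct route_cache *rc, int now)
{
	struct fake_ifp *ifp;

	if (rc->ifp != NULL && rc->last_tick == now)
		return rc->ifp;		/* fast path, no refcount traffic */
	if (rc->ifp != NULL) {
		ifp_release(rc->ifp);
		rc->ifp = NULL;
	}
	ifp = slow_lookup();
	if (ifp != NULL && !ifp->zombie) {
		ifp->refs++;
		rc->ifp = ifp;
	}
	rc->last_tick = now;
	return rc->ifp;
}
```

In this sketch a destroyed interface can stay in use for at most one
tick after ifp_destroy(); it is torn down when the cache revalidates
and drops its reference.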

We had a related discussion regarding races in interfaces between
the datapath (if_transmit() and *_rxeof() ) and the control path
(ioctls, watchdog etc.).

I am raising this issue because I want to fix the races that
emerged when we moved to SMP, not because I want to "make
hacks" and cut corners in unsafe ways.

cheers
luigi

> possible solution is using pcpu counters, but it does not play well on 
> !amd64.
> >
> > Not that the current way is any better -- you need to lock/unlock
> > the rte while you do the lookup, and hold a refcount to the ifp
> > until the packet is queued. So how does my suggestion make
> > things worse ?
> >
> > cheers
> > luigi
> >
> >
> >>>
> >>> Considering that each lookup takes between 100..300ns if you are
> >>> lucky (not many misses, relatively empty table etc.), one could
> >>> reasonably do the lookup at most once per millisecond or so (just
> >>> reading 'ticks', no need for a nanotime() if you have a slow clock),
> >>> or whenever we get an error related to the socket, either in the
> >>> forward path (e.g. ifp points to an interface that is down) or in
> >>> the reverse path (e.g. a dupack because we sent a packet to the
> >>> wrong place).
> >> This sounds like "Hey, the kernel lookup is slow (which is true), let's
> >> make a hack and not bother with lookups".
> >> This approach gives us mtx-locked rte refcounts which are used (misused)
> >> in many places, making things worse and decreasing the ability to fix
> >> things up.
> >>> cheers
> >>> luigi
>