From owner-freebsd-net@FreeBSD.ORG  Fri Jan 27 02:05:29 2006
Date: Thu, 26 Jan 2006 20:05:28 -0600
From: Craig Boston <craig@olyun.gank.org>
To: freebsd-net@freebsd.org
Message-ID: <20060127020528.GA18728@nowhere>
References: <20060125152032.GA40581@nowhere>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20060125152032.GA40581@nowhere>
User-Agent: Mutt/1.4.2.1i
Subject: Re: Race condition in ip6_getpmtu (actually gif)?

On Wed, Jan 25, 2006 at 09:20:33AM -0600, craig@olyun.gank.org wrote:
> I seem to be running into a race condition in ip6_getpmtu.  I've been
> having sporadic panics recently -- sometimes the machine will last a
> week, sometimes it'll panic twice in a day.  The backtrace is always the
> same:
>
> -- snip --

After some more analysis I think this is a problem in in6_gif_output.
It keeps a cached route in its softc.  After ip6_output completes, if
IFF_LINK0 is not set, the cached route is freed.  This works fine so
long as in6_gif_output is not reentered.
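To make the pattern concrete, here is a minimal userspace sketch of the
cache-then-free logic described above.  It is not the kernel code: the
names route_model, softc_model, output_model and routes_allocated are
made up for illustration, and the link0 field only stands in for
IFF_LINK0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's route entry and the gif softc. */
struct route_model { int placeholder; };

struct softc_model {
    struct route_model *cached; /* models the route cached in the softc */
    bool link0;                 /* models IFF_LINK0: keep the cached route */
};

static int routes_allocated;    /* counts allocations, for the demo only */

/* Models one pass through the output routine. */
static void output_model(struct softc_model *sc)
{
    assert(sc != NULL);

    /* Allocate a route only if none is cached. */
    if (sc->cached == NULL) {
        sc->cached = calloc(1, sizeof(*sc->cached));
        if (sc->cached == NULL)
            abort();
        routes_allocated++;
    }

    /* ... this is where ip6_output() would use sc->cached ... */

    /* Without link0 the cached route is dropped after every packet. */
    if (!sc->link0) {
        free(sc->cached);
        sc->cached = NULL;
    }
}
```

With link0 clear, every call allocates and frees its own route; with
link0 set, the first call allocates and later calls reuse it.  Both are
fine as long as only one call is in flight at a time.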

My current theory is that a higher-priority kernel thread is preempting
us while we're somewhere in ip6_getpmtu.  Say an incoming IPv4 ICMP
packet causes the NIC driver to call ether_input from an ithread.  Since
IPv4 is marked NETISR_MPSAFE it will be dispatched from the ithread and
filter all the way down to icmp_input, which decides that an ICMP
reply needs to be sent to a host across the tunnel.  It goes to icmp_send,
which passes it to ip_output.  The destination is a gif interface, so
into gif_output we go, and BAM!  We just re-entered in6_gif_output while
still in the ithread.

When this happens, the route cached in the sc is still valid, so a new
one is not allocated.  After ip6_output completes, the route is freed
and set to NULL.  Later, context returns to the original thread, and
ip6_getpmtu (called from ip6_output) has just had its route pulled out
from under it...  It's a long shot, but I think it is possible and that
would certainly explain why it sometimes takes millions of packets to
trigger.
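The sequence above can be replayed deterministically in userspace.  The
sketch below is only a model of the theory, not a reproduction of the
panic: a hypothetical hook parameter stands in for the preempting
ithread, and instead of dereferencing a freed pointer it just reports
whether the cached route was still there when the outer call came back
to it.  All names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct route_model { int placeholder; };
struct softc_model { struct route_model *cached; };

/* Hypothetical hook standing in for preemption by an ithread. */
typedef void (*preempt_hook)(struct softc_model *);

/*
 * Models one pass through the output routine with route caching off.
 * Returns true if the cached route was still present at the point where
 * the outer call (think ip6_getpmtu) would have used it.
 */
static bool output_model(struct softc_model *sc, preempt_hook hook)
{
    bool route_survived;

    assert(sc != NULL);

    if (sc->cached == NULL) {
        sc->cached = calloc(1, sizeof(*sc->cached));
        if (sc->cached == NULL)
            abort();
    }

    /* Preemption point: the other context runs to completion here. */
    if (hook != NULL)
        hook(sc);

    route_survived = (sc->cached != NULL);

    free(sc->cached);           /* free(NULL) is a no-op */
    sc->cached = NULL;
    return route_survived;
}

/* The re-entrant call: sees a valid cached route, uses it, frees it. */
static void reenter(struct softc_model *sc)
{
    (void)output_model(sc, NULL);
}

/* Runs the scenario once, with or without the simulated preemption. */
static bool race_demo(bool preempt)
{
    struct softc_model sc = { NULL };

    return output_model(&sc, preempt ? reenter : NULL);
}
```

race_demo(false) reports that the route survived; race_demo(true)
reports that it was gone by the time the outer call looked again, which
is the situation the theory says ip6_getpmtu ends up in.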

Attached is a quick hack to protect the cached route with a mutex.  A
better fix with less overhead would be to allocate the route in a local
variable on the stack, and only copy it to the softc if route caching is
enabled.  I'll run for a couple weeks with the patch and file a PR if
that fixes it.
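For comparison, here is a rough userspace sketch of the stack-local
idea (not the attached mutex patch, and not tested kernel code).  Each
call keeps its route pointer in a local variable, so a re-entrant call
cannot free it; the route is handed to the softc only when caching is
enabled.  The names are hypothetical, cache_enabled stands in for
IFF_LINK0, 1280 is just a placeholder value, and the sketch ignores how
a cached route would later be reused or locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct route_model { int mtu; };

struct softc_model {
    struct route_model *cached;
    bool cache_enabled;         /* models IFF_LINK0 */
};

/* Hypothetical hook standing in for preemption by an ithread. */
typedef void (*preempt_hook)(struct softc_model *);

/* Returns true if this call's route was intact after the preemption point. */
static bool output_local(struct softc_model *sc, preempt_hook hook)
{
    struct route_model *rt;     /* pointer lives on this call's stack */
    bool intact;

    assert(sc != NULL);

    rt = calloc(1, sizeof(*rt));
    if (rt == NULL)
        abort();
    rt->mtu = 1280;             /* placeholder value for the demo */

    /* Preemption point: a re-entrant call cannot reach rt. */
    if (hook != NULL)
        hook(sc);

    intact = (rt->mtu == 1280);

    if (sc->cache_enabled) {
        /* Publish to the softc only when caching is wanted. */
        free(sc->cached);
        sc->cached = rt;
    } else {
        free(rt);
    }
    return intact;
}

/* The re-entrant call used as the hook. */
static void reenter_local(struct softc_model *sc)
{
    (void)output_local(sc, NULL);
}
```

Calling output_local(&sc, reenter_local) returns true whether or not
caching is enabled, because the inner call only ever frees its own
route or the one already published in the softc.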

If I have time I'll also try to set up a test machine and attempt to
detect whether in6_gif_output is indeed reentered, and if so how.

I think this should only be a problem for gif when IPv4 is the inner
protocol and IPv6 is the outer.  Since IPv4 is MPSAFE and v6 is not, gif
might sometimes inadvertently cause v6 code that hasn't been fully
locked to be re-entered or otherwise called without Giant held.  There
may be other problems that are less likely to occur...

Craig