From owner-freebsd-net@FreeBSD.ORG Sat Oct 1 06:15:47 2011 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 485F9106564A; Sat, 1 Oct 2011 06:15:47 +0000 (UTC) (envelope-from s.dave.jones@gmail.com) Received: from mail-vw0-f54.google.com (mail-vw0-f54.google.com [209.85.212.54]) by mx1.freebsd.org (Postfix) with ESMTP id BA51F8FC08; Sat, 1 Oct 2011 06:15:46 +0000 (UTC) Received: by vws11 with SMTP id 11so2582828vws.13 for ; Fri, 30 Sep 2011 23:15:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=xziKnvwtKl9ezSgUwcQXBP0wWVUpu43G5zpi+5TGs5E=; b=kGEdErG36N/X6T5SmsHGeG1dI5G1YbKT2dj8HBGU+V45YYAv62bAKGWDEJBa9jFfUz ZlE1o/hGW/TCboF6p0A7rr+XUNN5MEo4Qm6/0qTiUrDWTADX77telhayxbXtOeO4O7xh UVwv8r41ZfOZ3kb23GxP3JYILuq8ODyPN4Kwk= MIME-Version: 1.0 Received: by 10.52.94.109 with SMTP id db13mr8924081vdb.318.1317449745940; Fri, 30 Sep 2011 23:15:45 -0700 (PDT) Received: by 10.52.107.194 with HTTP; Fri, 30 Sep 2011 23:15:45 -0700 (PDT) In-Reply-To: References: <8662kcigif.fsf@kopusha.home.net> Date: Sat, 1 Oct 2011 14:15:45 +0800 Message-ID: From: dave jones To: Robert Watson Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: Mikolaj Golub , "freebsd-net@freebsd.org" , Adrian Chadd , "K. Macy" , Arnaud Lacombe Subject: Re: Kernel panic on FreeBSD 9.0-beta2 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 01 Oct 2011 06:15:47 -0000 On Fri, Sep 30, 2011 at 9:41 PM, Robert Watson wrote: > > On Wed, 28 Sep 2011, Mikolaj Golub wrote: > >> On Mon, 26 Sep 2011 16:12:55 +0200 K. Macy wrote: >> >> KM> Sorry, didn't look at the images (limited bw), I've seen something K= M> >> like this before in timewait. This "can't happen" with UDP so will be KM= > >> interested in learning more about the bug. >> >> The panic can be easily triggered by this: > > Hi: > > Just catching up on this thread. =A0I think the analysis here is generall= y > right: in 9.0, you're much more likely to see an inpcb with its in_socket > pointer cleared in the hash list than in prior releases, and > in_pcbbind_setup() trips over this. > > However, at least on first glance (and from the perspective of invariants > here), I think the bug is actualy that in_pcbbind_setup() is asking > in_pcblookup_local() for an inpcb and then access the returned inpcb's > in_socket pointer without acquiring a lock on the inpcb. =A0Structurally,= it > can't acquire this lock for lock order reasons -- it already holds the lo= ck > on its own inpcb. =A0Therefore, we should only access fields that are saf= e to > follow in an inpcb when you hold a reference via the hash lock and not a > lock on the inpcb itself, which appears generally OK (+/-) for all the > fields in that clause but the t->inp_socket->so_options dereference. > > A preferred fix would cache the SO_REUSEPORT flag in an inpcb-layer field= , > such as inp_flags2, giving us access to its value without having to walk > into the attached (or not) socket. > > This raises another structural question, which is whether we need a new > inp_foo flags field that is protected explicitly by the hash lock, and no= t > by the inpcb lock, which could hold fields relevant to address binding. = =A0I > don't think we need to solve that problem in this context, as a slightly > race on SO_REUSEPORT is likely acceptable. > > The suggested fix does perform the desired function of explicitly detachi= ng > the inpcb from the hash list before the socket is disconnected from the > inpcb. However, it's incomplete in that the invariant that's being broken= is > also relied on for other protocols (such as raw sockets). =A0The correct > invariant is that inp_socket is safe to follow unconditionally if an inpc= b > is locked and INP_DROPPED isn't set -- the bug is in "locked" not in > "INP_DROPPED", which is why I think this is the wrong fix, even though it > prevents a panic :-). Hello Robert, Thank you for taking your valuable time to find out the problem. Since I don't have idea about network internals, would you have a patch about this? I'd be glad to test it, thanks again. > Robert Best regards, Dave.