From owner-freebsd-stable@FreeBSD.ORG Tue Apr 21 22:25:41 2009 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 792391065674 for ; Tue, 21 Apr 2009 22:25:41 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 50E6B8FC18 for ; Tue, 21 Apr 2009 22:25:41 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id ECBD046B64; Tue, 21 Apr 2009 18:25:40 -0400 (EDT) Date: Tue, 21 Apr 2009 23:25:40 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Mikolaj Golub In-Reply-To: <86k55demux.fsf@kopusha.onet> Message-ID: References: <200904210524.n3L5O9YS086865@lava.sentex.ca> <200904211111.57295.jhb@freebsd.org> <200904211519.n3LFJFsk090691@lava.sentex.ca> <20090421153112.GA47589@edoofus.dev.vega.ru> <200904211610.n3LGAYll090970@lava.sentex.ca> <86k55demux.fsf@kopusha.onet> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-stable@FreeBSD.org Subject: Re: RELENG_7 crash X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Apr 2009 22:25:41 -0000 On Wed, 22 Apr 2009, Mikolaj Golub wrote: > RW> There are several bugs here, one difficult to fix (lack of > RW> refcounting), but also stuff like ifp being derived from an interface > RW> number twice, but checked against NULL only the first time (line 85 > RW> checked for NULL, re-queried but no check line 88). Fixing the top > RW> bit of the function to only query the ifp once and check it for NULL > RW> then would be a good idea. More fundamentally, we do need to refcount > RW> ifnets when used from the management path, which is not all that hard > RW> a change, but preferably to try the easy way first given where we are > RW> in the release cycle. > > I was thinking a bit on this problem (the same issue was reported in > kern/132734) and eliminating double call of ifnet_byindex() was the first > thing I did. But I decided then that the proper fix would be to wrap all > critical code in sysctl_ifdata in IFNET_RLOCK/IFNET_RUNLOCK (the patch is > attached). It looks like I am wrong and my idea about how the kernel works > is oversimplified? :-) Unfortunately, I didn't manage to reproduce the panic > in my environment so I was not able to do some experiments and tests. The problem is that you can't hold IFNET_RLOCK() over the sysctl copyouts. I'm preparing a patch for 8-CURRENT that will add a refcount to struct ifnet to handle this case, but that isn't an MFC candidate for 7.2. If fixing the return value checks for ifnet_byindex() avoids the insta-panic Mike's running into, it's a better 7.2 change at this point. We can target the ifnet refcount changes for 7.3. Robert N M Watson Computer Laboratory University of Cambridge