From owner-freebsd-net@FreeBSD.ORG Sun Oct 11 20:30:58 2009 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 49B0E1065672; Sun, 11 Oct 2009 20:30:58 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 246F68FC0A; Sun, 11 Oct 2009 20:30:58 +0000 (UTC) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id C582546B09; Sun, 11 Oct 2009 16:30:57 -0400 (EDT) Date: Sun, 11 Oct 2009 21:30:57 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Harsha Srinath In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-current@freebsd.org, net@freebsd.org Subject: Re: Page fault in IFNET_WLOCK_ASSERT [if.c and pccbb.c] X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 11 Oct 2009 20:30:58 -0000 On Sun, 11 Oct 2009, Harsha Srinath wrote: > I'm running an updated HEAD kernel and got a page fault in > ifindex_alloc_locked() in if.c. I figured that the problem was caused by the > (pluggable) network card of my laptop and found that during the > initialization of the interface, cb_event_thread() takes the giant lock and > up the call chain in if_alloc(), we call IFNET_WLOCK() and assert on the RW > locks in ifindex_alloc_locked(). It is in the asset macro > IFNET_WLOCK_ASSERT() I see the page fault. > > I looked up some recent related changes and noticed the following comment in > one of the check-ins in- > http://svn.freebsd.org/viewvc/base/head/sys/net/if.c > > "Break out allocation of new ifindex values from if_alloc() and if_vmove(), > and centralize in a single function ifindex_alloc(). Assert the IFNET_WLOCK, > and add missing IFNET_WLOCK in if_alloc(). This does not close all known > races in this code." > > So I think I have hit one of those fault conditions. > > Apparently the giant lock code was removed and added back recently - > http://svn.freebsd.org/viewvc/base/head/sys/dev/pccbb/pccbb.c > > I believe that the root cause is that ifnet_rw is a non sleepable exclusive > RW lock and we have taken the exclusive sleep mutex Giant before that. > > Any pointers and suggestions are welcome. Hi Harsha-- Giant is a bit special in that the long-term sleep code in the kernel knows to drop it when sleeping, and re-acquire when waking up. So, unlike all other mutexes, it should be OK to hold it in this case, as Giant will simply get dropped if the kernel has to sleep waiting on a sleepable lock. This is because, historically in FreeBSD 3.x/4.x, the kernel was protected by a single spinlock, which would get released whenever the kernel stopped executing, such as during an I/O sleep. On the whole, Giant has disappeared from the modern kernel, but where it is used, it retains those curious historic properties. To break things down a bit further, IFNET_WLOCK is, itself, a bit special: notice that in FreeBSD 8, it's actually two locks, a sleep lock, and a mutex, which must both be acquired exclusively to ensure mutual exclusion. if_alloc() and associated calls are also sleepable because they perform potentially sleeping memory allocation (M_WAITOK), so it's an invariant of any code calling interface allocation that it must be able to tolerate a sleep. Do you have a copy of the stack trace and fault information handy? In my experience, a NULL pointer deref or other page fault in the locking code for a global lock is almost always corrupted thread state, perhaps due to tripping over another thread having locked a corrupted/freed/uninitialized lock. We might be able to track that down by tracing other threads that were in execution at the time of the panic. Robert