Date: Sat, 4 Oct 2003 11:21:45 -0400 (EDT)
From: Robert Watson
To: Brian Fundakowski Feldman
cc: hackers@FreeBSD.org
Subject: Re: Is socket buffer locking as questionable as it seems?
In-Reply-To: <200310040538.h945cDxp014188@green.bikeshed.org>
List-Id: Technical Discussions relating to FreeBSD

On Sat, 4 Oct 2003, Brian Fundakowski Feldman wrote:

> I keep getting these panics on my SMP box (no backtrace or DDB or crash
> dump of course, because panic() == hang to FreeBSD these days):
>
> panic: receive: m == 0 so->so_rcv.sb_cc == 52
>
> From what I can tell, all sorts of socket-related calls are "MP-safe"
> and yet never even come close to locking the socket buffer. From what I
> can tell, the easiest way for this to occur would be sbrelease() being
> called from somewhere that it's supposed to, but doesn't, have sblock().
> Has anyone seen these, or a place to start looking? Maybe a way to get
> panics to stop hanging the machine? TIA if anyone has some enlightenment.

The socket system calls are marked MPSAFE because the acquisition of
Giant has been pushed down into the system call implementations, as
opposed to Giant being grabbed by the system call entry code itself.
Giant should be held across all the relevant socket-related events -- if
you find a place where it's not, send some details :-).

As you observe, there is currently no socket locking in the source tree,
although I'm hopeful that will be remedied in the next couple of months.
The lower levels of the IP stack can be run Giant-free at this point,
although my local patches to run multiple input paths in parallel run
into a panic due to insufficient locking in ip_forward() (bug report
already filed with Sam).

One of the conclusions from the recent developer summit was that a big
focus needs to be placed on interrupt processing latency and device
driver improvements so that we get the benefits of fine-grained locking.
Peter has picked up the task of doing a driver API sweep to provide
better facilities for doing this.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories
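
For readers unfamiliar with the "pushed down" distinction above, here is
a minimal userspace sketch of it. Everything in it is an illustrative
assumption and not kernel code: the pthread mutex `giant` stands in for
Giant, `sb_cc` stands in for so->so_rcv.sb_cc, and the functions
giant_lock(), giant_unlock(), sb_append(), dispatch_legacy(),
sys_send_legacy(), dispatch_mpsafe() and sys_send_mpsafe() are
hypothetical names invented for the example.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for the kernel's Giant lock. */
static pthread_mutex_t giant = PTHREAD_MUTEX_INITIALIZER;
int giant_held;   /* 1 while some thread holds giant; only changed under giant */
int sb_cc;        /* stand-in for so->so_rcv.sb_cc, protected only by giant */

void giant_lock(void)   { pthread_mutex_lock(&giant); giant_held = 1; }
void giant_unlock(void) { giant_held = 0; pthread_mutex_unlock(&giant); }

/* Socket-layer work: it has no lock of its own, so it relies on Giant. */
void sb_append(int nbytes)
{
    assert(giant_held);   /* rough analogue of asserting that Giant is owned */
    sb_cc += nbytes;
}

/* Non-MPSAFE style: the handler assumes its caller already holds Giant. */
void sys_send_legacy(int nbytes)
{
    sb_append(nbytes);
}

/* Entry code for a non-MPSAFE call takes Giant around the whole handler. */
void dispatch_legacy(int nbytes)
{
    giant_lock();
    sys_send_legacy(nbytes);
    giant_unlock();
}

/* MPSAFE style: the handler is entered without Giant and takes it itself. */
void sys_send_mpsafe(int nbytes)
{
    if (nbytes <= 0)      /* argument checking can run without Giant */
        return;
    giant_lock();         /* the Giant acquisition "pushed down" into the call */
    sb_append(nbytes);
    giant_unlock();
}

/* Entry code for an MPSAFE call does no locking of its own. */
void dispatch_mpsafe(int nbytes)
{
    sys_send_mpsafe(nbytes);
}
```

In both styles sb_append() runs with the lock held, which is the sense in
which a call can be marked MPSAFE while the socket buffer itself still
has no lock of its own. The only difference is which layer takes the
lock, and how much work can be done before taking it.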