From: John Baldwin
To: Mike Silbersack
Cc: freebsd-net@freebsd.org
Date: Thu, 14 Aug 2003 10:42:00 -0400 (EDT)
Subject: RE: TCP socket shutdown race condition

On 13-Aug-2003 Mike Silbersack wrote:
>
> On Wed, 13 Aug 2003, Ed Maste wrote:
>
>> I think I've found the problem.
>>
>> crfree() is called from a lot of places (I counted at least 20) including
>> sodealloc() in the socket code, crcopy() etc. It's called at splnet() from
>> sodealloc(). I'm not sure what spl (if any) it might be called at from
>> elsewhere, but certainly not splnet().
>>
>> I'm going to investigate the correct solution for this and supply a
>> PR / patch, but for now let me know if more information is desired.
>>
>> -ed
>
> Hm, sounds like you've done some solid debugging, and this should be easy
> to fix. However, perhaps we need to think about this for a little bit
> longer before we just switch to atomic operations or a spl call within the
> cr functions...
>
> As I understand it, 4.x uses just a single lock on anything going into the
> kernel, meaning that this type of problem should be prevented. However,
> maybe there's something a lot more subtle which actually goes on. What
> I'm thinking is that perhaps we're seeing a single entrypoint which
> happens to call the cr* functions that should be more generally locked,
> and that we're just seeing the problem in the cr functions.
>
> John, can you give us a quick overview of how 4.x SMP works so that we can
> determine the correct solution here? My main question is this: If CPU 1
> is chugging along at a low SPL level and an interrupt comes in to CPU 2,
> can it wrestle control away from the other CPU, and/or run the interrupt
> handler concurrently?

In that case, CPU 2 uses an IPI (inter-processor interrupt) to "push" the
interrupt over to CPU 1, since CPU 1 is already in the kernel. CPU 2 will
not handle an interrupt unless it can acquire the giant lock.

-- 
John Baldwin <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/