Date: Wed, 13 Aug 2003 16:57:28 -0500 (CDT)
From: Mike Silbersack
To: Ed Maste
Cc: "'freebsd-net@freebsd.org'", John Baldwin
Subject: RE: TCP socket shutdown race condition
Message-ID: <20030813164935.M29363@odysseus.silby.com>

On Wed, 13 Aug 2003, Ed Maste wrote:

> I think I've found the problem.
>
> crfree() is called from a lot of places (I counted at least 20) including
> sodealloc() in the socket code, crcopy() etc. It's called at splnet() from
> sodealloc(). I'm not sure what spl (if any) it might be called at from
> elsewhere, but certainly not splnet().
>
> I'm going to investigate the correct solution for this and supply a
> PR / patch, but for now let me know if more information is desired.
>
> -ed

Hm, sounds like you've done some solid debugging, and this should be easy
to fix. However, perhaps we need to think about this for a little bit
longer before we just switch to atomic operations or a spl call within the
cr functions...
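
For reference, the "atomic operations" option mentioned above can be
sketched in userland C. This is only an illustration of the technique, not
the 4.x crfree(): struct cred, cred_hold(), cred_free() and run_demo() are
hypothetical names, and C11 atomics plus pthreads stand in for whatever
primitives the kernel would actually use. The point is that the decrement
and the "was that the last reference?" test happen as one indivisible step,
so exactly one caller ends up freeing the structure.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a credential; NOT FreeBSD's struct ucred. */
struct cred {
    atomic_uint refs;               /* reference count */
};

static atomic_uint frees;           /* how many times a cred was really freed */

static struct cred *cred_alloc(void)
{
    struct cred *c = malloc(sizeof(*c));
    if (c != NULL)
        atomic_init(&c->refs, 1);   /* the caller owns the first reference */
    return c;
}

static void cred_hold(struct cred *c)
{
    atomic_fetch_add(&c->refs, 1);
}

static void cred_free(struct cred *c)
{
    /* fetch_sub returns the value *before* the decrement, atomically. */
    unsigned int old = atomic_fetch_sub(&c->refs, 1);

    assert(old > 0);                /* dropping a dead reference is a bug */
    if (old == 1) {                 /* we dropped the last reference */
        atomic_fetch_add(&frees, 1);
        free(c);
    }
}

#define NTHREADS 4
#define NREFS    100000

static void *dropper(void *arg)
{
    struct cred *c = arg;

    for (int i = 0; i < NREFS; i++)
        cred_free(c);
    return NULL;
}

/* Drops NTHREADS * NREFS references concurrently; returns the free count. */
unsigned int run_demo(void)
{
    pthread_t t[NTHREADS];
    struct cred *c = cred_alloc();

    if (c == NULL)
        abort();
    atomic_store(&frees, 0);

    /* One reference already exists, so take the remaining ones. */
    for (int i = 0; i < NTHREADS * NREFS - 1; i++)
        cred_hold(c);

    for (int i = 0; i < NTHREADS; i++)
        if (pthread_create(&t[i], NULL, dropper, c) != 0)
            abort();
    for (int i = 0; i < NTHREADS; i++)
        if (pthread_join(t[i], NULL) != 0)
            abort();

    return atomic_load(&frees);
}
```

With a plain "if (--c->refs == 0)" in place of the atomic fetch-and-subtract,
two threads could both read the same count and either both free the
structure or both skip the free, which is the kind of failure a missing
spl or lock around the decrement would produce.
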

As I understand it, 4.x uses just a single lock on anything going into the
kernel, meaning that this type of problem should be prevented. However,
maybe there's something a lot more subtle which actually goes on. What I'm
thinking is that perhaps we're seeing a single entry point which happens to
call the cr* functions and which should be more generally locked, and that
the cr functions are just where the problem shows up.

John, can you give us a quick overview of how 4.x SMP works so that we can
determine the correct solution here? My main question is this: If CPU 1 is
chugging along at a low SPL level and an interrupt comes in to CPU 2, can
CPU 2 wrest control away from CPU 1, and/or run the interrupt handler
concurrently?

Thanks,

Mike "Silby" Silbersack