Date: Fri, 1 Aug 2003 21:51:01 -0500 (CDT)
From: Mike Silbersack
To: Scot Loach
cc: "'freebsd-net@freebsd.org'"
Subject: Re: TCP socket shutdown race condition
Message-ID: <20030801214411.A2165@odysseus.silby.com>

On Fri, 1 Aug 2003, Scot Loach wrote:

> Earlier this week one of our FreeBSD 4.7 boxes panic'd. I've posted the
> stack trace at the end of this message. Using google, I've found several
> references to this panic over the past three years, but it seems its never
> been taken to root cause.
>
> The box crashes because the cr_uidinfo pointer in the so_cred structure is
> null. However, on closer inspection the so_cred structure is corrupted
> (cr_ref=3279453304 for example), so I'm guessing it has already been freed.
> Looking closer at the socket, I see that the SS_NOFDREF flag is set, which
> supports my theory. The tcpcb is in the CLOSED state, and has the SENTFIN
> flag set.

About how many concurrent connections are you pushing this machine to?

There's an unfortunate problem with uidinfo in 4.x:

    struct uidinfo {
            LIST_ENTRY(uidinfo) ui_hash;
            rlim_t  ui_sbsize;      /* socket buffer space consumed */
            long    ui_proccnt;     /* number of processes */
            uid_t   ui_uid;         /* uid */
            u_short ui_ref;         /* reference count */
    };

It doesn't look like we have any seatbelts preventing ui_ref from
overflowing, thus causing an early free on the way back down, thereby
making all the other references to the structure junk.

Can you try going into kern_resource.c, finding the function uifind, and
changing:

            if (uip == NULL)
                    uip = uicreate(uid);
            uip->ui_ref++;
            return (uip);

to

            if (uip == NULL)
                    uip = uicreate(uid);
            uip->ui_ref++;
            if (uip->ui_ref == 0)
                    panic("ui_ref overflowed");
            return (uip);

That would confirm that it is the problem you're running into. If that is
the case, please tell us so that we can transition to the political side
of the problem. :)

Mike "Silby" Silbersack
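
For readers who want to see the overflow described above in isolation, here is a minimal userland sketch in C. It is not kernel code: struct fake_uidinfo and every function name below are hypothetical, and it assumes the release path frees the object when the count reaches zero. It shows how an unsigned short count wraps after USHRT_MAX + 1 acquires, so one further acquire followed by a single release drops the count to zero while many holders still exist.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct uidinfo; only the counter matters here. */
struct fake_uidinfo {
    unsigned short ref;   /* same width class as u_short ui_ref */
    bool freed;           /* set when the sketch would free the object */
};

/* Unchecked acquire: wraps silently from USHRT_MAX back to 0. */
static void ref_acquire_unchecked(struct fake_uidinfo *u)
{
    u->ref++;
}

/* Checked acquire (an assumption, not the patch from the mail):
 * refuses to take a reference that would wrap the counter. */
static bool ref_acquire_checked(struct fake_uidinfo *u)
{
    if (u->ref == USHRT_MAX)
        return false;
    u->ref++;
    return true;
}

/* Release: marks the object freed when the count reaches zero. */
static void ref_release(struct fake_uidinfo *u)
{
    if (--u->ref == 0)
        u->freed = true;
}

/* Take n unchecked references, drop one, and report whether the
 * object was freed. For n > 1 a result of true is a premature free. */
static bool freed_after_one_release(unsigned long n)
{
    struct fake_uidinfo u = { 0, false };
    unsigned long i;

    for (i = 0; i < n; i++)
        ref_acquire_unchecked(&u);
    ref_release(&u);
    return u.freed;
}
```

Calling freed_after_one_release((unsigned long)USHRT_MAX + 2UL) reports a free even though USHRT_MAX + 1 references are nominally still outstanding. The panic check suggested in the mail detects the wrap after the increment; the checked variant here refuses the increment instead, which is a different design choice shown only for contrast.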