From owner-freebsd-net@FreeBSD.ORG  Sat Aug  2 06:26:40 2003
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 02D2337B401
	for <freebsd-net@freebsd.org>; Sat,  2 Aug 2003 06:26:40 -0700 (PDT)
Received: from mail.sandvine.com (sandvine.com [199.243.201.138])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 464CC43FA3
	for <freebsd-net@freebsd.org>; Sat,  2 Aug 2003 06:26:39 -0700 (PDT)
	(envelope-from sloach@sandvine.com)
Received: by mail.sandvine.com with Internet Mail Service (5.5.2653.19)
	id <305LHGV5>; Sat, 2 Aug 2003 09:26:37 -0400
Message-ID: <FE045D4D9F7AED4CBFF1B3B813C8533701AE86BF@mail.sandvine.com>
From: Scot Loach <sloach@sandvine.com>
To: 'Mike Silbersack' <silby@silby.com>
Date: Sat, 2 Aug 2003 09:26:31 -0400 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain;
	charset="iso-8859-1"
cc: "'freebsd-net@freebsd.org'" <freebsd-net@freebsd.org>
Subject: RE: TCP socket shutdown race condition
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Aug 2003 13:26:40 -0000

I don't think that's the problem, although it does seem suspicious.

Here's the struct ucred pointed to by the socket:

(kgdb) p *so.so_cred
$2 = {cr_ref = 3279453304, cr_uid = 3486088556, cr_ngroups = 1,
  cr_groups = {0, 3276863080, 3277717504, 21162, 0, 0, 0, 0, 0, 4294967295,
    4294967295, 0, 0, 0, 0, 3279496516}, cr_uidinfo = 0x0}

This looks like garbage, but the cr_uidinfo pointer is null, and the cr_ref
of _this_ structure is 32 bits.

This doesn't look to me like a problem with the uidinfo; it looks like the
ucred structure itself has already been freed.

scot.

-----Original Message-----
From: Mike Silbersack [mailto:silby@silby.com]
Sent: Friday, August 01, 2003 10:51 PM
To: Scot Loach
Cc: 'freebsd-net@freebsd.org'
Subject: Re: TCP socket shutdown race condition


On Fri, 1 Aug 2003, Scot Loach wrote:

> Earlier this week one of our FreeBSD 4.7 boxes panic'd.  I've posted the
> stack trace at the end of this message.  Using google, I've found several
> references to this panic over the past three years, but it seems it's never
> been taken to root cause.
>
> The box crashes because the cr_uidinfo pointer in the so_cred structure is
> null.  However, on closer inspection the so_cred structure is corrupted
> (cr_ref=3279453304 for example), so I'm guessing it has already been
> freed.
> Looking closer at the socket, I see that the SS_NOFDREF flag is set, which
> supports my theory.  The tcpcb is in the CLOSED state, and has the SENTFIN
> flag set.

About how many concurrent connections are you pushing this machine to?

There's an unfortunate problem with uidinfo in 4.x:

struct uidinfo {
        LIST_ENTRY(uidinfo) ui_hash;
        rlim_t  ui_sbsize;              /* socket buffer space consumed */
        long    ui_proccnt;             /* number of processes */
        uid_t   ui_uid;                 /* uid */
        u_short ui_ref;                 /* reference count */
};

It doesn't look like we have any seatbelts preventing ui_ref from
overflowing, thus causing an early free on the way back down, thereby
making all the other references to the structure junk.  Can you try going
into kern_resource.c, finding the function uifind, and changing:

        if (uip == NULL)
                uip = uicreate(uid);
        uip->ui_ref++;
        return (uip);

to

        if (uip == NULL)
                uip = uicreate(uid);
        uip->ui_ref++;
        if (uip->ui_ref == 0)
                panic("ui_ref overflowed");
        return (uip);

That would confirm that it is the problem you're running into.  If that is
the case, please tell us so that we can transition to the political side
of the problem. :)

Mike "Silby" Silbersack