Date:      Fri, 1 Aug 2003 08:54:59 -0400 
From:      Scot Loach <sloach@sandvine.com>
To:        "'freebsd-net@freebsd.org'" <freebsd-net@freebsd.org>
Subject:   TCP socket shutdown race condition
Message-ID:  <FE045D4D9F7AED4CBFF1B3B813C8533701AE86B5@mail.sandvine.com>

Earlier this week one of our FreeBSD 4.7 boxes panicked.  I've posted the
stack trace at the end of this message.  Searching with Google, I've found
several references to this panic over the past three years, but it seems it
has never been traced to a root cause.

The box crashes because the cr_uidinfo pointer in the so_cred structure is
null.  However, on closer inspection the so_cred structure is corrupted
(cr_ref=3279453304 for example), so I'm guessing it has already been freed.
Looking closer at the socket, I see that the SS_NOFDREF flag is set, which
supports my theory.  The tcpcb is in the CLOSED state, and has the SENTFIN
flag set.
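
As a quick sanity check on the cr_ref value quoted above, it helps to view it as a 32-bit hex word.  The short Python sketch below does only that conversion; the helper name as_hex32 is made up for illustration.  The result looks more like the kernel addresses in the trace at the end of this message (for example cred=0xc387b800) than like a reference count, which would fit the "already freed" guess, but that reading is an assumption and not something the dump proves.

```python
def as_hex32(value):
    """Format an integer as an unsigned 32-bit hex word (i386 word size)."""
    return f"0x{value & 0xFFFFFFFF:08x}"


# cr_ref as reported from the corrupted so_cred structure
print(as_hex32(3279453304))  # prints 0xc3787c78
```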

I was able to reproduce this crash, although in my reproduction it was the
2msl timer that triggered it instead of the rexmt timer, and the socket was
in the TIME_WAIT state.  To reproduce it, I ran a server on an SMP box that
accepts incoming TCP connections, adds each socket to a kqueue, reads data,
calls shutdown(), and then calls close(), while another box made
thousands of connections per second.  Since kqueue and shutdown were
involved, there's a slight chance that this could be related to kern/54331.
However, I still need to fine-tune my test to narrow down the problem and
make it happen faster (it took over 12 hours to reproduce).
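
The test program itself is not included in this message.  As a rough illustration only, the per-connection sequence described above (accept, register for read events, read, shutdown(), close()) could be sketched in Python as follows.  Everything here is an assumption: the function name serve, the max_conns limit, the 4096-byte read, and the choice of SHUT_WR for shutdown().  The selectors module picks kqueue on FreeBSD, but it falls back to other mechanisms elsewhere, so this is not guaranteed to exercise the same kernel path as the original server.

```python
import selectors
import socket


def serve(listener, max_conns):
    """Handle max_conns connections: register, read once, shutdown(), close()."""
    sel = selectors.DefaultSelector()  # kqueue-backed on FreeBSD
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    accepted = closed = 0
    while closed < max_conns:
        for key, _events in sel.select(timeout=1.0):
            sock = key.fileobj
            if sock is listener:
                try:
                    conn, _addr = listener.accept()
                except OSError:
                    continue  # connection vanished before accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
                accepted += 1
                if accepted == max_conns:
                    sel.unregister(listener)  # stop accepting new work
                continue
            try:
                sock.recv(4096)  # read whatever the client sent
            except OSError:
                pass  # reset or spurious wakeup; still tear down
            sel.unregister(sock)
            try:
                sock.shutdown(socket.SHUT_WR)  # assumed direction; sends FIN
            except OSError:
                pass  # peer may already be gone
            sock.close()
            closed += 1
    sel.close()
    return closed
```

To approximate the load described above, a second machine would open and drop connections against this listener as fast as it can; a real stress test would likely be written in C against kqueue(2) directly.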

Any ideas on what the problem might be?  Suggestions on how I can debug
this?



#0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:493
#1  0xc01ba7e8 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:322
#2  0xc01bad11 in panic (fmt=0xc0327ad9 "%s")
    at /usr/src/sys/kern/kern_shutdown.c:608
#3  0xc02d414e in trap_fatal (frame=0xff807ca8, eva=48)
    at /usr/src/sys/i386/i386/trap.c:974
#4  0xc02d3d7d in trap_pfault (frame=0xff807ca8, usermode=0, eva=48)
    at /usr/src/sys/i386/i386/trap.c:867
#5  0xc02d381f in trap (frame={tf_fs = -820445160, tf_es = 16,
      tf_ds = -8388592, tf_edi = 0, tf_esi = -1070280516, tf_ebp = -8356624,
      tf_isp = -8356652, tf_ebx = -1, tf_edx = 1778434048, tf_ecx = -93045248,
      tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1071185923, tf_cs = 8,
      tf_eflags = 66054, tf_esp = -820398592, tf_ss = -820398592})
    at /usr/src/sys/i386/i386/trap.c:466
#6  0xc026fffd in acquire_lock (lk=0xc034d0bc) at machine/globals.h:114
#7  0xc02749e4 in softdep_update_inodeblock (ip=0xcf19b600, bp=0xdb8ca0dc,
    waitfor=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3813
#8  0xc026f03a in ffs_update (vp=0xfa743e00, waitfor=0)
    at /usr/src/sys/ufs/ffs/ffs_inode.c:106
#9  0xc0278437 in ffs_sync (mp=0xcf0cfa00, waitfor=2, cred=0xc387b800,
    p=0xc0378880) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1025
#10 0xc01f1e9b in sync (p=0xc0378880, uap=0x0)
    at /usr/src/sys/kern/vfs_syscalls.c:576
#11 0xc01ba55b in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:241
#12 0xc01bad11 in panic (fmt=0xc0327ad9 "%s")
    at /usr/src/sys/kern/kern_shutdown.c:608
#13 0xc02d414e in trap_fatal (frame=0xff807e84, eva=8)
    at /usr/src/sys/i386/i386/trap.c:974
#14 0xc02d3d7d in trap_pfault (frame=0xff807e84, usermode=0, eva=8)
    at /usr/src/sys/i386/i386/trap.c:867
#15 0xc02d381f in trap (frame={tf_fs = -1071579112, tf_es = -819986416,
      tf_ds = -818085872, tf_edi = 0, tf_esi = 2147483647, tf_ebp = -8356140,
      tf_isp = -8356176, tf_ebx = -1, tf_edx = 1644167168, tf_ecx = 0,
      tf_eax = 1644167168, tf_trapno = 12, tf_err = 0, tf_eip = -1071930883,
      tf_cs = 8, tf_eflags = 66054, tf_esp = -272018704, tf_ss = -272018816})
    at /usr/src/sys/i386/i386/trap.c:466
#16 0xc01ba1fd in chgsbsize (uip=0x0, hiwat=0xefc952f4, to=0,
    max=9223372036854775807) at /usr/src/sys/kern/kern_resource.c:780
#17 0xc01e0243 in sbrelease (sb=0xefc952f0, so=0xefc95280)
    at /usr/src/sys/kern/uipc_socket2.c:437
#18 0xc01dd457 in sofree (so=0xefc95280)
    at /usr/src/sys/kern/uipc_socket.c:262
#19 0xc020d44c in in_pcbdetach (inp=0xf24437e0)
    at /usr/src/sys/netinet/in_pcb.c:567
#20 0xc021e97a in tcp_close (tp=0xf24438a0)
    at /usr/src/sys/netinet/tcp_subr.c:754
#21 0xc021e7a3 in tcp_drop (tp=0xf24438a0, errno=60)
    at /usr/src/sys/netinet/tcp_subr.c:604
#22 0xc0220eb6 in tcp_timer_rexmt (xtp=0xf24438a0)
    at /usr/src/sys/netinet/tcp_timer.c:379
#23 0xc01c16de in softclock () at /usr/src/sys/kern/kern_timeout.c:131
#24 0xc02c2dfb in doreti_swi ()
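
The trap frames above print registers as signed decimals, which makes them hard to match against the frame addresses.  The small Python sketch below (the helper name signed32_to_hex is made up for illustration) converts the two tf_eip values to unsigned hex.  They come out equal to the addresses of frames #16 (chgsbsize) and #6 (acquire_lock), so each fault was taken inside the function listed directly under its trap frame.  The first fault has eva=8 with uip=0x0, which looks like a read at a small offset from a NULL pointer, though which field sits at that offset is not checked here.

```python
def signed32_to_hex(value):
    """Reinterpret a signed 32-bit decimal from the trap frame as unsigned hex."""
    return f"0x{value & 0xFFFFFFFF:08x}"


# tf_eip from the original fault (frame #15) and from the fault during sync (frame #5)
print(signed32_to_hex(-1071930883))  # prints 0xc01ba1fd, frame #16 chgsbsize
print(signed32_to_hex(-1071185923))  # prints 0xc026fffd, frame #6 acquire_lock
```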


