Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 30 Aug 2004 22:56:58 -0500
From:      Richard Todd <rmtodd@ichotolot.servalan.com>
To:        freebsd-current@freebsd.org
Subject:   Re: Panic 'kernel trap doesn't have ucred' in last night's -current
Message-ID:  <E1C1zlS-00039w-D6@servalan.servalan.com>
References:  <20040820211618.62A1C7E4@mx2.synetsystems.com>

next in thread | previous in thread | raw e-mail | index | archive | help
In the freebsd-current mailing list I wrote last week:
>Hi.  Upgraded to -current last night and got the following panic.  The panic
>seems to be fairly repeatable and is triggered by a minute or so's worth of
>database activity with mysqld.  The version of mysql is 3.23.58_1 from ports,
>linked against libpthread; using libmap.conf to force usage of libc_r avoids
>the panic, so it's definitely a thread/KSE issue.  (Also note the oddness
>of the backtrace in the frames between doreti_ast and sched_switch -- is this
>normal, is the stack mangled, or is gdb just hallucinating?)  My kernel config

I've had a chance to do a bit more followup on this, and have found the 
following:

1) The bug is still there in -current as of last night.

2) I can fairly reliably reproduce the bug by starting mysqld (with it using
libpthread.so) and start several mysqldumps of the same database as near 
simultaneously as possible.  Got a coredump from this most recent attempt,
about which more below.

3) Yeah, apparently gdb was a little confused w.r.t. the stack backtrace, as
a backtrace from DDB on the console at the time of the panic looks different,
and more plausible: the DDB backtrace shows that thread_alloc_spare() called
crhold() (which it does), whereas the gdb backtrace is under the delusion
that thread_alloc_spare() calls thr_suspend().

4) Looking at the (sane) DDB backtrace as well as poking around in upper 
frames, the final sequence of events leading to trouble is now clearer:

   1. ast calls thread_user_enter().  The thread passed to thread_user_enter,
      thread 0xc28c39a0, has a NULL td_ucred pointer.  (Why?  Good question.)
   2. thread_user_enter() calls thread_alloc_spare().
   3. thread_alloc_spare() does its business, creating another thread, and 
      at the end tries to give the new thread a reference to the orig. thread's
      cred by calling crhold().  Unfortunately, the orig. thread's td_ucred
      is NULL.
   4. crhold() cheerfully takes the NULL pointer offered to it and tries to
      lock the mutex pointed to by the cred structure at 0x00000000.  This 
      causes an immediate segfault, hence the trap 12.
   5. trap() goes about its business, notices the current thread has no 
      td_ucred, and panics.

So the real question is how come that thread had a NULL td_ucred when it got
there?  Wish I knew.   If it'll help anyone figure this out, I had ktr(4) 
enabled and logging a bunch of stuff (KTR_GEN|KTR_LOCK|KTR_PROC) and can
supply a ktrdump from that core file; I didn't include it here because it's
rather long.


Script started on Mon Aug 30 22:01:46 2004
ichotolot# gdb6 -k kernel.debug /var/crash/vmcore.106
GNU gdb 20040803 [GDB v6.x for FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-portbld-freebsd6.0"...
panic: kernel trap doesn't have ucred
panic messages:
---
panic: kernel trap doesn't have ucred
cpuid = 0
KDB: stack backtrace:
kdb_backtrace(100,c28c36e0,c,c28c36e0,0) at kdb_backtrace+0x29
panic(c083022d,6c,0,0,df512c9c) at panic+0x114
trap(c0600018,c0970010,10,c28c36e0,c28c36e0) at trap+0x2f0
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0xc060e8e3, esp = 0xdf512cbc, ebp = 0xdf512ccc ---
crhold(0,c28c39e4,c4,c2886e40,c28c36e0) at crhold+0x13
thread_alloc_spare(c28c36e0) at thread_alloc_spare+0x3b
thread_user_enter(c25e2c40,c28c36e0) at thread_user_enter+0x63
ast(df512d48) at ast+0x94
doreti_ast() at doreti_ast+0x17
boot() called on cpu#0
Uptime: 1h22m58s
Dumping 638 MB
[CTRL-C to abort]  16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496 512 528 544 560 576 592 608 624
---
#0  doadump () at pcpu.h:159
159		__asm __volatile("movl %%fs:0,%0" : "=r" (td));
doadump () at pcpu.h:159
159		__asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:159
#1  0xc06115dc in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:384
#2  0xc06118f5 in panic (fmt=0xc083022d "kernel trap doesn't have ucred")
    at /usr/src/sys/kern/kern_shutdown.c:540
#3  0xc07a187c in trap (frame=
      {tf_fs = -1067450344, tf_es = -1063845872, tf_ds = 16, tf_edi = -1030998304, tf_esi = -1030998304, tf_ebp = -548328244, tf_isp = -548328280, tf_ebx = 0, tf_edx = 4, tf_ecx = 0, tf_eax = -1034015680, tf_trapno = 12, tf_err = 0, tf_eip = -1067390749, tf_cs = 8, tf_eflags = 66118, tf_esp = 0, tf_ss = -1065293200})
    at /usr/src/sys/i386/i386/trap.c:413
#4  0xc0790bda in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#5  0xc0600018 in thr_suspend (td=0xc28c36e0, uap=0x0)
    at /usr/src/sys/kern/kern_thr.c:282
#6  0xc060189f in thread_alloc_spare (td=0x6cb)
    at /usr/src/sys/kern/kern_kse.c:1021
#7  0xc0601bbf in thread_user_enter (p=0x0, td=0xc28c39a0)
    at /usr/src/sys/kern/kern_kse.c:1170
#8  0xc062f640 in ast (framep=0xdf512d48) at /usr/src/sys/kern/subr_trap.c:166
#9  0xc079154d in doreti_ast () at /usr/src/sys/i386/i386/exception.s:294
#10 0xdf512d48 in ?? ()
#11 0x0000002f in ?? ()
#12 0x0000002f in ?? ()
#13 0x0000002f in ?? ()
#14 0x0b13f400 in ?? ()
#15 0x08340000 in ?? ()
#16 0xbfbfe3d8 in ?? ()
#17 0xdf512d74 in ?? ()
#18 0x2033d81c in ?? ()
#19 0x0b108a00 in ?? ()
#20 0x0833a900 in ?? ()
#21 0x00000000 in ?? ()
#22 0x00000016 in ?? ()
#23 0x00000002 in ?? ()
#24 0x20339d2f in ?? ()
#25 0x0000001f in ?? ()
#26 0x00000202 in ?? ()
#27 0xbfbfe3ac in ?? ()
#28 0x0000002f in ?? ()
#29 0x00000000 in ?? ()
#30 0x00000000 in ?? ()
#31 0x00000000 in ?? ()
#32 0x00000000 in ?? ()
#33 0x1ef8e000 in ?? ()
#34 0xc25e4c60 in ?? ()
#35 0xc1ca05f0 in ?? ()
#36 0xdf512c9c in ?? ()
#37 0xdf512c84 in ?? ()
#38 0xc28c36e0 in ?? ()
#39 0xc062141b in sched_switch (td=0x2033d81c, newtd=0xb13f400)
---Type <return> to continue, or q <return> to quit---q
 at /usr/src/sys/Quit
(kgdb) fr 7
#7  0xc0601bbf in thread_user_enter (p=0x0, td=0xc28c39a0)
    at /usr/src/sys/kern/kern_kse.c:1170
1170			thread_alloc_spare(td);
(kgdb) p td
$1 = (struct thread *) 0xc28c39a0
(kgdb) p *td
$2 = {td_proc = 0xc25e2c40, td_ksegrp = 0x0, td_plist = {
    tqe_next = 0xc2778420, tqe_prev = 0xc28c32c8}, td_kglist = {
    tqe_next = 0x0, tqe_prev = 0xc28c32d0}, td_slpq = {tqe_next = 0xc200ab00, 
    tqe_prev = 0xc200a178}, td_lockq = {tqe_next = 0x0, tqe_prev = 0x0}, 
  td_runq = {tqe_next = 0x0, tqe_prev = 0xc08726cc}, td_selq = {
    tqh_first = 0x0, tqh_last = 0xc28c39d0}, td_sleepqueue = 0xc1fbace0, 
  td_turnstile = 0xc23c0bc0, td_tid = 100205, td_flags = 0, td_inhibitors = 0, 
  td_pflags = 0, td_last_kse = 0x0, td_kse = 0x0, td_dupfd = 0, 
  td_wchan = 0x0, td_wmesg = 0x0, td_lastcpu = 0 '\0', td_oncpu = 0 '\0', 
  td_locks = 0, td_blocked = 0x0, td_ithd = 0x0, td_lockname = 0x0, 
  td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0, 
  td_intr_nesting_level = 0, td_pinned = 0, td_mailbox = 0x0, td_ucred = 0x0, 
  td_standin = 0x0, td_prticks = 0, td_upcall = 0x0, td_sticks = 0, 
  td_uuticks = 0, td_usticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {
      0, 0, 0, 0}}, td_sigmask = {__bits = {0, 0, 0, 0}}, td_siglist = {
    __bits = {0, 0, 0, 0}}, td_waitset = 0x0, td_umtx = {tqe_next = 0x0, 
    tqe_prev = 0x0}, td_generation = 0, td_sigstk = {ss_sp = 0x0, ss_size = 0, 
    ss_flags = 0}, td_kflags = 0, td_xsig = 0, td_profil_addr = 0, 
  td_profil_ticks = 0, td_base_pri = 160 ' ', td_priority = 160 ' ', 
  td_pcb = 0xdf518da0, td_state = TDS_INACTIVE, td_retval = {1, 137600624}, 
  td_slpcallout = {c_links = {sle = {sle_next = 0xc28c33dc}, tqe = {
        tqe_next = 0xc28c33dc, tqe_prev = 0xce9ed370}}, c_time = 500970, 
    c_arg = 0xc28c39a0, c_func = 0, c_flags = 8}, td_frame = 0xdf518d48, 
  td_kstack_obj = 0xc2895948, td_kstack = 3746656256, td_kstack_pages = 2, 
  td_altkstack_obj = 0x0, td_altkstack = 0, td_altkstack_pages = 0, 
  td_critnest = 1, td_md = {md_savecrit = 582}, td_sched = 0xc28c3afc}
(kgdb) p td->td_ucred
$3 = (struct ucred *) 0x0
(kgdb) q
ichotolot# exit
exit

Script done on Mon Aug 30 22:04:29 2004



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E1C1zlS-00039w-D6>