Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 08 Jun 2009 20:25:50 +0300
From:      Mikolaj Golub <to.my.trociny@gmail.com>
To:        freebsd-stable@freebsd.org
Subject:   Assertion lock == sq->sq_lock failed at /usr/src/sys/kern/subr_sleepqueue.c:327
Message-ID:  <86d49eocwx.fsf@kopusha.onet>

next in thread | raw e-mail | index | archive | help
--=-=-=

After recent upgrade (on June 3) of my 7-stable host (WITNESS is enabled, the
previous build was Apr 26), I have been experienced panics when starting some
our home made application:

  Architecture: i386 
  Architecture Version: 2 
  Dump Length: 73797632B (70 MB) 
  Blocksize: 512 
  Dumptime: Mon Jun  8 15:29:14 2009 
  Hostname: zhuzha.ua1 
  Magic: FreeBSD Kernel Dump 
  Version String: FreeBSD 7.2-STABLE #18: Wed Jun  3 14:28:49 EEST 2009 
    root@zhuzha.ua1:/usr/obj/usr/src/sys/DEBUG 
  Panic String: Assertion lock == sq->sq_lock failed at /usr/src/sys/kern/subr_sleepqueue.c:327 
  Dump Parity: 1277108796 
  Bounds: 18 
 
Narrowing down the problem I have written a simple test program (in attaches)
that models in simplified way the behaviour or our application and reproduces
the crash. There are two threads. One of the threads is doing vfork/exec to
start child process while the other is monitoring the status of the child
calling wait4().

At the moment of the crash the parent vfork thread is sleeping on
wchan=0xc4d17ae0 (td->td_proc) with lock=0xc4d178b8 waiting for the child to
exec. The parent wait4 thread with lock=0xc4d17b70 and the same wchan calls
sleepq_lookup(wchan) to check if the wait channel already have a sleep queue
associated with it, finds the queue of the vfork thread, tries to insert the
current thread into this sleep queue and fails on assertion lock==sq->sq_lock.

wait4 treead:

(kgdb) thr 128
[Switching to thread 128 (Thread 100164)]#0  doadump () at pcpu.h:196
196     in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:196
#1  0xc0496509 in db_fncall (dummy1=-1061924641, dummy2=0, dummy3=-1, dummy4=0xe69a089c "")
    at /usr/src/sys/ddb/db_command.c:516
#2  0xc0496abf in db_command (last_cmdp=0xc0c9ab54, cmd_table=0x0, dopager=0)
    at /usr/src/sys/ddb/db_command.c:413
#3  0xc0496b34 in db_command_script (command=0xc0c9baa4 "call doadump")
    at /usr/src/sys/ddb/db_command.c:484
#4  0xc049a3a0 in db_script_exec (scriptname=0xe69a09a8 "kdb.enter.panic", warnifnotfound=Variable "warnifnotfound" is not available.
)
    at /usr/src/sys/ddb/db_script.c:302
#5  0xc049a47d in db_script_kdbenter (eventname=0xc0b8fa85 "panic") at /usr/src/sys/ddb/db_script.c:324
#6  0xc0498388 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:227
#7  0xc0810d36 in kdb_trap (type=3, code=0, tf=0xe69a0ae0) at /usr/src/sys/kern/subr_kdb.c:524
#8  0xc0add6ab in trap (frame=0xe69a0ae0) at /usr/src/sys/i386/i386/trap.c:688
#9  0xc0ac159b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
#10 0xc0810eba in kdb_enter_why (why=0xc0b8fa85 "panic", msg=0xc0b8fa85 "panic") at cpufunc.h:60
#11 0xc07e4216 in panic (fmt=0xc0b5ecd3 "Assertion %s failed at %s:%d")
    at /usr/src/sys/kern/kern_shutdown.c:557
#12 0xc08194dc in sleepq_add (wchan=0xc4d17ae0, lock=0xc4d17b70, wmesg=0xc0b959c1 "wait", flags=256, 
    queue=0) at /usr/src/sys/kern/subr_sleepqueue.c:327
#13 0xc07ec94b in _sleep (ident=0xc4d17ae0, lock=Variable "lock" is not available.
) at /usr/src/sys/kern/kern_synch.c:203
#14 0xc07bd8e6 in kern_wait (td=0xc41e8480, pid=-1, status=0xe69a0c2c, options=Variable "options" is not available.
)
    at /usr/src/sys/kern/kern_exit.c:902
#15 0xc07bd95b in wait4 (td=0xc41e8480, uap=0xe69a0cfc) at /usr/src/sys/kern/kern_exit.c:688
#16 0xc0adce23 in syscall (frame=0xe69a0d38) at /usr/src/sys/i386/i386/trap.c:1090
#17 0xc0ac1600 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
#18 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) fr 12
#12 0xc08194dc in sleepq_add (wchan=0xc4d17ae0, lock=0xc4d17b70, wmesg=0xc0b959c1 "wait", flags=256, 
    queue=0) at /usr/src/sys/kern/subr_sleepqueue.c:327
327                     MPASS(lock == sq->sq_lock);
(kgdb) p lock
$1 = (struct lock_object *) 0xc4d17b70
(kgdb) p sq->sq_lock
$2 = (struct lock_object *) 0xc4d178b8
(kgdb) p *lock
$3 = {lo_name = 0xc0b8e532 "process lock", lo_type = 0xc0b8e532 "process lock", lo_flags = 21168128, 
  lo_witness_data = {lod_list = {stqe_next = 0xc0ce4840}, lod_witness = 0xc0ce4840}}
(kgdb) p *sq->sq_lock
$4 = {lo_name = 0xc0b8e532 "process lock", lo_type = 0xc0b8e532 "process lock", lo_flags = 21168128, 
  lo_witness_data = {lod_list = {stqe_next = 0xc0ce4840}, lod_witness = 0xc0ce4840}}

vfork thread of the parent process:

(kgdb) thr 127
[Switching to thread 127 (Thread 100162)]#0  sched_switch (td=0xc41e86c0, newtd=Variable "newtd" is not available.
)
    at /usr/src/sys/kern/sched_ule.c:1944
1944                    cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=0xc41e86c0, newtd=Variable "newtd" is not available.
) at /usr/src/sys/kern/sched_ule.c:1944
#1  0xc07ec4c3 in mi_switch (flags=Variable "flags" is not available.
) at /usr/src/sys/kern/kern_synch.c:444
#2  0xc0818bd2 in sleepq_switch (wchan=0xc4d17ae0) at /usr/src/sys/kern/subr_sleepqueue.c:497
#3  0xc0819800 in sleepq_wait (wchan=0xc4d17ae0) at /usr/src/sys/kern/subr_sleepqueue.c:580
#4  0xc07eca69 in _sleep (ident=0xc4d17ae0, lock=0xc4d178b8, priority=92, wmesg=0xc0b8bacb "ppwait", 
    timo=0) at /usr/src/sys/kern/kern_synch.c:230
#5  0xc07c0631 in fork1 (td=0xc41e86c0, flags=-2147483596, pages=0, procp=0xe699ac78)
    at /usr/src/sys/kern/kern_fork.c:747
#6  0xc07c07c9 in vfork (td=0xc41e86c0, uap=0xe699acfc) at /usr/src/sys/kern/kern_fork.c:124
#7  0xc0adce23 in syscall (frame=0xe699ad38) at /usr/src/sys/i386/i386/trap.c:1090
#8  0xc0ac1600 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
#9  0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

And here is the child process:

(kgdb) thr 129
[Switching to thread 129 (Thread 100163)]#0  sched_switch (td=0xc4d1d900, newtd=Variable "newtd" is not available.
)
    at /usr/src/sys/kern/sched_ule.c:1944
1944                    cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=0xc4d1d900, newtd=Variable "newtd" is not available.
) at /usr/src/sys/kern/sched_ule.c:1944
#1  0xc07ec4c3 in mi_switch (flags=Variable "flags" is not available.
) at /usr/src/sys/kern/kern_synch.c:444
#2  0xc0818bd2 in sleepq_switch (wchan=0xc0ccdac4) at /usr/src/sys/kern/subr_sleepqueue.c:497
#3  0xc0818e6f in sleepq_catch_signals (wchan=0xc0ccdac4) at /usr/src/sys/kern/subr_sleepqueue.c:417
#4  0xc0819647 in sleepq_timedwait_sig (wchan=0xc0ccdac4) at /usr/src/sys/kern/subr_sleepqueue.c:631
#5  0xc07eca31 in _sleep (ident=0xc0ccdac4, lock=0x0, priority=348, wmesg=0xc0b90bcd "nanslp", timo=2)
    at /usr/src/sys/kern/kern_synch.c:224
#6  0xc07f3f51 in kern_nanosleep (td=0xc4d1d900, rqt=0xe699dc64, rmt=0xe699dc6c)
    at /usr/src/sys/kern/kern_time.c:373
#7  0xc07f594f in nanosleep (td=0xc4d1d900, uap=0xe699dcfc) at /usr/src/sys/kern/kern_time.c:415
#8  0xc0adce23 in syscall (frame=0xe699dd38) at /usr/src/sys/i386/i386/trap.c:1090
#9  0xc0ac1600 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
#10 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) p &td->td_proc.p_mtx
$5 = (struct mtx *) 0xc4d178b8

Actually, I am not sure that the problem is observed only on the recent
STABLE. It might have not been triggered by our application before the
upgrade. Currently I don't have the system with some old STABLE to check this
running the test program.

-- 
Mikolaj Golub


--=-=-=--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?86d49eocwx.fsf>