Date: Mon, 08 Jun 2009 20:25:50 +0300 From: Mikolaj Golub <to.my.trociny@gmail.com> To: freebsd-stable@freebsd.org Subject: Assertion lock == sq->sq_lock failed at /usr/src/sys/kern/subr_sleepqueue.c:327 Message-ID: <86d49eocwx.fsf@kopusha.onet>
next in thread | raw e-mail | index | archive | help
--=-=-= After recent upgrade (on June 3) of my 7-stable host (WITNESS is enabled, the previous build was Apr 26), I have been experienced panics when starting some our home made application: Architecture: i386 Architecture Version: 2 Dump Length: 73797632B (70 MB) Blocksize: 512 Dumptime: Mon Jun 8 15:29:14 2009 Hostname: zhuzha.ua1 Magic: FreeBSD Kernel Dump Version String: FreeBSD 7.2-STABLE #18: Wed Jun 3 14:28:49 EEST 2009 root@zhuzha.ua1:/usr/obj/usr/src/sys/DEBUG Panic String: Assertion lock == sq->sq_lock failed at /usr/src/sys/kern/subr_sleepqueue.c:327 Dump Parity: 1277108796 Bounds: 18 Narrowing down the problem I have written a simple test program (in attaches) that models in simplified way the behaviour or our application and reproduces the crash. There are two threads. One of the threads is doing vfork/exec to start child process while the other is monitoring the status of the child calling wait4(). At the moment of the crash the parent vfork thread is sleeping on wchan=0xc4d17ae0 (td->td_proc) with lock=0xc4d178b8 waiting for the child to exec. The parent wait4 thread with lock=0xc4d17b70 and the same wchan calls sleepq_lookup(wchan) to check if the wait channel already have a sleep queue associated with it, finds the queue of the vfork thread, tries to insert the current thread into this sleep queue and fails on assertion lock==sq->sq_lock. wait4 treead: (kgdb) thr 128 [Switching to thread 128 (Thread 100164)]#0 doadump () at pcpu.h:196 196 in pcpu.h (kgdb) bt #0 doadump () at pcpu.h:196 #1 0xc0496509 in db_fncall (dummy1=-1061924641, dummy2=0, dummy3=-1, dummy4=0xe69a089c "") at /usr/src/sys/ddb/db_command.c:516 #2 0xc0496abf in db_command (last_cmdp=0xc0c9ab54, cmd_table=0x0, dopager=0) at /usr/src/sys/ddb/db_command.c:413 #3 0xc0496b34 in db_command_script (command=0xc0c9baa4 "call doadump") at /usr/src/sys/ddb/db_command.c:484 #4 0xc049a3a0 in db_script_exec (scriptname=0xe69a09a8 "kdb.enter.panic", warnifnotfound=Variable "warnifnotfound" is not available. ) at /usr/src/sys/ddb/db_script.c:302 #5 0xc049a47d in db_script_kdbenter (eventname=0xc0b8fa85 "panic") at /usr/src/sys/ddb/db_script.c:324 #6 0xc0498388 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:227 #7 0xc0810d36 in kdb_trap (type=3, code=0, tf=0xe69a0ae0) at /usr/src/sys/kern/subr_kdb.c:524 #8 0xc0add6ab in trap (frame=0xe69a0ae0) at /usr/src/sys/i386/i386/trap.c:688 #9 0xc0ac159b in calltrap () at /usr/src/sys/i386/i386/exception.s:159 #10 0xc0810eba in kdb_enter_why (why=0xc0b8fa85 "panic", msg=0xc0b8fa85 "panic") at cpufunc.h:60 #11 0xc07e4216 in panic (fmt=0xc0b5ecd3 "Assertion %s failed at %s:%d") at /usr/src/sys/kern/kern_shutdown.c:557 #12 0xc08194dc in sleepq_add (wchan=0xc4d17ae0, lock=0xc4d17b70, wmesg=0xc0b959c1 "wait", flags=256, queue=0) at /usr/src/sys/kern/subr_sleepqueue.c:327 #13 0xc07ec94b in _sleep (ident=0xc4d17ae0, lock=Variable "lock" is not available. ) at /usr/src/sys/kern/kern_synch.c:203 #14 0xc07bd8e6 in kern_wait (td=0xc41e8480, pid=-1, status=0xe69a0c2c, options=Variable "options" is not available. ) at /usr/src/sys/kern/kern_exit.c:902 #15 0xc07bd95b in wait4 (td=0xc41e8480, uap=0xe69a0cfc) at /usr/src/sys/kern/kern_exit.c:688 #16 0xc0adce23 in syscall (frame=0xe69a0d38) at /usr/src/sys/i386/i386/trap.c:1090 #17 0xc0ac1600 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255 #18 0x00000033 in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) fr 12 #12 0xc08194dc in sleepq_add (wchan=0xc4d17ae0, lock=0xc4d17b70, wmesg=0xc0b959c1 "wait", flags=256, queue=0) at /usr/src/sys/kern/subr_sleepqueue.c:327 327 MPASS(lock == sq->sq_lock); (kgdb) p lock $1 = (struct lock_object *) 0xc4d17b70 (kgdb) p sq->sq_lock $2 = (struct lock_object *) 0xc4d178b8 (kgdb) p *lock $3 = {lo_name = 0xc0b8e532 "process lock", lo_type = 0xc0b8e532 "process lock", lo_flags = 21168128, lo_witness_data = {lod_list = {stqe_next = 0xc0ce4840}, lod_witness = 0xc0ce4840}} (kgdb) p *sq->sq_lock $4 = {lo_name = 0xc0b8e532 "process lock", lo_type = 0xc0b8e532 "process lock", lo_flags = 21168128, lo_witness_data = {lod_list = {stqe_next = 0xc0ce4840}, lod_witness = 0xc0ce4840}} vfork thread of the parent process: (kgdb) thr 127 [Switching to thread 127 (Thread 100162)]#0 sched_switch (td=0xc41e86c0, newtd=Variable "newtd" is not available. ) at /usr/src/sys/kern/sched_ule.c:1944 1944 cpuid = PCPU_GET(cpuid); (kgdb) bt #0 sched_switch (td=0xc41e86c0, newtd=Variable "newtd" is not available. ) at /usr/src/sys/kern/sched_ule.c:1944 #1 0xc07ec4c3 in mi_switch (flags=Variable "flags" is not available. ) at /usr/src/sys/kern/kern_synch.c:444 #2 0xc0818bd2 in sleepq_switch (wchan=0xc4d17ae0) at /usr/src/sys/kern/subr_sleepqueue.c:497 #3 0xc0819800 in sleepq_wait (wchan=0xc4d17ae0) at /usr/src/sys/kern/subr_sleepqueue.c:580 #4 0xc07eca69 in _sleep (ident=0xc4d17ae0, lock=0xc4d178b8, priority=92, wmesg=0xc0b8bacb "ppwait", timo=0) at /usr/src/sys/kern/kern_synch.c:230 #5 0xc07c0631 in fork1 (td=0xc41e86c0, flags=-2147483596, pages=0, procp=0xe699ac78) at /usr/src/sys/kern/kern_fork.c:747 #6 0xc07c07c9 in vfork (td=0xc41e86c0, uap=0xe699acfc) at /usr/src/sys/kern/kern_fork.c:124 #7 0xc0adce23 in syscall (frame=0xe699ad38) at /usr/src/sys/i386/i386/trap.c:1090 #8 0xc0ac1600 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255 #9 0x00000033 in ?? () Previous frame inner to this frame (corrupt stack?) And here is the child process: (kgdb) thr 129 [Switching to thread 129 (Thread 100163)]#0 sched_switch (td=0xc4d1d900, newtd=Variable "newtd" is not available. ) at /usr/src/sys/kern/sched_ule.c:1944 1944 cpuid = PCPU_GET(cpuid); (kgdb) bt #0 sched_switch (td=0xc4d1d900, newtd=Variable "newtd" is not available. ) at /usr/src/sys/kern/sched_ule.c:1944 #1 0xc07ec4c3 in mi_switch (flags=Variable "flags" is not available. ) at /usr/src/sys/kern/kern_synch.c:444 #2 0xc0818bd2 in sleepq_switch (wchan=0xc0ccdac4) at /usr/src/sys/kern/subr_sleepqueue.c:497 #3 0xc0818e6f in sleepq_catch_signals (wchan=0xc0ccdac4) at /usr/src/sys/kern/subr_sleepqueue.c:417 #4 0xc0819647 in sleepq_timedwait_sig (wchan=0xc0ccdac4) at /usr/src/sys/kern/subr_sleepqueue.c:631 #5 0xc07eca31 in _sleep (ident=0xc0ccdac4, lock=0x0, priority=348, wmesg=0xc0b90bcd "nanslp", timo=2) at /usr/src/sys/kern/kern_synch.c:224 #6 0xc07f3f51 in kern_nanosleep (td=0xc4d1d900, rqt=0xe699dc64, rmt=0xe699dc6c) at /usr/src/sys/kern/kern_time.c:373 #7 0xc07f594f in nanosleep (td=0xc4d1d900, uap=0xe699dcfc) at /usr/src/sys/kern/kern_time.c:415 #8 0xc0adce23 in syscall (frame=0xe699dd38) at /usr/src/sys/i386/i386/trap.c:1090 #9 0xc0ac1600 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255 #10 0x00000033 in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) p &td->td_proc.p_mtx $5 = (struct mtx *) 0xc4d178b8 Actually, I am not sure that the problem is observed only on the recent STABLE. It might have not been triggered by our application before the upgrade. Currently I don't have the system with some old STABLE to check this running the test program. -- Mikolaj Golub --=-=-=--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?86d49eocwx.fsf>