Date: Sun, 17 Nov 2002 14:54:04 -0500 (EST)
From: Robert Watson <rwatson@freebsd.org>
To: "Joel M. Baldwin" <qumqats@outel.org>
Cc: current@freebsd.org
Subject: Re: more info from panic from running dnet on SMP kernel ( lock order reversal, recursed on non-recursive lock )
Message-ID: <Pine.NEB.3.96L.1021117145242.93303I-100000@fledge.watson.org>
In-Reply-To: <211086306.1037497825@[192.168.1.20]>

Hmm. It looks like there is indeed a lock leak in the RFTHREAD code.
Maybe a change like the following would help. The existing code ends with:

            PROC_LOCK(p2);
            psignal(p2, SIGKILL);
            PROC_UNLOCK(p2);
        }

Change the } to:

        } else
            PROC_UNLOCK(p1->p_leader);

And see if that gets rid of the problem. Any chance this is highly
reproducible, btw? :-) And what app are you running that's using
RFTHREAD -- Linux thread stuff?
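
To illustrate the kind of leak being described, here is a minimal userland sketch using POSIX threads rather than the kernel's PROC_LOCK macros. It assumes the leader's lock is taken before the check and released only inside the if branch, which is what the suggested else implies; the real kern_fork.c code is not reproduced here. The struct and every function name (leader, check_leader_leaky, check_leader_fixed, still_held) are hypothetical. An error-checking mutex stands in, loosely, for the recursion check that witness performs.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a process leader: a lock plus an "exiting" flag. */
struct leader {
    pthread_mutex_t lock;
    bool exiting;
};

/* Use an error-checking mutex so that relocking by the owner is reported. */
static void leader_init(struct leader *l, bool exiting)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&l->lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
    l->exiting = exiting;
}

/* Leaky shape: the lock is dropped only on the "exiting" path. */
static void check_leader_leaky(struct leader *l)
{
    pthread_mutex_lock(&l->lock);
    if (l->exiting) {
        pthread_mutex_unlock(&l->lock);
        /* ... the child would be signalled here ... */
    }
    /* If the leader is not exiting, we return with the lock still held. */
}

/* Fixed shape: the added else releases the lock on the other path. */
static void check_leader_fixed(struct leader *l)
{
    pthread_mutex_lock(&l->lock);
    if (l->exiting) {
        pthread_mutex_unlock(&l->lock);
        /* ... the child would be signalled here ... */
    } else
        pthread_mutex_unlock(&l->lock);
}

/* True if the calling thread still owns the lock (a relock gives EDEADLK). */
static bool still_held(struct leader *l)
{
    int rc = pthread_mutex_lock(&l->lock);

    if (rc == EDEADLK)
        return true;
    if (rc == 0)
        pthread_mutex_unlock(&l->lock);
    return false;
}
```

Calling check_leader_leaky() on a leader that is not exiting and then still_held() reports the leaked lock, much as a later acquisition of the same process lock would trip the "recursed on non-recursive lock" check. With check_leader_fixed() the lock is released on both paths.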

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories

On Sun, 17 Nov 2002, Joel M. Baldwin wrote:
>
> running dnet on a SMP kernel causes the kernel to panic.
>
>
> lock order reversal
> 1st 0xc2c803e8 process lock (process lock) @
> ../../../kern/kern_fork.c:571
> 2nd 0xc03cfce0 proctree (proctree) @ ../../../kern/kern_fork.c:596
> recursed on non-recursive lock (sleep mutex) process lock @
> ../../../kern/kern_fork.c:599
> first acquired @ ../../../kern/kern_fork.c:571
> panic: recurse
> cpuid = 1; lapic.id = 01000000
> Debugger("panic")
> Stopped at Debugger+0x55: xchgl %ebx,in_Debugger.0
> db> t
> Debugger(c03926fa,1000000,c0395ada,d26f5c08,1) at Debugger+0x55
> panic(c0395ada,c038feab,23b,c038feab,257) at panic+0x11f
> witness_lock(c2c803e8,8,c038feab,257,0) at witness_lock+0x3e6
> _mtx_lock_flags(c2c803e8,0,c038feab,257,d26f5cb8) at
> _mtx_lock_flags+0xb2
> fork1(c2773d00,6050,0,d26f5cd4,c2c803e8) at fork1+0xbfc
> rfork(c2773d00,d26f5d10,c03b07a2,407,1) at rfork+0x65
> syscall(2f,2f,2f,0,80ddf10) at syscall+0x28e
> Xint0x80_syscall() at Xint0x80_syscall+0x1d
> --- syscall (251, FreeBSD ELF32, rfork), eip = 0x8087d14, esp =
> 0xbfbff4a8, ebp = 0xbfbff524 ---
> db> ps
> pid proc addr uid ppid pgrp flag stat wmesg wchan
> cmd
> 6217 c2b98e00 d28a7000 0 6215 6216 0000000 newpanic: unknown
> thread state
> cpuid = 1; lapic.id = 01000000
> boot() called on cpu#1
> Uptime: 1h43m39s
> pfs_vncache_unload(): 1 entries remaining
> Dumping 255 MB
> 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240
> Dump complete
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
> cpu_reset called on cpu#1
> cpu_reset: Restarting BSP
> cpu_reset_proxy: Stopped CPU 1
>
>
>
> Then, when the system came back up, I took a closer
> look at the core dump.
>
>
> (kgdb) where
> #0 doadump () at ../../../kern/kern_shutdown.c:232
> #1 0xc02114ad in boot (howto=260) at ../../../kern/kern_shutdown.c:364
> #2 0xc0211767 in panic () at ../../../kern/kern_shutdown.c:517
> #3 0xc014f2bc in db_ps (dummy1=-1070342907, dummy2=0, dummy3=-1,
> dummy4=0xd26f5a24 "")
> at ../../../ddb/db_ps.c:169
> #4 0xc014d142 in db_command (last_cmdp=0xc03be920, cmd_table=0x0,
> aux_cmd_tablep=0xc03b5540,
> aux_cmd_tablep_end=0xc03b5558) at ../../../ddb/db_command.c:346
> #5 0xc014d256 in db_command_loop () at ../../../ddb/db_command.c:472
> #6 0xc014feea in db_trap (type=3, code=0) at ../../../ddb/db_trap.c:72
> #7 0xc033da10 in kdb_trap (type=3, code=0, regs=0xd26f5b80)
> at ../../../i386/i386/db_interface.c:166
> #8 0xc0356a3f in trap (frame=
> {tf_fs = -1069481960, tf_es = 16, tf_ds = 16, tf_edi =
> -1032372992, tf_esi = 256, tf_ebp = -764453940, tf_isp = -764453972,
> tf_ebx = 0, tf_edx = 0, tf_ecx = 1, tf_eax = 18, tf_trapno = 3, tf_err
> = 0, tf_eip = -1070342907, tf_cs = 8, tf_eflags = 662, tf_esp =
> -1069883258, tf_ss = -1069996294}) at ../../../i386/i386/trap.c:603
> #9 0xc033f238 in calltrap () at {standard input}:99
> #10 0xc021174f in panic (fmt=0x0) at ../../../kern/kern_shutdown.c:503
> #11 0xc02333d6 in witness_lock (lock=0xc2c803e8, flags=8,
> file=0xc038feab "../../../kern/kern_fork.c", line=599) at
> ../../../kern/subr_witness.c:609
> #12 0xc02079c2 in _mtx_lock_flags (m=0xc03cf4c0, opts=0,
> file=0xc042cfd4 "è\003È«þ8À;\002",
> line=-1027079192) at ../../../kern/kern_mutex.c:328
> #13 0xc01fd3ec in fork1 (td=0xc2773d00, flags=24656, pages=0,
> procp=0xd26f5cd4)
> at ../../../kern/kern_fork.c:599
> #14 0xc01fc6c5 in rfork (td=0xc2773d00, uap=0xd26f5d10) at
> ../../../kern/kern_fork.c:168
> #15 0xc035739e in syscall (frame=
> {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 0, tf_esi =
> 135126800, tf_ebp = -1077938908, tf_isp = -764453516, tf_ebx = 2,
> tf_edx = 135381248, tf_ecx = 135381248, tf_eax = 251, tf_trapno = 0,
> tf_err = 2, tf_eip = 134774036, tf_cs = 31, tf_eflags = 659, tf_esp =
> -1077939032, tf_ss = 47})
> at ../../../i386/i386/trap.c:1033
> #16 0xc033f28d in Xint0x80_syscall () at {standard input}:141
> ---Can't read userspace from dump, or kernel process---
>
>
>
>
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-current" in the body of the message
>
