Date: Sun, 25 Mar 2007 19:40:58 +0200 From: Niclas Zeising <niclas.zeising@gmail.com> To: Giorgos Keramidas <keramida@ceid.upatras.gr> Cc: current@freebsd.org Subject: Re: Latest -current complete lockup (tcp changes?) Message-ID: <4606B42A.502@gmail.com> In-Reply-To: <20070325165843.GA1558@kobe.laptop> References: <20070324124732.GA767@nagual.pp.ru> <20070325165843.GA1558@kobe.laptop>
next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format. --------------080305050202080106070600 Content-Type: text/plain; charset=ISO-8859-7; format=flowed Content-Transfer-Encoding: 7bit Giorgos Keramidas wrote: > On 2007-03-24 15:47, Andrey Chernov <ache@freebsd.org> wrote: >> Very recent -current cause complete lockup in case and after small >> amount of network activity happens. No panic, no ddb console - nothing >> just lockup. Previously working kernel is from Mar 22. I suspect >> recent round of TCP changes. > > I'm seeing kernel panics with a kernel updates to: > > changeset: 129918:f239d4a0fa1e > tag: tip > user: imp > date: Fri Mar 23 23:47:59 2007 +0000 > summary: Default to booting off the SD card. It is more useful, [...] > > The panic happens after a small amount of network activity, inside the > TCP SACK code: > > --------------------------------------------------------------------------- > Physical memory: 495 MB > Dumping 54 MB: 39 23 7 > [SNIP stack trace] > --------------------------------------------------------------------------- > > Disabling SACK lets me use my laptop again: > > net.inet.tcp.sack.enable=0 > > without any panics. I've got another panic with current sources, in tcp_output.c The panic happens right after the system goes to multiuser, sometimes I have time to login, but it blows within a minute. Looking at the stack trace (taken from /var/log/messages, I have KDB_TRACE in my kernel) it seems like it panics multiple times. The panic is network-related, booting the system in single user mode without networking works like a charm, so I've done a back trace from the core file, which is attached. I still have the core file around, so I can get more information. Regards! //Niclas Here is a stack trace from the panic: Sleeping on "-" with the following non-sleepable locks held: exclusive sleep mutex tcp r = 0 (0xc07e90cc) locked @ /usr/src/sys/netinet/tcp_input.c:619 KDB: stack backtrace: db_trace_self_wrapper(c071d5f1,e409fc9c,c0580e33,c07e6fec,e409fcb0,...) at db_trace_self_wrapper+0x27 kdb_backtrace(c07e6fec,e409fcb0,1,0,c3badbd0,...) at kdb_backtrace+0x2f witness_warn(5,0,c071b75c,c070bd17,0,...) at witness_warn+0x1af msleep_spin(c3c46200,c3c4621c,c070bd17,0,1,...) at msleep_spin+0x1eb taskqueue_thread_loop(c3c4a214,e409fd38,c0717d8c,326,c3baf000,...) at taskqueue_thread_loop+0x8d fork_exit(c057b52a,c3c4a214,e409fd38) at fork_exit+0xcc fork_trampoline() at fork_trampoline+0x8 --- trap 0, eip = 0, esp = 0xe409fd70, ebp = 0 --- kernel boot file is /boot/kernel/kernel Sleeping thread (tid 100035, pid 30) owns a non-sleepable lock sched_switch(c3badbd0,0,1,179,c079fdf8,...) at sched_switch+0x10a mi_switch(1,0,c071e040,1cc,0,...) at mi_switch+0x2d0 sleepq_switch(c3c46200,0,c071e040,21e,e409fcd4,...) at sleepq_switch+0x10d sleepq_wait(c3c46200,0,c071b75c,c070bd17,0,...) at sleepq_wait+0x66 msleep_spin(c3c46200,c3c4621c,c070bd17,0,1,...) at msleep_spin+0x20d taskqueue_thread_loop(c3c4a214,e409fd38,c0717d8c,326,c3baf000,...) at taskqueue_thread_loop+0x8d fork_exit(c057b52a,c3c4a214,e409fd38) at fork_exit+0xcc fork_trampoline() at fork_trampoline+0x8 --- trap 0, eip = 0, esp = 0xe409fd70, ebp = 0 --- panic: sleeping thread KDB: stack backtrace: db_trace_self_wrapper(c071d5f1,e25cfb90,c054e887,c071b028,c079cae0,...) at db_trace_self_wrapper+0x27 kdb_backtrace(c071b028,c079cae0,c071e729,e25cfb9c,100,...) at kdb_backtrace+0x2f panic(c071e729,ffffffff,1e,ae,8,...) at panic+0xb8 propagate_priority(c3abe6c0,0,c071e6ca,287,c07a0380,...) at propagate_priority+0xe7 turnstile_wait(c07e90cc,c3badbd0,0,18b,c3badbd2,...) at turnstile_wait+0x499 _mtx_lock_sleep(c07e90cc,c3abe6c0,0,c072ad06,7f,...) at _mtx_lock_sleep+0x122 _mtx_lock_flags(c07e90cc,0,c072ad06,7f,e25cfc70,...) at _mtx_lock_flags+0xec tcp_slowtimo(c054429f,c079d0e4,1,c0719efe,0,...) at tcp_slowtimo+0x3b pfslowtimo(0,0,c071c0f5,ee,0,...) at pfslowtimo+0x52 softclock(0,c054429f,c079c8b0,1,c3aa66a0,...) at softclock+0x27b ithread_execute_handlers(c3abdb40,c3abb400,c0717f9c,30e,c3abe6c0,...) at ithread_execute_handlers+0x14c ithread_loop(c3aa66a0,e25cfd38,c0717d8c,326,c3abdb40,...) at ithread_loop+0x77 fork_exit(c0535d55,c3aa66a0,e25cfd38) at fork_exit+0xcc fork_trampoline() at fork_trampoline+0x8 --- trap 0, eip = 0, esp = 0xe25cfd70, ebp = 0 --- KDB: enter: panic Physical memory: 1015 MB Dumping 97 MB: 82 66 50 34 18 2 --------------080305050202080106070600 Content-Type: text/plain; name="trace.out" Content-Transfer-Encoding: 8bit Content-Disposition: inline; filename="trace.out" # kgdb /boot/kernel/kernel /var/crash/vmcore.2 kgdb: kvm_nlist(_stopped_cpus): kgdb: kvm_nlist(_stoppcbs): [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"] GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-marcel-freebsd". Unread portion of the kernel message buffer: Sleeping thread (tid 100035, pid 30) owns a non-sleepable lock sched_switch(c3badbd0,0,1,179,c079fdf8,...) at sched_switch+0x10a mi_switch(1,0,c071e040,1cc,0,...) at mi_switch+0x2d0 sleepq_switch(c3c46200,0,c071e040,21e,e409fcd4,...) at sleepq_switch+0x10d sleepq_wait(c3c46200,0,c071b75c,c070bd17,0,...) at sleepq_wait+0x66 msleep_spin(c3c46200,c3c4621c,c070bd17,0,1,...) at msleep_spin+0x20d taskqueue_thread_loop(c3c4a214,e409fd38,c0717d8c,326,c3baf000,...) at taskqueue_thread_loop+0x8d fork_exit(c057b52a,c3c4a214,e409fd38) at fork_exit+0xcc fork_trampoline() at fork_trampoline+0x8 --- trap 0, eip = 0, esp = 0xe409fd70, ebp = 0 --- panic: sleeping thread KDB: stack backtrace: db_trace_self_wrapper(c071d5f1,e25cfb90,c054e887,c071b028,c079cae0,...) at db_trace_self_wrapper+0x27 kdb_backtrace(c071b028,c079cae0,c071e729,e25cfb9c,100,...) at kdb_backtrace+0x2f panic(c071e729,ffffffff,1e,ae,8,...) at panic+0xb8 propagate_priority(c3abe6c0,0,c071e6ca,287,c07a0380,...) at propagate_priority+0xe7 turnstile_wait(c07e90cc,c3badbd0,0,18b,c3badbd2,...) at turnstile_wait+0x499 _mtx_lock_sleep(c07e90cc,c3abe6c0,0,c072ad06,7f,...) at _mtx_lock_sleep+0x122 _mtx_lock_flags(c07e90cc,0,c072ad06,7f,e25cfc70,...) at _mtx_lock_flags+0xec tcp_slowtimo(c054429f,c079d0e4,1,c0719efe,0,...) at tcp_slowtimo+0x3b pfslowtimo(0,0,c071c0f5,ee,0,...) at pfslowtimo+0x52 softclock(0,c054429f,c079c8b0,1,c3aa66a0,...) at softclock+0x27b ithread_execute_handlers(c3abdb40,c3abb400,c0717f9c,30e,c3abe6c0,...) at ithread_execute_handlers+0x14c ithread_loop(c3aa66a0,e25cfd38,c0717d8c,326,c3abdb40,...) at ithread_loop+0x77 fork_exit(c0535d55,c3aa66a0,e25cfd38) at fork_exit+0xcc fork_trampoline() at fork_trampoline+0x8 --- trap 0, eip = 0, esp = 0xe25cfd70, ebp = 0 --- KDB: enter: panic Physical memory: 1015 MB Dumping 97 MB: 82 66 50 34 18 2 #0 doadump () at pcpu.h:172 172 pcpu.h: No such file or directory. in pcpu.h (kgdb) bt #0 doadump () at pcpu.h:172 #1 0xc044b92d in db_fncall (dummy1=0, dummy2=0, dummy3=1999, dummy4=0xe25cf980 " WxÀ") at /usr/src/sys/ddb/db_command.c:486 #2 0xc044b6fc in db_command (last_cmdp=0xc0784e24, cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:401 #3 0xc044b7c3 in db_command_loop () at /usr/src/sys/ddb/db_command.c:453 #4 0xc044d731 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:222 #5 0xc05728f0 in kdb_trap (type=0, code=0, tf=0x0) at /usr/src/sys/kern/subr_kdb.c:502 #6 0xc06db633 in trap (frame=0xe25cfb1c) at /usr/src/sys/i386/i386/trap.c:621 #7 0xc06cb0db in calltrap () at /usr/src/sys/i386/i386/exception.s:139 #8 0xc0572668 in kdb_enter (msg=0x12 <Address 0x12 out of bounds>) at cpufunc.h:60 #9 0xc054e89d in panic (fmt=0xc071e729 "sleeping thread") at /usr/src/sys/kern/kern_shutdown.c:547 #10 0xc057c0b7 in propagate_priority (td=0xc3badbd0) at /usr/src/sys/kern/subr_turnstile.c:205 #11 0xc057d061 in turnstile_wait (lock=0xc07e90cc, owner=0xc3badbd0, queue=0) at /usr/src/sys/kern/subr_turnstile.c:682 #12 0xc05444e5 in _mtx_lock_sleep (m=0xc07e90cc, tid=3282822848, opts=0, file=0x1 <Address 0x1 out of bounds>, line=1) at /usr/src/sys/kern/kern_mutex.c:411 #13 0xc0543f4c in _mtx_lock_flags (m=0xc07e90cc, opts=0, file=0xc072ad06 "/usr/src/sys/netinet/tcp_timer.c", line=127) ---Type <return> to continue, or q <return> to quit--- at /usr/src/sys/kern/kern_mutex.c:194 #14 0xc060884b in tcp_slowtimo () at /usr/src/sys/netinet/tcp_timer.c:127 #15 0xc05996c5 in pfslowtimo (arg=0x0) at /usr/src/sys/kern/uipc_domain.c:481 #16 0xc055e1e3 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:273 #17 0xc0535c88 in ithread_execute_handlers (p=0xc3abdb40, ie=0xc3abb400) at /usr/src/sys/kern/kern_intr.c:682 #18 0xc0535dcc in ithread_loop (arg=0xc3aa66a0) at /usr/src/sys/kern/kern_intr.c:766 #19 0xc05349ac in fork_exit (callout=0xc0535d55 <ithread_loop>, arg=0x12, frame=0x12) at /usr/src/sys/kern/kern_fork.c:814 #20 0xc06cb150 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205 (kgdb) bt full #0 doadump () at pcpu.h:172 No locals. #1 0xc044b92d in db_fncall (dummy1=0, dummy2=0, dummy3=1999, dummy4=0xe25cf980 " WxÀ") at /usr/src/sys/ddb/db_command.c:486 fn_addr = -1068179472 args = {-1066111392, -497223348, -1065855328, -497223332, -1069239410, -1065855328, -1066111392, -497223300, -497223348, 0} nargs = 0 retval = -1011678976 t = 0 #2 0xc044b6fc in db_command (last_cmdp=0xc0784e24, cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:401 cmd = (struct command *) 0xc0746e60 t = 0 modif = " WxÀ\000\000\000\000\000É«Ã W\177À`\213yÀ\r\000\000\000\001\000\000\000¼ù\\â1þLÀ\000\001³Ã\aK\004 ä\213yÀàq~À WxÀx\000\000\000 WxÀ\034û\\âàù\\âóÚDÀ\204¯qÀ\bØDÀ\000\000\000\000\020\000\000\000\034û\\â WxÀØÎDÀ WxÀØNxÀx\000\000\000<ú\\â" addr = 0 count = 1999 have_addr = 0 result = 0 #3 0xc044b7c3 in db_command_loop () at /usr/src/sys/ddb/db_command.c:453 No locals. ---Type <return> to continue, or q <return> to quit--- #4 0xc044d731 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:222 jb = {{_jb = {-497223108, -497223136, -497223056, 3, -497222884, -1069230384, -1066618544, -497222288, 18, 0, -497223056, -1068029843}}} prev_jb = (void *) 0x0 bkpt = 0 #5 0xc05728f0 in kdb_trap (type=0, code=0, tf=0x0) at /usr/src/sys/kern/subr_kdb.c:502 handled = 0 #6 0xc06db633 in trap (frame=0xe25cfb1c) at /usr/src/sys/i386/i386/trap.c:621 td = (struct thread *) 0xc3abe6c0 p = (struct proc *) 0xc3abdb40 i = 0 ucode = 0 type = 3 code = 0 addr = -1064743456 eva = 0 ksi = {ksi_link = {tqe_next = 0xc071ae56, tqe_prev = 0xe25cfaf0}, ksi_info = {si_signo = 0, si_errno = -1066291153, si_code = -1065760017, si_pid = -497222888, si_uid = 3797744372, si_status = -1069240794, si_addr = 0xe25cfb18, si_value = {sival_int = -1, sival_ptr = 0xffffffff}, _reason = {_fault = {_trapno = -497222940}, _timer = {_timerid = -497222940, _overrun = -1069230212}, _mesgq = { _mqd = -497222940}, _poll = {_band = -497222940}, __spare__ = { ---Type <return> to continue, or q <return> to quit--- __spare1__ = -497222940, __spare2__ = {-1069230212, -497222876, -497222892, -1066277079, 1, -497222892, -1066623191}}}}, ksi_flags = -1012144448, ksi_sigq = 0x0} #7 0xc06cb0db in calltrap () at /usr/src/sys/i386/i386/exception.s:139 No locals. #8 0xc0572668 in kdb_enter (msg=0x12 <Address 0x12 out of bounds>) at cpufunc.h:60 No locals. #9 0xc054e89d in panic (fmt=0xc071e729 "sleeping thread") at /usr/src/sys/kern/kern_shutdown.c:547 td = (struct thread *) 0xc3abe6c0 bootopt = 256 newpanic = 1 ap = 0xe25cfb9c "ÿÿÿÿ\036" buf = "sleeping thread", '\0' <repeats 240 times> #10 0xc057c0b7 in propagate_priority (td=0xc3badbd0) at /usr/src/sys/kern/subr_turnstile.c:205 tc = (struct turnstile_chain *) 0xc07e90cc ts = (struct turnstile *) 0xc3ab2d40 pri = 52 #11 0xc057d061 in turnstile_wait (lock=0xc07e90cc, owner=0xc3badbd0, queue=0) at /usr/src/sys/kern/subr_turnstile.c:682 tc = (struct turnstile_chain *) 0xc07a0380 ts = (struct turnstile *) 0xc3ab2d40 ---Type <return> to continue, or q <return> to quit--- td = (struct thread *) 0xc3abe6c0 td1 = (struct thread *) 0xc07e90cc #12 0xc05444e5 in _mtx_lock_sleep (m=0xc07e90cc, tid=3282822848, opts=0, file=0x1 <Address 0x1 out of bounds>, line=1) at /usr/src/sys/kern/kern_mutex.c:411 v = 3238088704 #13 0xc0543f4c in _mtx_lock_flags (m=0xc07e90cc, opts=0, file=0xc072ad06 "/usr/src/sys/netinet/tcp_timer.c", line=127) at /usr/src/sys/kern/kern_mutex.c:194 No locals. #14 0xc060884b in tcp_slowtimo () at /usr/src/sys/netinet/tcp_timer.c:127 No locals. #15 0xc05996c5 in pfslowtimo (arg=0x0) at /usr/src/sys/kern/uipc_domain.c:481 dp = (struct domain *) 0xc076dde0 pr = (struct protosw *) 0xc076d9c8 #16 0xc055e1e3 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:273 c_func = (void (*)(void *)) 0xc0599673 <pfslowtimo> c_arg = (void *) 0x0 c_mtx = (struct mtx *) 0x0 c_flags = 22 c = (struct callout *) 0xc1015000 bucket = (struct callout_tailq *) 0xd7b13170 curticks = 180004 ---Type <return> to continue, or q <return> to quit--- steps = 0 depth = 1 mpcalls = 1 mtxcalls = 0 gcalls = 0 #17 0xc0535c88 in ithread_execute_handlers (p=0xc3abdb40, ie=0xc3abb400) at /usr/src/sys/kern/kern_intr.c:682 ih = (struct intr_handler *) 0xc3ab9d00 ihn = (struct intr_handler *) 0xc3c58400 #18 0xc0535dcc in ithread_loop (arg=0xc3aa66a0) at /usr/src/sys/kern/kern_intr.c:766 ithd = (struct intr_thread *) 0xc3aa66a0 ie = (struct intr_event *) 0xc3abb400 td = (struct thread *) 0xc3abe6c0 p = (struct proc *) 0xc3abdb40 __func__ = "ithread_loop" #19 0xc05349ac in fork_exit (callout=0xc0535d55 <ithread_loop>, arg=0x12, frame=0x12) at /usr/src/sys/kern/kern_fork.c:814 p = (struct proc *) 0xc3abdb40 td = (struct thread *) 0x1 #20 0xc06cb150 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205 No locals. (kgdb) q # --------------080305050202080106070600--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4606B42A.502>