From owner-freebsd-bugs@freebsd.org Mon Aug 31 22:28:25 2015 Return-Path: Delivered-To: freebsd-bugs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 823E79C6A5F for ; Mon, 31 Aug 2015 22:28:25 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6F0AF141E for ; Mon, 31 Aug 2015 22:28:25 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t7VMSPsL057159 for ; Mon, 31 Aug 2015 22:28:25 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-bugs@FreeBSD.org Subject: [Bug 202801] panic: spin lock held too long, maybe a pipe issue Date: Mon, 31 Aug 2015 22:28:25 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: new X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 8.4-STABLE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: longwitz@incore.de X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_id short_desc product version rep_platform op_sys bug_status bug_severity priority component assigned_to reporter Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Aug 2015 22:28:25 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=202801 Bug ID: 202801 Summary: panic: spin lock held too long, maybe a pipe issue Product: Base System Version: 8.4-STABLE Hardware: amd64 OS: Any Status: New Severity: Affects Only Me Priority: --- Component: kern Assignee: freebsd-bugs@FreeBSD.org Reporter: longwitz@incore.de On a system running FreeBSD 8.4-STABLE r284383 (amd64) the system panics during amanda backup. The processes amandad (pid 11806,tid 101757, holds spin lock) and sendbackup (pid 11809, tid 100537, panics) are involved. (kgdb) ps pid ppid pgrp uid state wmesg wchan cmd 12286 2867 30 0 S nanslp 0xffffffff808ab788 sleep 11818 11815 11806 990 S pause 0xffffff00bbcb0980 dump 11817 11815 11806 990 S pause 0xffffff00150ab0a0 dump 11816 11815 11806 990 S pipdwt 0xffffff03135e02d8 dump 11815 11810 11806 990 S sbwait 0xffffff035bc233ec dump 11810 11807 11806 990 S wait 0xffffff01087e08e0 dump 11809 11807 11806 990 R CPU 2 sendbackup 11807 11806 11806 990 S piperd 0xffffff0015100000 initial thread 11806 5831 11806 990 Ss pipelk 0xffffff03b40972d8 initial thread 7761 7751 7617 984 S select 0xffffff03d63985c0 initial thread 7756 7737 7607 984 S select 0xffffff02b1cee140 initial thread .......... 100000 68 Run CPU 5 [swapper] 11811 11809 11806 990 Z sh 11808 11806 11806 990 Z amandad (kgdb) info threads 507 Thread 101855 (PID=12286: sleep) sched_switch (td=0xffffff00b8d4c8e0, newtd=0xffffff0003567000, flags=) at /usr/src/sys/kern/sched_ule.c:1932 506 Thread 101216 (PID=11818: dump) sched_switch (td=0xffffff00bbceb000, newtd=0xffffff000355c8e0, flags=) at /usr/src/sys/kern/sched_ule.c:1932 505 Thread 100132 (PID=11817: dump) sched_switch (td=0xffffff00150c5000, newtd=0xffffff000355c470, flags=) at /usr/src/sys/kern/sched_ule.c:1932 504 Thread 101906 (PID=11816: dump) sched_switch (td=0xffffff033b5fc470, newtd=0xffffff00035678e0, flags=) at /usr/src/sys/kern/sched_ule.c:1932 503 Thread 100957 (PID=11815: dump) sched_switch (td=0xffffff03a1c7f8e0, newtd=0xffffff000355c470, flags=) at /usr/src/sys/kern/sched_ule.c:1932 502 Thread 101391 (PID=11810: dump) sched_switch (td=0xffffff02a0f86000, newtd=0xffffff000355b000, flags=) at /usr/src/sys/kern/sched_ule.c:1932 * 501 Thread 100537 (PID=11809: sendbackup) doadump () at /usr/src/sys/kern/kern_shutdown.c:266 500 Thread 101807 (PID=11807: sendbackup/initial thread) sched_switch (td=0xffffff005d1d4470, newtd=0xffffff000355b470, flags=) at /usr/src/sys/kern/sched_ule.c:1932 499 Thread 101757 (PID=11806: amandad/initial thread) sched_switch (td=0xffffff024153b8e0, newtd=0xffffff0003567470, flags=) at /usr/src/sys/kern/sched_ule.c:1932 Unread portion of the kernel message buffer: spin lock 0xffffffff808cccd8 (sleepq chain) held by 0xffffff024153b8e0 (tid 101757) too long panic: spin lock held too long cpuid = 2 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a kdb_backtrace() at kdb_backtrace+0x37 panic() at panic+0x1ce _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39 _mtx_lock_spin() at _mtx_lock_spin+0xc3 wakeup() at wakeup+0x19 pipe_write() at pipe_write+0x9a9 dofilewrite() at dofilewrite+0x8b kern_writev() at kern_writev+0x60 write() at write+0x55 amd64_syscall() at amd64_syscall+0x24f Xfast_syscall() at Xfast_syscall+0xfc --- syscall (4, FreeBSD ELF64, write), rip = 0x801e7287c, rsp = 0x7fffffffe448, rbp = 0x7fffffffe470 --- KDB: enter: panic 0xffffff0015b74b10: tag ufs, type VREG usecount 1, writecount 0, refcount 3 mountedhere 0 flags () v_object 0xffffff00ba0d1510 ref 0 pages 1 lock type ufs: EXCL by thread 0xffffff0003cca8e0 (pid 8) ino 329813, on dev mirror/gmsv02p8.journal Dumping 9635 out of 16365 MB:..1%..11%..21%..31%..41%. #0 doadump () at /usr/src/sys/kern/kern_shutdown.c:266 #1 0xffffffff801fdfcc in db_fncall (dummy1=, dummy2=, dummy3=, dummy4=) at /usr/src/sys/ddb/db_command.c:548 #2 0xffffffff801fe27d in db_command (last_cmdp=0xffffffff80879dc0, cmd_table=, dopager=0) at /usr/src/sys/ddb/db_command.c:445 #3 0xffffffff80202933 in db_script_exec (scriptname=0xffffff84862da410 "kdb.enter.panic", warnifnotfound=0) at /usr/src/sys/ddb/db_script.c:302 #4 0xffffffff80202a02 in db_script_kdbenter (eventname=) at /usr/src/sys/ddb/db_script.c:324 #5 0xffffffff80200614 in db_trap (type=, code=) at /usr/src/sys/ddb/db_main.c:230 #6 0xffffffff80440f21 in kdb_trap (type=3, code=0, tf=0xffffff84862da6f0) at /usr/src/sys/kern/subr_kdb.c:654 #7 0xffffffff805f5ba6 in trap (frame=0xffffff84862da6f0) at /usr/src/sys/amd64/amd64/trap.c:574 #8 0xffffffff805db808 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232 #9 0xffffffff804409bb in kdb_enter (why=0xffffffff80695a0a "panic", msg=0xa
) at cpufunc.h:63 #10 0xffffffff8040aa67 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:616 #11 0xffffffff803f7f09 in _mtx_lock_spin_failed (m=) at /usr/src/sys/kern/kern_mutex.c:505 #12 0xffffffff803f7fd3 in _mtx_lock_spin (m=0xffffffff808cccd8, tid=18446742977329681520, opts=, file=, line=) at /usr/src/sys/kern/kern_mutex.c:544 #13 0xffffffff80414719 in wakeup (ident=0xffffff03b40972d8) at /usr/src/sys/kern/kern_synch.c:365 #14 0xffffffff80458029 in pipe_write (fp=0xffffff00bfa614b0, uio=0xffffff84862daad0, active_cred=, flags=, td=) at /usr/src/sys/kern/sys_pipe.c:517 #15 0xffffffff80453cbb in dofilewrite (td=0xffffff00baaad470, fd=3, fp=0xffffff00bfa614b0, auio=0xffffff84862daad0, offset=, flags =0) at file.h:254 #16 0xffffffff80453fd0 in kern_writev (td=0xffffff00baaad470, fd=3, auio=0xffffff84862daad0) at /usr/src/sys/kern/sys_generic.c:447 #17 0xffffffff80454055 in write (td=, uap=) at /usr/src/sys/kern/sys_generic.c:363 #18 0xffffffff805f483f in amd64_syscall (td=0xffffff00baaad470, traced=0) at subr_syscall.c:114 #19 0xffffffff805dbafc in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:392 #20 0x0000000801e7287c in ?? () kgdb) f 12 #12 0xffffffff803f7fd3 in _mtx_lock_spin (m=0xffffffff808cccd8, tid=18446742977329681520, opts=, file=, line=) at /usr/src/sys/kern/kern_mutex.c:544 544 _mtx_lock_spin_failed(m); (kgdb) info loc i = 60000000 <---- indicates too long spin (kgdb) p m $8 = (struct mtx *) 0xffffffff808cccd8 (kgdb) mtx_owner m td: 0xffffff024153b8e0 pid: 11806, p_comm: amandad (kgdb) p *m $12 = {lock_object = {lo_name = 0xffffffff8069a4d1 "sleepq chain", lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446742983883864288} (kgdb) x/x m->mtx_lock 0xffffff024153b8e0: 0x808cc3c8 kgdb) tid 101757 [Switching to thread 499 (Thread 101757)]#0 sched_switch (td=0xffffff024153b8e0, newtd=0xffffff0003567470, flags=) (kgdb) where #0 sched_switch (td=0xffffff024153b8e0, newtd=0xffffff0003567470, flags=) at /usr/src/sys/kern/sched_ule.c:1932 #1 0xffffffff804142a6 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:466 #2 0xffffffff8044be92 in sleepq_catch_signals (wchan=0xffffff024153b8e0, pri=76) at /usr/src/sys/kern/subr_sleepqueue.c:421 #3 0xffffffff8044c9d2 in sleepq_wait_sig (wchan=, pri=) at /usr/src/sys/kern/subr_sleepqueue.c:627 #4 0xffffffff80414a0d in _sleep (ident=0xffffff024153b8e0, lock=0x100, priority=, wmesg=0xffffffff8069ac67 "pipelk", timo=0) at /usr/src/sys/kern/kern_synch.c:248 #5 0xffffffff8045704d in pipe_read (fp=0xffffff00d85804b0, uio=0xffffff8487aaead0, active_cred=, flags=, td=) at /usr/src/sys/kern/sys_pipe.c:493 #6 0xffffffff80454101 in dofileread (td=0xffffff024153b8e0, fd=7, fp=0xffffff00d85804b0, auio=0xffffff8487aaead0, offset=, flags= 0) at file.h:242 #7 0xffffffff80454490 in kern_readv (td=0xffffff024153b8e0, fd=7, auio=0xffffff8487aaead0) at /usr/src/sys/kern/sys_generic.c:238 #8 0xffffffff80454585 in read (td=, uap=) at /usr/src/sys/kern/sys_generic.c:154 #9 0xffffffff805f483f in amd64_syscall (td=0xffffff024153b8e0, traced=0) at subr_syscall.c:114 #10 0xffffffff805dbafc in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:392 #11 0x0000000801d6289c in ?? () kgdb) f 4 #4 0xffffffff80414a0d in _sleep (ident=0xffffff024153b8e0, lock=0x100, priority=, wmesg=0xffffffff8069ac67 "pipelk", timo=0) at /usr/src/sys/kern/kern_synch.c:248 248 rval = sleepq_wait_sig(ident, pri); (kgdb) list 243 if (timo && catch) 244 rval = sleepq_timedwait_sig(ident, pri); 245 else if (timo) 246 rval = sleepq_timedwait(ident, pri); 247 else if (catch) 248 rval = sleepq_wait_sig(ident, pri); 249 else { 250 sleepq_wait(ident, pri); 251 rval = 0; 252 } (kgdb) p catch $26 = 256 (kgdb) p timo $27 = 0 kgdb) f 5 #5 0xffffffff8045704d in pipe_read (fp=0xffffff00d85804b0, uio=0xffffff8487aaead0, active_cred=, flags=, td=) at /usr/src/sys/kern/sys_pipe.c:493 493 error = msleep(cpipe, PIPE_MTX(cpipe), (kgdb) list 488 int error; 489 490 PIPE_LOCK_ASSERT(cpipe, MA_OWNED); 491 while (cpipe->pipe_state & PIPE_LOCKFL) { 492 cpipe->pipe_state |= PIPE_LWANT; 493 error = msleep(cpipe, PIPE_MTX(cpipe), 494 catch ? (PRIBIO | PCATCH) : PRIBIO, 495 "pipelk", 0); 496 if (error != 0) 497 return (error); (kgdb) p *(struct pipe *)0xffffff024153b8e0 $28 = {pipe_buffer = {cnt = 2156708808, in = 4294967295, out = 1147716720, size = 4294967042, buffer = 0x0}, pipe_map = {cnt = 18446742983935575168, po s = 0, npages = -2138352248, ms = {0x0, 0xffffff0149d19e00, 0x0, 0xffffff84877254f0, 0xffffff000354fdc8, 0xffffff00151f0280, 0x0, 0xffffff00150586c0, 0 xffffff02cb71bd00, 0x18d7d, 0x0, 0x0, 0x0, 0x0, 0x0, 0xffffff024153b980, 0xffffff024468c470}}, pipe_sel = {si_tdlist = {tqh_first = 0x1, tqh_last = 0x2 0000000c}, si_note = {kl_list = {slh_first = 0x0}, kl_lock = 0, kl_unlock = 0xffffff03b40972d8, kl_assert_locked = 0xffffffff8069ac67 , kl_assert_unlocked = 0x5f0000ff04, kl_lockarg = 0x0}, si_mtx = 0x0}, pipe_atime = {tv_sec = 0, tv_nsec = 0}, pipe_mtime = {tv_sec = 0, tv_nsec = 0}, pipe_ctime = {tv_sec = -1096019876864, tv_nsec = 8189724157764698112}, pipe_sigio = 0xd400000000 , pipe_peer = 0x0, pipe_pair = 0x0, pipe_state = 0, pi pe_busy = 0, pipe_present = 0} (kgdb) p *(struct lock_object*)0x100 Cannot access memory at address 0x100 That looks suspicious, further line 493 in sys_pipe.c uses PIPE_MTX: #define PIPE_MTX(pipe) (&(pipe)->pipe_pair->pp_mtx) but we saw pipe_pair=0. I can give more information from the kernel dump. -- You are receiving this mail because: You are the assignee for the bug.