From: Mathieu
To: freebsd-gnats-submit@FreeBSD.org
Date: Sat, 29 Mar 2014 23:07:25 GMT
Message-Id: <201403292307.s2TN7Pwv011393@cgiserv.freebsd.org>
Subject: kern/188063: deadlock between syncache(4) and pf(4)
X-Send-Pr-Version: www-3.1
List-Id: Bug reports

>Number:         188063
>Category:       kern
>Synopsis:       deadlock between syncache(4) and pf(4)
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Mar 29 23:10:00 UTC 2014
>Closed-Date:
>Last-Modified:
>Originator:     Mathieu
>Release:        9.2-RELEASE-p3
>Organization:
>Environment:
FreeBSD 9.2-RELEASE-p3 amd64
>Description:
We have a server that becomes unresponsive every few weeks or so. When it
happens, the NICs seem dead, and user processes hang in the "tcp" state. The
only way to recover is to reboot.

This time, I got it to dump core before rebooting. IIUC, there's a deadlock
involving an inpcb lock and a syncache_head lock between the "swi1: netisr 0"
and "swi4: clock" threads: each thread is blocked on a lock owned by the
other. No idea where to go from here...

(kgdb) tid 100011
[Switching to thread 41 (Thread 100011)]#3  0xffffffff808ef68e in _mtx_lock_sleep (m=0xffffff80010a7088, tid=18446741874755597456, opts=, file=, line=)
    at /usr/src/sys/kern/kern_mutex.c:466
466             turnstile_wait(ts, mtx_owner(m), TS_EXCLUSIVE_QUEUE);
(kgdb) bt
#0  sched_switch (td=0xfffffe0004217490, newtd=0xfffffe0004209490, flags=)
    at /usr/src/sys/kern/sched_ule.c:1920
#1  0xffffffff8090d4f4 in mi_switch (flags=259, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:485
#2  0xffffffff8094f446 in turnstile_wait (ts=, owner=0xfffffe0004216920, queue=)
    at /usr/src/sys/kern/subr_turnstile.c:753
#3  0xffffffff808ef68e in _mtx_lock_sleep (m=0xffffff80010a7088, tid=18446741874755597456, opts=, file=, line=)
    at /usr/src/sys/kern/kern_mutex.c:466
#4  0xffffffff80ab3c97 in syncache_lookup (inc=0xffffff80002a2910, schp=)
    at /usr/src/sys/netinet/tcp_syncache.c:500
#5  0xffffffff80ab424c in syncache_chkrst (inc=0xffffff80002a2910, th=0xfffffe005157ab7c)
    at /usr/src/sys/netinet/tcp_syncache.c:528
#6  0xffffffff80aabc33 in tcp_input (m=0xfffffe005157ab00, off0=)
    at /usr/src/sys/netinet/tcp_input.c:1184
#7  0xffffffff80a3c5aa in ip_input (m=0xfffffe005157ab00)
    at /usr/src/sys/netinet/ip_input.c:760
#8  0xffffffff809db591 in swi_net (arg=)
    at /usr/src/sys/net/netisr.c:806
#9  0xffffffff808d451d in intr_event_execute_handlers (p=, ie=0xfffffe0004221c00)
    at /usr/src/sys/kern/kern_intr.c:1272
#10 0xffffffff808d5d0d in ithread_loop (arg=0xfffffe00042036c0)
    at /usr/src/sys/kern/kern_intr.c:1285
#11 0xffffffff808d099f in fork_exit (callout=0xffffffff808d5c70 , arg=0xfffffe00042036c0, frame=0xffffff80002a2b00)
    at /usr/src/sys/kern/kern_fork.c:992
#12 0xffffffff80ce603e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:606
#13 0x0000000000000000 in ?? ()
(kgdb) frame 3
#3  0xffffffff808ef68e in _mtx_lock_sleep (m=0xffffff80010a7088, tid=18446741874755597456, opts=, file=, line=)
    at /usr/src/sys/kern/kern_mutex.c:466
466             turnstile_wait(ts, mtx_owner(m), TS_EXCLUSIVE_QUEUE);
(kgdb) p ((struct thread *)(m->mtx_lock&~15))->td_tid
$1 = 100013

(kgdb) tid 100013
[Switching to thread 43 (Thread 100013)]#0  sched_switch (td=0xfffffe0004216920, newtd=0xfffffe0004209920, flags=)
    at /usr/src/sys/kern/sched_ule.c:1920
1920            cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=0xfffffe0004216920, newtd=0xfffffe0004209920, flags=)
    at /usr/src/sys/kern/sched_ule.c:1920
#1  0xffffffff8090d4f4 in mi_switch (flags=259, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:485
#2  0xffffffff8094f446 in turnstile_wait (ts=, owner=0xfffffe0004216920, queue=)
    at /usr/src/sys/kern/subr_turnstile.c:753
#3  0xffffffff809014b2 in _rw_rlock (rw=0xfffffe0051850a98, file=, line=0)
    at /usr/src/sys/kern/kern_rwlock.c:477
#4  0xffffffff80a35771 in in_pcblookup_hash (pcbinfo=0xffffffff81434020, faddr=, fport=19210, laddr={s_addr = 1827520685}, lport=, lookupflags=2, ifp=0x0)
    at /usr/src/sys/netinet/in_pcb.c:1805
#5  0xffffffff81a1da99 in pf_socket_lookup () from /boot/kernel/pf.ko
#6  0xffffffff81a248a5 in pf_test_rule () from /boot/kernel/pf.ko
#7  0xffffffff81a2834c in pf_test () from /boot/kernel/pf.ko
#8  0xffffffff81a2f961 in pf_check_out () from /boot/kernel/pf.ko
#9  0xffffffff809dbbee in pfil_run_hooks (ph=, mp=0xffffff80002ac7f8, ifp=0x6e00, dir=115288696, inp=0x4b0a)
    at /usr/src/sys/net/pfil.c:82
#10 0xffffffff80a3ecb9 in ip_output (m=0xfffffe0006df2a00, opt=, ro=0xffffff80002ac810, flags=0, imo=0x0, inp=0x0)
    at /usr/src/sys/netinet/ip_output.c:504
#11 0xffffffff80ab398f in syncache_respond (sc=0xfffffe0173157000)
    at /usr/src/sys/netinet/tcp_syncache.c:1525
#12 0xffffffff80ab3afa in syncache_timer (xsch=)
    at /usr/src/sys/netinet/tcp_syncache.c:460
#13 0xffffffff80919ee8 in softclock (arg=)
    at /usr/src/sys/kern/kern_timeout.c:520
#14 0xffffffff808d451d in intr_event_execute_handlers (p=, ie=0xfffffe0004221800)
    at /usr/src/sys/kern/kern_intr.c:1272
#15 0xffffffff808d5d0d in ithread_loop (arg=0xfffffe0004203680)
    at /usr/src/sys/kern/kern_intr.c:1285
#16 0xffffffff808d099f in fork_exit (callout=0xffffffff808d5c70 , arg=0xfffffe0004203680, frame=0xffffff80002acb00)
    at /usr/src/sys/kern/kern_fork.c:992
#17 0xffffffff80ce603e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:606
#18 0x0000000000000000 in ?? ()
(kgdb) frame 3
#3  0xffffffff809014b2 in _rw_rlock (rw=0xfffffe0051850a98, file=, line=0)
    at /usr/src/sys/kern/kern_rwlock.c:477
477             turnstile_wait(ts, rw_owner(rw), TS_SHARED_QUEUE);
(kgdb) p ((struct thread *)(rw->rw_lock&~15))->td_tid
$2 = 100011
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: