From: Mark Johnston
Date: Sun, 29 Dec 2013 16:36:18 -0500
To: freebsd-current@freebsd.org
Subject: smp_rendezvous_cpus() deadlock
Message-ID: <20131229213618.GA4990@charmander.home>

Hello,

While experimenting with some userland DTrace scripts, I seem to be
consistently able to trigger a deadlock between smp_rendezvous_cpus()
(called periodically by DTrace) and smp_targeted_tlb_shootdown():

spin lock 0xffffffff80fe0620 (smp rendezvous) held by 0xfffff8000753b490 (tid 100059) too long
panic: spin lock held too long
[...]
(gdb) bt
#0  doadump (textdump=1) at pcpu.h:219
#1  0xffffffff806387c7 in kern_reboot (howto=260)
    at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:452
#2  0xffffffff80638cd5 in vpanic (fmt=, ap=)
    at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:759
#3  0xffffffff80638d23 in panic (fmt=)
    at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:688
#4  0xffffffff80624b68 in _mtx_lock_spin_cookie (c=, tid=, opts=, file=, line=)
    at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:551
#5  0xffffffff80624878 in __mtx_lock_spin_flags (c=, opts=0,
    file=0xffffffff80a1ca28 "/usr/home/markj/src/freebsd/sys/kern/subr_smp.c", line=498)
    at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:279
#6  0xffffffff8067eba3 in smp_rendezvous_cpus (setup_func=0xffffffff8067eae0 ,
    action_func=0xffffffff814e2d00 , teardown_func=0xffffffff8067eae0 , arg=0x0)
    at /usr/home/markj/src/freebsd/sys/kern/subr_smp.c:498
#7  0xffffffff814d5743 in dtrace_state_deadman (arg=0xfffff80007ee5c00)
    at /usr/home/markj/src/freebsd/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:13144
#8  0xffffffff8064cf38 in softclock_call_cc (c=0xfffff80007ee5d40, cc=0xffffffff80fda080, direct=0)
    at /usr/home/markj/src/freebsd/sys/kern/kern_timeout.c:681
#9  0xffffffff8064d2b7 in softclock (arg=)
    at /usr/home/markj/src/freebsd/sys/kern/kern_timeout.c:809
#10 0xffffffff8060a053 in intr_event_execute_handlers (p=, ie=0xfffff80002958d00)
    at /usr/home/markj/src/freebsd/sys/kern/kern_intr.c:1263
#11 0xffffffff8060aa26 in ithread_loop (arg=0xfffff80002999f60)
    at /usr/home/markj/src/freebsd/sys/kern/kern_intr.c:1276
#12 0xffffffff806071a4 in fork_exit (callout=0xffffffff8060a980 , arg=0xfffff80002999f60,
    frame=0xfffffe0113b99ac0)
    at /usr/home/markj/src/freebsd/sys/kern/kern_fork.c:977
#13 0xffffffff808d7fce in fork_trampoline ()
    at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:605
(kgdb) tid 100059
[Switching to thread 67 (Thread 100059)]#0  0xffffffff808e1f08 in cpustop_handler ()
    at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1432
1432            savectx(&stoppcbs[cpu]);
(kgdb) bt
#0  0xffffffff808e1f08 in cpustop_handler ()
    at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1432
#1  0xffffffff808e1ecf in ipi_nmi_handler ()
    at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1417
#2  0xffffffff808f1e02 in trap (frame=0xfffffe0113b68f30)
    at /usr/home/markj/src/freebsd/sys/amd64/amd64/trap.c:208
#3  0xffffffff808d7ed3 in nmi_calltrap ()
    at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:504
#4  0xffffffff808e1b39 in smp_targeted_tlb_shootdown (mask={__bits = {0}}, vector=, pmap=,
    addr1=, addr2=)
    at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1204
#5  0xffffffff808e2f25 in pmap_invalidate_page (pmap=, va=)
    at /usr/home/markj/src/freebsd/sys/amd64/amd64/pmap.c:1375
#6  0xffffffff808ec3d5 in pmap_ts_referenced (m=0xfffff800bcfc78b8)
    at /usr/home/markj/src/freebsd/sys/amd64/amd64/pmap.c:5743
#7  0xffffffff808c8953 in vm_pageout ()
    at /usr/home/markj/src/freebsd/sys/vm/vm_pageout.c:1366
#8  0xffffffff806071a4 in fork_exit (callout=0xffffffff808c7930 , arg=0x0,
    frame=0xfffffe011bfabac0)
    at /usr/home/markj/src/freebsd/sys/kern/kern_fork.c:977
#9  0xffffffff808d7fce in fork_trampoline ()
    at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:605

Indeed, there is a comment above the definition of smp_ipi_mtx in
subr_smp.c to the effect that a deadlock can occur if, say, the target
CPU of smp_targeted_tlb_shootdown() is spinning on smp_ipi_mtx. Is there
any reason that this deadlock doesn't happen more often in practice? Is
it possible to spin on smp_ipi_mtx without disabling interrupts, as that
doesn't seem to be necessary in this case?

Thanks,
-Mark
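
To make the wait-for cycle in the two backtraces concrete, here is a toy
userland model in C. It is only a sketch and not kernel code: the names
shootdown_completes(), struct cpu_model and MAX_SPINS are made up for
illustration, and the bounded loop stands in for the unbounded busy-waits
in the kernel. One CPU (the "initiator", standing in for tid 100059 in
smp_targeted_tlb_shootdown()) already holds the lock and waits for an
acknowledgement; the other (the "waiter", standing in for the thread in
smp_rendezvous_cpus()) spins on the same lock and can only acknowledge
the IPI if it is still taking interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bound standing in for "forever" in this toy model. */
enum { MAX_SPINS = 1000 };

/* Minimal, made-up model of the CPU that is spinning on the lock. */
struct cpu_model {
	bool intr_enabled;	/* can the CPU take an IPI while it spins? */
	bool ipi_pending;	/* shootdown IPI posted but not yet handled */
};

/*
 * Returns true if the initiator sees its IPI acknowledged within
 * MAX_SPINS iterations, false if neither side can make progress.
 * The initiator is assumed to hold the lock for the whole call, so the
 * waiter never acquires it; only the IPI handler can break the cycle.
 */
bool
shootdown_completes(bool waiter_takes_interrupts)
{
	struct cpu_model waiter = {
		.intr_enabled = waiter_takes_interrupts,
		.ipi_pending = false,
	};
	bool acked = false;

	/* Initiator: post the shootdown IPI to the waiter. */
	waiter.ipi_pending = true;

	for (int spin = 0; spin < MAX_SPINS; spin++) {
		/* Waiter: one trip around its spin loop on the lock. */
		if (waiter.ipi_pending && waiter.intr_enabled) {
			/* The IPI handler runs and acknowledges. */
			waiter.ipi_pending = false;
			acked = true;
		}
		/* An acknowledged IPI can no longer be pending. */
		assert(!(acked && waiter.ipi_pending));

		/* Initiator: one trip around its wait-for-ack loop. */
		if (acked)
			return (true);
	}
	return (false);	/* both sides are still spinning: deadlock */
}
```

In this model, shootdown_completes(false) corresponds to the situation in
the panic above (the waiter spins with interrupts disabled and the
initiator never gets its acknowledgement), while shootdown_completes(true)
corresponds to the hypothetical in my second question, where the waiter
keeps interrupts enabled while spinning. It says nothing about whether
that is actually safe for the other users of smp_ipi_mtx.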