From owner-freebsd-dtrace@FreeBSD.ORG Thu Aug 29 11:01:45 2013 Return-Path: Delivered-To: freebsd-dtrace@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 7702CFDA; Thu, 29 Aug 2013 11:01:45 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 9636C2C39; Thu, 29 Aug 2013 11:01:44 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA25424; Thu, 29 Aug 2013 14:01:43 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1VEzyw-0002qn-P9; Thu, 29 Aug 2013 14:01:42 +0300 Message-ID: <521F29F2.2020208@FreeBSD.org> Date: Thu, 29 Aug 2013 14:01:06 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130810 Thunderbird/17.0.8 MIME-Version: 1.0 To: freebsd-dtrace@FreeBSD.org, Mark Johnston Subject: sdt deadlock X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-dtrace@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "A discussion list for developers working on DTrace in FreeBSD." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 11:01:45 -0000 We've just experienced a deadlock between dtrace_lock and sdt_sx. Mark, could you please take a look at the following debugging data if you have some time? Does it look like something that could be fixed by your recent big change to the related code? Debug session follows. (kgdb) tid 101190 (kgdb) bt #0 sched_switch (td=0xfffffe007a109490, newtd=0xfffffe0018c7c920, flags=) at /usr/src/sys/kern/sched_ule.c:1920 #1 0xffffffff80942634 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:485 #2 0xffffffff8097da72 in sleepq_wait (wchan=0xffffffff81e7d300, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:623 #3 0xffffffff80941ad6 in _sx_xlock_hard (sx=0xffffffff81e7d300, tid=18446741876734203024, opts=, file=, line=) at /usr/src/sys/kern/kern_sx.c:669 #4 0xffffffff80942295 in _sx_xlock (sx=, opts=, file=, line=) at sx.h:161 #5 0xffffffff81c57d1b in dtrace_register (name=0xffffffff80f50e60 "sctp", pap=0xffffffff81ea73f8, priv=2, cr=0x0, pops=0xffffffff81ea7420, arg=0x0, idp=0xffffffff812fbde8) at /usr/src/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:7471 #6 0xffffffff81ea7220 in sdt_provider_reg_callback (prov=, arg=) at /usr/src/sys/modules/dtrace/sdt/../../../cddl/dev/sdt/sdt.c:182 #7 0xffffffff80936ed5 in sdt_provider_listall_locked (callback_func=0xffffffff81ea71f0 , arg=0x0) at /usr/src/sys/kern/kern_sdt.c:246 #8 0xffffffff80937424 in sdt_register_callbacks (register_prov=0xffffffff81ea71f0 , reg_prov_arg=0x0, deregister_prov=0xffffffff81ea71e0 , dereg_prov_arg=0x0, register_probe=0xffffffff81ea7100 , reg_probe_arg=0x0) at /usr/src/sys/kern/kern_sdt.c:320 #9 0xffffffff8091aefe in linker_load_module (kldname=, modname=0xffffffff81c3d0c4 "sdt", parent=0xfffffe02b6b5e800, verinfo=0xffffffff81c3d38c, lfpp=0x0) at /usr/src/sys/kern/kern_linker.c:239 #10 0xffffffff8091b44c in linker_load_dependencies (lf=0xfffffe02b6b5e800) at /usr/src/sys/kern/kern_linker.c:2131 #11 0xffffffff80d42690 in link_elf_load_file (cls=, filename=, result=0xffffff91e59998f8) at /usr/src/sys/kern/link_elf_obj.c:845 #12 0xffffffff8091adef in linker_load_module (kldname=, modname=0xfffffe010cd64c00 "dtraceall", parent=0x0, verinfo=0x0, lfpp=0xffffff91e5999958) at linker_if.h:161 #13 0xffffffff8091b538 in kern_kldload (td=0xfffffe007a109490, file=, fileid=0xffffff91e59999b4) at /usr/src/sys/kern/kern_linker.c:1039 #14 0xffffffff8091b754 in sys_kldload (td=0xfffffe007a109490, uap=) at /usr/src/sys/kern/kern_linker.c:1073 #15 0xffffffff80cd523a in amd64_syscall (td=0xfffffe007a109490, traced=0) at subr_syscall.c:135 #16 0xffffffff80cbf7f7 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:387 #17 0x000000080087e6fc in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) fr 3 #3 0xffffffff80941ad6 in _sx_xlock_hard (sx=0xffffffff81e7d300, tid=18446741876734203024, opts=, file=, line=) at /usr/src/sys/kern/kern_sx.c:669 669 /usr/src/sys/kern/kern_sx.c: No such file or directory. in /usr/src/sys/kern/kern_sx.c (kgdb) p *sx $1 = {lock_object = {lo_name = 0xffffffff81c6e9dd "dtrace_provider_lock", lo_flags = 41091072, lo_data = 0, lo_witness = 0x0}, sx_lock = 18446741876595964196} (kgdb) p/x *sx $2 = {lock_object = {lo_name = 0xffffffff81c6e9dd, lo_flags = 0x2730000, lo_data = 0x0, lo_witness = 0x0}, sx_lock = 0xfffffe0071d33924} (kgdb) p ((struct thread*)0xfffffe0071d33920)->td_tid $3 = 100433 (kgdb) tid 100433 (kgdb) bt #0 sched_switch (td=0xfffffe0071d33920, newtd=0xfffffe0018c88920, flags=) at /usr/src/sys/kern/sched_ule.c:1920 #1 0xffffffff80942634 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:485 #2 0xffffffff8097da72 in sleepq_wait (wchan=0xffffffff813ac880, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:623 #3 0xffffffff80941ad6 in _sx_xlock_hard (sx=0xffffffff813ac880, tid=18446741876595964192, opts=, file=, line=) at /usr/src/sys/kern/kern_sx.c:669 #4 0xffffffff80937793 in sdt_provider_listall (callback_func=0xffffffff81ea7280 , arg=0x0) at sx.h:161 #5 0xffffffff81c523da in dtrace_probe_provide (desc=0x0, prv=0xfffffe011d7bb000) at /usr/src/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:7984 #6 0xffffffff81c52487 in dtrace_open (dev=0xfffffe02b6b5bc00, oflags=, devtype=, td=) at /usr/src/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:15418 #7 0xffffffff80820658 in devfs_open (ap=0xffffff91e4ad07d0) at /usr/src/sys/fs/devfs/devfs_vnops.c:1084 #8 0xffffffff80d6b434 in VOP_OPEN_APV (vop=0xffffffff812be940, a=0xffffff91e4ad07d0) at vnode_if.c:447 #9 0xffffffff809e826b in vn_open_cred (ndp=0xffffff91e4ad0880, flagp=0xffffff91e4ad087c, cmode=0, vn_open_flags=, cred=0xfffffe0018c6da00, fp=0xfffffe02b6ed36e0) at vnode_if.h:196 #10 0xffffffff809e3faa in kern_openat (td=0xfffffe0071d33920, fd=-100, path=0x800aa89b2 , pathseg=UIO_USERSPACE, flags=3, mode=) at /usr/src/sys/kern/vfs_syscalls.c:1132 #11 0xffffffff80cd523a in amd64_syscall (td=0xfffffe0071d33920, traced=0) at subr_syscall.c:135 #12 0xffffffff80cbf7f7 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:387 #13 0x0000000801a2f50c in ?? () (kgdb) fr 3 #3 0xffffffff80941ad6 in _sx_xlock_hard (sx=0xffffffff813ac880, tid=18446741876595964192, opts=, file=, line=) at /usr/src/sys/kern/kern_sx.c:669 669 /usr/src/sys/kern/kern_sx.c: No such file or directory. in /usr/src/sys/kern/kern_sx.c (kgdb) p/x *sx $4 = {lock_object = {lo_name = 0xffffffff80f340fa, lo_flags = 0x2310000, lo_data = 0x0, lo_witness = 0x0}, sx_lock = 0xfffffe007a109494} (kgdb) p ((struct thread*)0xfffffe007a109490)->td_tid $5 = 101190 The related snippets from the source code: int dtrace_register(const char *name, const dtrace_pattr_t *pap, uint32_t priv, cred_t *cr, const dtrace_pops_t *pops, void *arg, dtrace_provider_id_t *idp) { ... mutex_enter(&dtrace_provider_lock); <------------ mutex_enter(&dtrace_lock); ... int sdt_provider_listall(sdt_provider_listall_func_t callback_func, void *arg) { int error; sx_xlock(&sdt_sx); <----------- ... So it seems that thread 101190 has sdt_sx and wants to take dtrace_provider_lock in dtrace_register. At the same time thread 100433 has dtrace_provider_lock (taken in dtrace_open right before dtrace_probe_provide) and wants to take sdt_sx in sdt_provider_listall. -- Andriy Gapon