Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 29 Aug 2013 14:01:06 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        freebsd-dtrace@FreeBSD.org, Mark Johnston <markj@FreeBSD.org>
Subject:   sdt deadlock
Message-ID:  <521F29F2.2020208@FreeBSD.org>

next in thread | raw e-mail | index | archive | help

We've just experienced a deadlock between dtrace_lock and sdt_sx.

Mark, could you please take a look at the following debugging data if you have
some time?  Does it look like something that could be fixed by your recent big
change to the related code?


Debug session follows.

(kgdb) tid 101190

(kgdb) bt
#0  sched_switch (td=0xfffffe007a109490, newtd=0xfffffe0018c7c920, flags=<value
optimized out>) at /usr/src/sys/kern/sched_ule.c:1920
#1  0xffffffff80942634 in mi_switch (flags=260, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:485
#2  0xffffffff8097da72 in sleepq_wait (wchan=0xffffffff81e7d300, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:623
#3  0xffffffff80941ad6 in _sx_xlock_hard (sx=0xffffffff81e7d300,
tid=18446741876734203024, opts=<value optimized out>, file=<value optimized
out>, line=<value optimized out>) at /usr/src/sys/kern/kern_sx.c:669
#4  0xffffffff80942295 in _sx_xlock (sx=<value optimized out>, opts=<value
optimized out>, file=<value optimized out>, line=<value optimized out>) at sx.h:161
#5  0xffffffff81c57d1b in dtrace_register (name=0xffffffff80f50e60 "sctp",
pap=0xffffffff81ea73f8, priv=2, cr=0x0, pops=0xffffffff81ea7420, arg=0x0,
idp=0xffffffff812fbde8)
    at
/usr/src/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:7471
#6  0xffffffff81ea7220 in sdt_provider_reg_callback (prov=<value optimized out>,
arg=<value optimized out>) at
/usr/src/sys/modules/dtrace/sdt/../../../cddl/dev/sdt/sdt.c:182
#7  0xffffffff80936ed5 in sdt_provider_listall_locked
(callback_func=0xffffffff81ea71f0 <sdt_provider_reg_callback>, arg=0x0) at
/usr/src/sys/kern/kern_sdt.c:246
#8  0xffffffff80937424 in sdt_register_callbacks
(register_prov=0xffffffff81ea71f0 <sdt_provider_reg_callback>, reg_prov_arg=0x0,
deregister_prov=0xffffffff81ea71e0 <sdt_provider_unreg_callback>,
dereg_prov_arg=0x0,
    register_probe=0xffffffff81ea7100 <sdt_probe_callback>, reg_probe_arg=0x0)
at /usr/src/sys/kern/kern_sdt.c:320
#9  0xffffffff8091aefe in linker_load_module (kldname=<value optimized out>,
modname=0xffffffff81c3d0c4 "sdt", parent=0xfffffe02b6b5e800,
verinfo=0xffffffff81c3d38c, lfpp=0x0) at /usr/src/sys/kern/kern_linker.c:239
#10 0xffffffff8091b44c in linker_load_dependencies (lf=0xfffffe02b6b5e800) at
/usr/src/sys/kern/kern_linker.c:2131
#11 0xffffffff80d42690 in link_elf_load_file (cls=<value optimized out>,
filename=<value optimized out>, result=0xffffff91e59998f8) at
/usr/src/sys/kern/link_elf_obj.c:845
#12 0xffffffff8091adef in linker_load_module (kldname=<value optimized out>,
modname=0xfffffe010cd64c00 "dtraceall", parent=0x0, verinfo=0x0,
lfpp=0xffffff91e5999958) at linker_if.h:161
#13 0xffffffff8091b538 in kern_kldload (td=0xfffffe007a109490, file=<value
optimized out>, fileid=0xffffff91e59999b4) at /usr/src/sys/kern/kern_linker.c:1039
#14 0xffffffff8091b754 in sys_kldload (td=0xfffffe007a109490, uap=<value
optimized out>) at /usr/src/sys/kern/kern_linker.c:1073
#15 0xffffffff80cd523a in amd64_syscall (td=0xfffffe007a109490, traced=0) at
subr_syscall.c:135
#16 0xffffffff80cbf7f7 in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:387
#17 0x000000080087e6fc in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) fr 3
#3  0xffffffff80941ad6 in _sx_xlock_hard (sx=0xffffffff81e7d300,
tid=18446741876734203024, opts=<value optimized out>, file=<value optimized
out>, line=<value optimized out>) at /usr/src/sys/kern/kern_sx.c:669
669     /usr/src/sys/kern/kern_sx.c: No such file or directory.
        in /usr/src/sys/kern/kern_sx.c
(kgdb) p *sx
$1 = {lock_object = {lo_name = 0xffffffff81c6e9dd "dtrace_provider_lock",
lo_flags = 41091072, lo_data = 0, lo_witness = 0x0}, sx_lock = 18446741876595964196}
(kgdb) p/x *sx
$2 = {lock_object = {lo_name = 0xffffffff81c6e9dd, lo_flags = 0x2730000, lo_data
= 0x0, lo_witness = 0x0}, sx_lock = 0xfffffe0071d33924}
(kgdb) p ((struct thread*)0xfffffe0071d33920)->td_tid
$3 = 100433

(kgdb) tid 100433

(kgdb) bt
#0  sched_switch (td=0xfffffe0071d33920, newtd=0xfffffe0018c88920, flags=<value
optimized out>) at /usr/src/sys/kern/sched_ule.c:1920
#1  0xffffffff80942634 in mi_switch (flags=260, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:485
#2  0xffffffff8097da72 in sleepq_wait (wchan=0xffffffff813ac880, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:623
#3  0xffffffff80941ad6 in _sx_xlock_hard (sx=0xffffffff813ac880,
tid=18446741876595964192, opts=<value optimized out>, file=<value optimized
out>, line=<value optimized out>) at /usr/src/sys/kern/kern_sx.c:669
#4  0xffffffff80937793 in sdt_provider_listall (callback_func=0xffffffff81ea7280
<sdt_provider_entry>, arg=0x0) at sx.h:161
#5  0xffffffff81c523da in dtrace_probe_provide (desc=0x0,
prv=0xfffffe011d7bb000) at
/usr/src/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:7984
#6  0xffffffff81c52487 in dtrace_open (dev=0xfffffe02b6b5bc00, oflags=<value
optimized out>, devtype=<value optimized out>, td=<value optimized out>)
    at
/usr/src/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:15418
#7  0xffffffff80820658 in devfs_open (ap=0xffffff91e4ad07d0) at
/usr/src/sys/fs/devfs/devfs_vnops.c:1084
#8  0xffffffff80d6b434 in VOP_OPEN_APV (vop=0xffffffff812be940,
a=0xffffff91e4ad07d0) at vnode_if.c:447
#9  0xffffffff809e826b in vn_open_cred (ndp=0xffffff91e4ad0880,
flagp=0xffffff91e4ad087c, cmode=0, vn_open_flags=<value optimized out>,
cred=0xfffffe0018c6da00, fp=0xfffffe02b6ed36e0) at vnode_if.h:196
#10 0xffffffff809e3faa in kern_openat (td=0xfffffe0071d33920, fd=-100,
path=0x800aa89b2 <Error reading address 0x800aa89b2: Bad address>,
pathseg=UIO_USERSPACE, flags=3, mode=<value optimized out>)
    at /usr/src/sys/kern/vfs_syscalls.c:1132
#11 0xffffffff80cd523a in amd64_syscall (td=0xfffffe0071d33920, traced=0) at
subr_syscall.c:135
#12 0xffffffff80cbf7f7 in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:387
#13 0x0000000801a2f50c in ?? ()
(kgdb) fr 3
#3  0xffffffff80941ad6 in _sx_xlock_hard (sx=0xffffffff813ac880,
tid=18446741876595964192, opts=<value optimized out>, file=<value optimized
out>, line=<value optimized out>) at /usr/src/sys/kern/kern_sx.c:669
669     /usr/src/sys/kern/kern_sx.c: No such file or directory.
        in /usr/src/sys/kern/kern_sx.c
(kgdb) p/x *sx
$4 = {lock_object = {lo_name = 0xffffffff80f340fa, lo_flags = 0x2310000, lo_data
= 0x0, lo_witness = 0x0}, sx_lock = 0xfffffe007a109494}
(kgdb) p ((struct thread*)0xfffffe007a109490)->td_tid
$5 = 101190

The related snippets from the source code:
int
dtrace_register(const char *name, const dtrace_pattr_t *pap, uint32_t priv,
    cred_t *cr, const dtrace_pops_t *pops, void *arg, dtrace_provider_id_t *idp)
{
...
        mutex_enter(&dtrace_provider_lock); <------------
        mutex_enter(&dtrace_lock);
...


int
sdt_provider_listall(sdt_provider_listall_func_t callback_func, void *arg)
{
        int error;

        sx_xlock(&sdt_sx); <-----------
...

So it seems that thread 101190 has sdt_sx and wants to take dtrace_provider_lock
in dtrace_register.  At the same time thread 100433 has dtrace_provider_lock
(taken in dtrace_open right before dtrace_probe_provide) and wants to take
sdt_sx in sdt_provider_listall.

-- 
Andriy Gapon



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?521F29F2.2020208>