From owner-freebsd-stable@freebsd.org Sun Nov 13 13:28:34 2016 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2B50AC3F19A for ; Sun, 13 Nov 2016 13:28:34 +0000 (UTC) (envelope-from hlh@restart.be) Received: from tignes.restart.be (tignes.restart.be [5.135.182.190]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "tignes.restart.be", Issuer "CA master" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 8ED9A1908; Sun, 13 Nov 2016 13:28:32 +0000 (UTC) (envelope-from hlh@restart.be) X-Comment: SPF check N/A for local connections - client-ip=2001:41d0:8:bdbe:1:1::; helo=restart.be; envelope-from=hlh@restart.be; receiver=avg@freebsd.org DKIM-Filter: OpenDKIM Filter v2.10.3 tignes.restart.be 3tGvc74cBhzsKf DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=restart.be; s=tignes; t=1479043703; bh=BqBt0XHNDre+RDvQQYIrpVYWSVePprVwWZC394LdmEQ=; h=Subject:To:References:Cc:From:Date:In-Reply-To; z=Subject:=20Re:=20Freebsd=2011.0=20RELEASE=20-=20ZFS=20deadlock|To :=20Andriy=20Gapon=20,=20freebsd-stable@FreeBSD.o rg|References:=20<0c223160-b76f-c635-bb15-4a068ba7efe7@restart.be> =0D=0A=20=0D=0A= 20<43c9d4d4-1995-5626-d70a-f92a5b456629@FreeBSD.org>=0D=0A=20=0D=0A=20<9d1f9a76-5a8 d-6eca-9a50-907d55099847@FreeBSD.org>=0D=0A=20<6bc95dce-31e1-3013- bfe3-7c2dd80f9d1e@restart.be>=0D=0A=20=0D=0A=20<23a66749-f138-1f1a-afae-c775f906ff 37@restart.be>=0D=0A=20<8e7547ef-87f7-7fab-6f45-221e8cea1989@FreeB SD.org>=0D=0A=20<6d991cea-b420-531e-12cc-001e4aeed66b@restart.be>= 0D=0A=20<67f2e8bd-bff0-f808-7557-7dabe5cad78c@FreeBSD.org>=0D=0A=2 0<1cb09c54-5f0e-2259-a41a-fefe76b4fe8b@restart.be>=0D=0A=20=0D=0A=20<9f20020b-e2f1 -862b-c3fc-dc6ff94e301e@restart.be>=0D=0A=20=0D=0A=20<599c5a5b-aa08-2030-34f3-23ff 19d09a9b@restart.be>=0D=0A=20<32686283-948a-6faf-7ded-ed8fcd23affb @FreeBSD.org>|Cc:=20Konstantin=20Belousov=20|From :=20Henri=20Hennebert=20|Date:=20Sun,=2013=20Nov=2 02016=2014:28:20=20+0100|In-Reply-To:=20<32686283-948a-6faf-7ded-e d8fcd23affb@FreeBSD.org>; b=QaHDNfZ3qtgreXTBY7s/LqCZiKkJuLnqlIeElRS2VHQfxhnH4vuLb5cOHUnyyq7El /Dqvss1OfzCiu6ono5s+u8HpIUBcToA1w2FeuA0/ElBtCk8tRN9O1qGHPc9GjN2uYC Vw4oNyUBV0epkwU86xddCl7r5UpSf+levCv0GyUzM1/Wcm5cButd3zZ/QRRzhSPB81 N8bQOdZRXvoPkOnvE1T9Ld0wEJ73Ns+N3Otd0tmEW0zr7eMd5++Qh13W6Jzq6Uoxfz dEaTfx5QBbhBNZF6nCsFb2NbKLvM5TMsXRASJbLYBTWuCTMbgNel3QBRa1tESnvKsA sMm3SadtjXWKw== Received: from restart.be (avoriaz.restart.be [IPv6:2001:41d0:8:bdbe:1:1::]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "smtp.restart.be", Issuer "CA master" (verified OK)) by tignes.restart.be (Postfix) with ESMTPS id 3tGvc74cBhzsKf; Sun, 13 Nov 2016 14:28:23 +0100 (CET) Received: from chamonix.restart.bel (chamonix.restart.bel [IPv6:2001:41d0:8:bdbe:1:9:0:0]) (authenticated bits=0) by restart.be (8.15.2/8.15.2) with ESMTPSA id uADDSKNg071397 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Sun, 13 Nov 2016 14:28:22 +0100 (CET) (envelope-from hlh@restart.be) Subject: Re: Freebsd 11.0 RELEASE - ZFS deadlock To: Andriy Gapon , freebsd-stable@FreeBSD.org References: <0c223160-b76f-c635-bb15-4a068ba7efe7@restart.be> <43c9d4d4-1995-5626-d70a-f92a5b456629@FreeBSD.org> <9d1f9a76-5a8d-6eca-9a50-907d55099847@FreeBSD.org> <6bc95dce-31e1-3013-bfe3-7c2dd80f9d1e@restart.be> <23a66749-f138-1f1a-afae-c775f906ff37@restart.be> <8e7547ef-87f7-7fab-6f45-221e8cea1989@FreeBSD.org> <6d991cea-b420-531e-12cc-001e4aeed66b@restart.be> <67f2e8bd-bff0-f808-7557-7dabe5cad78c@FreeBSD.org> <1cb09c54-5f0e-2259-a41a-fefe76b4fe8b@restart.be> <9f20020b-e2f1-862b-c3fc-dc6ff94e301e@restart.be> <599c5a5b-aa08-2030-34f3-23ff19d09a9b@restart.be> <32686283-948a-6faf-7ded-ed8fcd23affb@FreeBSD.org> Cc: Konstantin Belousov From: Henri Hennebert Message-ID: Date: Sun, 13 Nov 2016 14:28:20 +0100 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <32686283-948a-6faf-7ded-ed8fcd23affb@FreeBSD.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Nov 2016 13:28:34 -0000 On 11/13/2016 11:06, Andriy Gapon wrote: > On 12/11/2016 14:40, Henri Hennebert wrote: >> I attatch it > > Thank you! > So, these two threads are trying to get the lock in the exclusive mode: > Thread 687 (Thread 101243): > #0 sched_switch (td=0xfffff800b642b500, newtd=0xfffff8000285ea00, flags= optimized out>) at /usr/src/sys/kern/sched_ule.c:1973 > #1 0xffffffff80561ae2 in mi_switch (flags=, newtd=0x0) at > /usr/src/sys/kern/kern_synch.c:455 > #2 0xffffffff805ae8da in sleepq_wait (wchan=0x0, pri=0) at > /usr/src/sys/kern/subr_sleepqueue.c:646 > #3 0xffffffff8052f854 in sleeplk (lk=, flags= optimized out>, ilk=, wmesg=0xffffffff813be535 "zfs", > pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222 > #4 0xffffffff8052f39d in __lockmgr_args (lk=, flags= optimized out>, ilk=, wmesg=, > pri=, timo=, file= out>, line=) at /usr/src/sys/kern/kern_lock.c:958 > #5 0xffffffff80616a8c in vop_stdlock (ap=) at lockmgr.h:98 > #6 0xffffffff8093784d in VOP_LOCK1_APV (vop=, a= optimized out>) at vnode_if.c:2087 > #7 0xffffffff8063c5b3 in _vn_lock (vp=, flags=548864, > file=, line=) at vnode_if.h:859 > #8 0xffffffff8062a5f7 in vget (vp=0xfffff80049c2c000, flags=548864, > td=0xfffff800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523 > #9 0xffffffff806118b9 in cache_lookup (dvp=, vpp= optimized out>, cnp=, tsp=, > ticksp=) at /usr/src/sys/kern/vfs_cache.c:686 > #10 0xffffffff806133dc in vfs_cache_lookup (ap=) at > /usr/src/sys/kern/vfs_cache.c:1081 > #11 0xffffffff80935777 in VOP_LOOKUP_APV (vop=, a= optimized out>) at vnode_if.c:127 > #12 0xffffffff8061cdf1 in lookup (ndp=) at vnode_if.h:54 > #13 0xffffffff8061c492 in namei (ndp=) at > /usr/src/sys/kern/vfs_lookup.c:306 > #14 0xffffffff80509395 in kern_execve (td=, args= optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443 > #15 0xffffffff80508ccc in sys_execve (td=0xfffff800b642b500, > uap=0xfffffe010182cb80) at /usr/src/sys/kern/kern_exec.c:218 > #16 0xffffffff808d449e in amd64_syscall (td=, traced=0) at > subr_syscall.c:135 > #17 0xffffffff808b7ddb in Xfast_syscall () at > /usr/src/sys/amd64/amd64/exception.S:396 > > Thread 681 (Thread 101147): > #0 sched_switch (td=0xfffff80065f4e500, newtd=0xfffff8000285f000, flags= optimized out>) at /usr/src/sys/kern/sched_ule.c:1973 > #1 0xffffffff80561ae2 in mi_switch (flags=, newtd=0x0) at > /usr/src/sys/kern/kern_synch.c:455 > #2 0xffffffff805ae8da in sleepq_wait (wchan=0x0, pri=0) at > /usr/src/sys/kern/subr_sleepqueue.c:646 > #3 0xffffffff8052f854 in sleeplk (lk=, flags= optimized out>, ilk=, wmesg=0xffffffff813be535 "zfs", > pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222 > #4 0xffffffff8052f39d in __lockmgr_args (lk=, flags= optimized out>, ilk=, wmesg=, > pri=, timo=, file= out>, line=) at /usr/src/sys/kern/kern_lock.c:958 > #5 0xffffffff80616a8c in vop_stdlock (ap=) at lockmgr.h:98 > #6 0xffffffff8093784d in VOP_LOCK1_APV (vop=, a= optimized out>) at vnode_if.c:2087 > #7 0xffffffff8063c5b3 in _vn_lock (vp=, flags=548864, > file=, line=) at vnode_if.h:859 > #8 0xffffffff8062a5f7 in vget (vp=0xfffff80049c2c000, flags=548864, > td=0xfffff80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523 > #9 0xffffffff806118b9 in cache_lookup (dvp=, vpp= optimized out>, cnp=, tsp=, > ticksp=) at /usr/src/sys/kern/vfs_cache.c:686 > #10 0xffffffff806133dc in vfs_cache_lookup (ap=) at > /usr/src/sys/kern/vfs_cache.c:1081 > #11 0xffffffff80935777 in VOP_LOOKUP_APV (vop=, a= optimized out>) at vnode_if.c:127 > #12 0xffffffff8061cdf1 in lookup (ndp=) at vnode_if.h:54 > #13 0xffffffff8061c492 in namei (ndp=) at > /usr/src/sys/kern/vfs_lookup.c:306 > #14 0xffffffff80509395 in kern_execve (td=, args= optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443 > #15 0xffffffff80508ccc in sys_execve (td=0xfffff80065f4e500, > uap=0xfffffe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218 > #16 0xffffffff808d449e in amd64_syscall (td=, traced=0) at > subr_syscall.c:135 > #17 0xffffffff808b7ddb in Xfast_syscall () at > /usr/src/sys/amd64/amd64/exception.S:396 This 2 threads are innd processes. In core.txt.4: 8 14789 29165 0 24 4 40040 6612 zfs DN - 0:00.00 [innd] 8 29165 1 0 20 0 42496 6888 select Ds - 0:01.33 [innd] 8 49778 29165 0 24 4 40040 6900 zfs DN - 0:00.00 [innd] 8 82034 29165 0 24 4 132 0 zfs DN - 0:00.00 [innd] the corresponding info treads are: 687 Thread 101243 (PID=49778: innd) sched_switch (td=0xfffff800b642b500, newtd=0xfffff8000285ea00, flags=) at /usr/src/sys/kern/sched_ule.c:1973 681 Thread 101147 (PID=14789: innd) sched_switch (td=0xfffff80065f4e500, newtd=0xfffff8000285f000, flags=) at /usr/src/sys/kern/sched_ule.c:1973 669 Thread 101250 (PID=82034: innd) sched_switch (td=0xfffff800b6429000, newtd=0xfffff8000285ea00, flags=) at /usr/src/sys/kern/sched_ule.c:1973 665 Thread 101262 (PID=29165: innd) sched_switch (td=0xfffff800b6b54a00, newtd=0xfffff8000285ea00, flags=) at /usr/src/sys/kern/sched_ule.c:1973 So your missing tread must be 101250: (kgdb) tid 101250 [Switching to thread 669 (Thread 101250)]#0 sched_switch (td=0xfffff800b6429000, newtd=0xfffff8000285ea00, flags=) at /usr/src/sys/kern/sched_ule.c:1973 1973 cpuid = PCPU_GET(cpuid); Current language: auto; currently minimal (kgdb) bt #0 sched_switch (td=0xfffff800b6429000, newtd=0xfffff8000285ea00, flags=) at /usr/src/sys/kern/sched_ule.c:1973 #1 0xffffffff80561ae2 in mi_switch (flags=, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455 #2 0xffffffff805ae8da in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646 #3 0xffffffff8052f854 in sleeplk (lk=, flags=, ilk=, wmesg=0xffffffff813be535 "zfs", pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222 #4 0xffffffff8052f39d in __lockmgr_args (lk=, flags=, ilk=, wmesg=, pri=, timo=, file=, line=) at /usr/src/sys/kern/kern_lock.c:958 #5 0xffffffff80616a8c in vop_stdlock (ap=) at lockmgr.h:98 #6 0xffffffff8093784d in VOP_LOCK1_APV (vop=, a=) at vnode_if.c:2087 #7 0xffffffff8063c5b3 in _vn_lock (vp=, flags=525312, file=, line=) at vnode_if.h:859 #8 0xffffffff804e16cf in exec_elf64_imgact (imgp=) at /usr/src/sys/kern/imgact_elf.c:899 #9 0xffffffff8050983d in kern_execve (td=, args=, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:602 #10 0xffffffff80508ccc in sys_execve (td=0xfffff800b6429000, uap=0xfffffe010184fb80) at /usr/src/sys/kern/kern_exec.c:218 #11 0xffffffff808d449e in amd64_syscall (td=, traced=0) at subr_syscall.c:135 #12 0xffffffff808b7ddb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396 #13 0x000000080217edaa in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) > > And the original stuck thread wants to get the lock in the shared mode. > And there should be another thread that already holds the lock in the shared > mode. But I am not able to identify it. I wonder if the original thread could > be trying to get the lock recursively... > > It would be interesting to get more details from thread 101112. > You can switch to it using tid command, you can use 'fr' to select frames, 'info > local' and 'info args' to see what variables are available (not optimized out) > and the you can print any that look interesting. It would be nice to get a file > path and a directory vnode where the lookup is called. > > Thank you. > I find nothing interresting: (kgdb) fr 15 #15 0xffffffff806369cc in sys_fstatat (td=0x0, uap=0xfffffe010161db80) at /usr/src/sys/kern/vfs_syscalls.c:2136 2136 error = kern_statat(td, uap->flag, uap->fd, uap->path, (kgdb) print *uap $1 = {fd_l_ = 0xfffffe010161db80 "\234ÿÿÿ", fd = -100, fd_r_ = 0xfffffe010161db84 "", path_l_ = 0xfffffe010161db88 "\210£â", path = 0x800e2a388
, path_r_ = 0xfffffe010161db90 "\020£â", buf_l_ = 0xfffffe010161db90 "\020£â", buf = 0x800e2a310, buf_r_ = 0xfffffe010161db98 "", flag_l_ = 0xfffffe010161db98 "", flag = 512, flag_r_ = 0xfffffe010161db9c ""} (kgdb) print $1->path $2 = 0x800e2a388
(kgdb) fr 8 #8 0xffffffff8062a5f7 in vget (vp=0xfffff80049c2c000, flags=2121728, td=0xfffff80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523 2523 if ((error = vn_lock(vp, flags)) != 0) { (kgdb) print *vp $3 = {v_tag = 0xffffffff813be535 "zfs", v_op = 0xffffffff813d0f70, v_data = 0xfffff80049c1f420, v_mount = 0xfffff800093aa660, v_nmntvnodes = {tqe_next = 0xfffff80049c2c938, tqe_prev = 0xfffff80049c2bb30}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = { tqh_first = 0xfffff800bfc8e3f0, tqh_last = 0xfffff800bfc8e410}, v_cache_dd = 0x0, v_lock = {lock_object = { lo_name = 0xffffffff813be535 "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = 0xffffffff8099e9e0 "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = 0xfffff80049c2c068, v_actfreelist = { tqe_next = 0xfffff80049c2c938, tqe_prev = 0xfffff80049ae9bd0}, v_bufobj = {bo_lock = {lock_object = { lo_name = 0xffffffff8099e9f0 "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, bo_ops = 0xffffffff80c4bf70, bo_object = 0xfffff800b62e9c60, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private = 0xfffff80049c2c000, __bo_vnode = 0xfffff80049c2c000, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfffff80049c2c188}, rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 10, v_usecount = 6, v_iflag = 512, v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG} (kgdb) I also try to get information from the execve of the other treads: for tid 101250: (kgdb) fr 10 #10 0xffffffff80508ccc in sys_execve (td=0xfffff800b6429000, uap=0xfffffe010184fb80) at /usr/src/sys/kern/kern_exec.c:218 218 error = kern_execve(td, &args, NULL); (kgdb) print *uap $4 = {fname_l_ = 0xfffffe010184fb80 "`\220\217\002\b", fname = 0x8028f9060
, fname_r_ = 0xfffffe010184fb88 "`¶ÿÿÿ\177", argv_l_ = 0xfffffe010184fb88 "`¶ÿÿÿ\177", argv = 0x7fffffffb660, argv_r_ = 0xfffffe010184fb90 "\bÜÿÿÿ\177", envv_l_ = 0xfffffe010184fb90 "\bÜÿÿÿ\177", envv = 0x7fffffffdc08, envv_r_ = 0xfffffe010184fb98 ""} (kgdb) for tid 101243: (kgdb) f 15 #15 0xffffffff80508ccc in sys_execve (td=0xfffff800b642b500, uap=0xfffffe010182cb80) at /usr/src/sys/kern/kern_exec.c:218 218 error = kern_execve(td, &args, NULL); (kgdb) print *uap $5 = {fname_l_ = 0xfffffe010182cb80 "ÀÏ\205\002\b", fname = 0x80285cfc0
, fname_r_ = 0xfffffe010182cb88 "`¶ÿÿÿ\177", argv_l_ = 0xfffffe010182cb88 "`¶ÿÿÿ\177", argv = 0x7fffffffb660, argv_r_ = 0xfffffe010182cb90 "\bÜÿÿÿ\177", envv_l_ = 0xfffffe010182cb90 "\bÜÿÿÿ\177", envv = 0x7fffffffdc08, envv_r_ = 0xfffffe010182cb98 ""} (kgdb) Henri