Date: Thu, 10 Nov 2016 20:41:55 +0100 From: Henri Hennebert <hlh@restart.be> To: Andriy Gapon <avg@FreeBSD.org>, freebsd-stable@FreeBSD.org Cc: Konstantin Belousov <kib@FreeBSD.org> Subject: Re: Freebsd 11.0 RELEASE - ZFS deadlock Message-ID: <1cb09c54-5f0e-2259-a41a-fefe76b4fe8b@restart.be> In-Reply-To: <67f2e8bd-bff0-f808-7557-7dabe5cad78c@FreeBSD.org> References: <0c223160-b76f-c635-bb15-4a068ba7efe7@restart.be> <aaf2df40-b0df-2141-9ed8-5b947d8d5a33@FreeBSD.org> <43c9d4d4-1995-5626-d70a-f92a5b456629@FreeBSD.org> <a14d508d-351f-71f4-c7cc-ac73dbcde357@restart.be> <9d1f9a76-5a8d-6eca-9a50-907d55099847@FreeBSD.org> <6bc95dce-31e1-3013-bfe3-7c2dd80f9d1e@restart.be> <e4878992-a362-3f12-e743-8efa1347cabf@FreeBSD.org> <23a66749-f138-1f1a-afae-c775f906ff37@restart.be> <8e7547ef-87f7-7fab-6f45-221e8cea1989@FreeBSD.org> <6d991cea-b420-531e-12cc-001e4aeed66b@restart.be> <67f2e8bd-bff0-f808-7557-7dabe5cad78c@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On 11/10/2016 19:40, Andriy Gapon wrote: > On 10/11/2016 19:55, Henri Hennebert wrote: >> >> >> On 11/10/2016 18:33, Andriy Gapon wrote: >>> On 10/11/2016 18:12, Henri Hennebert wrote: >>>> On 11/10/2016 16:54, Andriy Gapon wrote: >>>>> On 10/11/2016 17:20, Henri Hennebert wrote: >>>>>> On 11/10/2016 15:00, Andriy Gapon wrote: >>>>>>> Interesting. I can not spot any suspicious thread that would hold the vnode >>>>>>> lock. Could you please run kgdb (just like that, no arguments), then execute >>>>>>> 'bt' command and then select a frame when _vn_lock is called with 'fr N' >>>>>>> command. Then please 'print *vp' and share the result. >>>>>>> >>>>>> I Think I miss something in your request: >>>>> >>>>> Oh, sorry! The very first step should be 'tid 101112' to switch to the correct >>>>> context. >>>>> >>>> >>>> (kgdb) fr 7 >>>> #7 0xffffffff8063c5b3 in _vn_lock (vp=<value optimized out>, flags=2121728, >>> >>> "value optimized out" - not good >>> >>>> file=<value optimized out>, >>>> line=<value optimized out>) at vnode_if.h:859 >>>> 859 vnode_if.h: No such file or directory. >>>> in vnode_if.h >>>> (kgdb) print *vp >>> >>> I am not sure if this output is valid, because of the message above. >>> Could you please try to navigate to nearby frames and see if vp itself has a >>> valid value there. If you can find such a frame please do *vp there. >>> >> >> Does this seems better? > > Yes! > >> (kgdb) fr 8 >> #8 0xffffffff8062a5f7 in vget (vp=0xfffff80049c2c000, flags=2121728, >> td=0xfffff80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523 >> 2523 if ((error = vn_lock(vp, flags)) != 0) { >> (kgdb) print *vp >> $1 = {v_tag = 0xffffffff813be535 "zfs", v_op = 0xffffffff813d0f70, v_data = >> 0xfffff80049c1f420, v_mount = 0xfffff800093aa660, >> v_nmntvnodes = {tqe_next = 0xfffff80049c2c938, tqe_prev = 0xfffff80049c2bb30}, >> v_un = {vu_mount = 0x0, vu_socket = 0x0, >> vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = >> 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = { >> tqh_first = 0xfffff800bfc8e3f0, tqh_last = 0xfffff800bfc8e410}, v_cache_dd = >> 0x0, v_lock = {lock_object = { >> lo_name = 0xffffffff813be535 "zfs", lo_flags = 117112832, lo_data = 0, >> lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0, >> lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = >> 0xffffffff8099e9e0 "vnode interlock", lo_flags = 16973824, >> lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = >> 0xfffff80049c2c068, v_actfreelist = { >> tqe_next = 0xfffff80049c2c938, tqe_prev = 0xfffff80049ae9bd0}, v_bufobj = >> {bo_lock = {lock_object = { >> lo_name = 0xffffffff8099e9f0 "bufobj interlock", lo_flags = 86179840, >> lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, >> bo_ops = 0xffffffff80c4bf70, bo_object = 0xfffff800b62e9c60, bo_synclist = >> {le_next = 0x0, le_prev = 0x0}, >> bo_private = 0xfffff80049c2c000, __bo_vnode = 0xfffff80049c2c000, bo_clean = >> {bv_hd = {tqh_first = 0x0, >> tqh_last = 0xfffff80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 0}, >> bo_dirty = {bv_hd = {tqh_first = 0x0, >> tqh_last = 0xfffff80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 0}, >> bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, >> v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = >> {tqh_first = 0x0, tqh_last = 0xfffff80049c2c188}, >> rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, >> v_holdcnt = 9, v_usecount = 6, v_iflag = 512, >> v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG} >> (kgdb) > > flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT > lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | LK_SHARED_WAITERS | > LK_SHARE > > So, here's what we have here: this thread tries to get a shared lock on the > vnode, the vnode is already locked in shared mode, but there is an exclusive > waiter (or, perhaps, multiple waiters). So, this thread can not get the lock > because of the exclusive waiter. And I do not see an easy way to identify that > waiter. > > In the procstat output that you provided earlier there was no other thread in > vn_lock. Hmm, I see this: > procstat: sysctl: kern.proc.kstack: 14789: Device busy > procstat: sysctl: kern.proc.kstack: 82034: Device busy > > Could you please check what those two processes are (if they are still running)? > Perhaps try procstat for each of the pids several times. > This 2 processes are the 2 instances of the innd daemon (news server) which seems in accordance with the directory /usr/local/news/bin. [root@avoriaz ~]# procstat 14789 PID PPID PGID SID TSID THR LOGIN WCHAN EMUL COMM 14789 29165 29165 29165 0 1 root zfs FreeBSD ELF64 innd [root@avoriaz ~]# procstat 82034 PID PPID PGID SID TSID THR LOGIN WCHAN EMUL COMM 82034 29165 29165 29165 0 1 root zfs FreeBSD ELF64 innd [root@avoriaz ~]# procstat -f 14789 procstat: kinfo_getfile(): Device busy PID COMM FD T V FLAGS REF OFFSET PRO NAME [root@avoriaz ~]# procstat -f 14789 procstat: kinfo_getfile(): Device busy PID COMM FD T V FLAGS REF OFFSET PRO NAME [root@avoriaz ~]# procstat -f 14789 procstat: kinfo_getfile(): Device busy PID COMM FD T V FLAGS REF OFFSET PRO NAME [root@avoriaz ~]# procstat -f 14789 procstat: kinfo_getfile(): Device busy PID COMM FD T V FLAGS REF OFFSET PRO NAME [root@avoriaz ~]# procstat -kk 14789 PID TID COMM TDNAME KSTACK procstat: sysctl: kern.proc.kstack: 14789: Device busy [root@avoriaz ~]# procstat -kk 14789 PID TID COMM TDNAME KSTACK procstat: sysctl: kern.proc.kstack: 14789: Device busy [root@avoriaz ~]# procstat -kk 14789 PID TID COMM TDNAME KSTACK procstat: sysctl: kern.proc.kstack: 14789: Device busy [root@avoriaz ~]# procstat -kk 14789 PID TID COMM TDNAME KSTACK procstat: sysctl: kern.proc.kstack: 14789: Device busy [root@avoriaz ~]# Same result for 82034. By the way the news server seems OK: [root@avoriaz ~]# ctlinnd mode Server running Allowing remote connections Parameters c 14 i 50 (0) l 5000000 o 1010 t 300 H 2 T 60 X 0 normal specified Not reserved Readers follow enabled Perl filtering enabled [root@avoriaz ~]# I can access the news server with thunderbird. I can restart the news server if you ask Henri
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1cb09c54-5f0e-2259-a41a-fefe76b4fe8b>