Date:      Thu, 10 Nov 2016 20:40:48 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Henri Hennebert <hlh@restart.be>, freebsd-stable@FreeBSD.org
Cc:        Konstantin Belousov <kib@FreeBSD.org>
Subject:   Re: Freebsd 11.0 RELEASE - ZFS deadlock
Message-ID:  <67f2e8bd-bff0-f808-7557-7dabe5cad78c@FreeBSD.org>
In-Reply-To: <6d991cea-b420-531e-12cc-001e4aeed66b@restart.be>
References:  <0c223160-b76f-c635-bb15-4a068ba7efe7@restart.be> <aaf2df40-b0df-2141-9ed8-5b947d8d5a33@FreeBSD.org> <43c9d4d4-1995-5626-d70a-f92a5b456629@FreeBSD.org> <a14d508d-351f-71f4-c7cc-ac73dbcde357@restart.be> <9d1f9a76-5a8d-6eca-9a50-907d55099847@FreeBSD.org> <6bc95dce-31e1-3013-bfe3-7c2dd80f9d1e@restart.be> <e4878992-a362-3f12-e743-8efa1347cabf@FreeBSD.org> <23a66749-f138-1f1a-afae-c775f906ff37@restart.be> <8e7547ef-87f7-7fab-6f45-221e8cea1989@FreeBSD.org> <6d991cea-b420-531e-12cc-001e4aeed66b@restart.be>

On 10/11/2016 19:55, Henri Hennebert wrote:
> 
> 
> On 11/10/2016 18:33, Andriy Gapon wrote:
>> On 10/11/2016 18:12, Henri Hennebert wrote:
>>> On 11/10/2016 16:54, Andriy Gapon wrote:
>>>> On 10/11/2016 17:20, Henri Hennebert wrote:
>>>>> On 11/10/2016 15:00, Andriy Gapon wrote:
>>>>>> Interesting.  I cannot spot any suspicious thread that would hold the
>>>>>> vnode lock.  Could you please run kgdb (just like that, no arguments),
>>>>>> then execute the 'bt' command and select the frame where _vn_lock is
>>>>>> called using the 'fr N' command.  Then please 'print *vp' and share
>>>>>> the result.
>>>>>>
>>>>> I think I am missing something in your request:
>>>>
>>>> Oh, sorry!  The very first step should be 'tid 101112' to switch to the correct
>>>> context.
>>>>
>>>
>>> (kgdb) fr 7
>>> #7  0xffffffff8063c5b3 in _vn_lock (vp=<value optimized out>, flags=2121728,
>>
>> "value optimized out" - not good
>>
>>> file=<value optimized out>,
>>>     line=<value optimized out>) at vnode_if.h:859
>>> 859    vnode_if.h: No such file or directory.
>>>     in vnode_if.h
>>> (kgdb) print *vp
>>
>> I am not sure if this output is valid because of the message above.
>> Could you please navigate to the nearby frames and see whether vp itself
>> has a valid value in one of them?  If you find such a frame, please
>> 'print *vp' there.
>>
> 
> Does this seem better?

Yes!

> (kgdb) fr 8
> #8  0xffffffff8062a5f7 in vget (vp=0xfffff80049c2c000, flags=2121728,
> td=0xfffff80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523
> 2523        if ((error = vn_lock(vp, flags)) != 0) {
> (kgdb) print *vp
> $1 = {v_tag = 0xffffffff813be535 "zfs", v_op = 0xffffffff813d0f70, v_data =
> 0xfffff80049c1f420, v_mount = 0xfffff800093aa660,
>   v_nmntvnodes = {tqe_next = 0xfffff80049c2c938, tqe_prev = 0xfffff80049c2bb30},
> v_un = {vu_mount = 0x0, vu_socket = 0x0,
>     vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev =
> 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {
>     tqh_first = 0xfffff800bfc8e3f0, tqh_last = 0xfffff800bfc8e410}, v_cache_dd =
> 0x0, v_lock = {lock_object = {
>       lo_name = 0xffffffff813be535 "zfs", lo_flags = 117112832, lo_data = 0,
> lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0,
>     lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name =
> 0xffffffff8099e9e0 "vnode interlock", lo_flags = 16973824,
>       lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock =
> 0xfffff80049c2c068, v_actfreelist = {
>     tqe_next = 0xfffff80049c2c938, tqe_prev = 0xfffff80049ae9bd0}, v_bufobj =
> {bo_lock = {lock_object = {
>         lo_name = 0xffffffff8099e9f0 "bufobj interlock", lo_flags = 86179840,
> lo_data = 0, lo_witness = 0x0}, rw_lock = 1},
>     bo_ops = 0xffffffff80c4bf70, bo_object = 0xfffff800b62e9c60, bo_synclist =
> {le_next = 0x0, le_prev = 0x0},
>     bo_private = 0xfffff80049c2c000, __bo_vnode = 0xfffff80049c2c000, bo_clean =
> {bv_hd = {tqh_first = 0x0,
>         tqh_last = 0xfffff80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 0},
> bo_dirty = {bv_hd = {tqh_first = 0x0,
>         tqh_last = 0xfffff80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 0},
> bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072},
>   v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters =
> {tqh_first = 0x0, tqh_last = 0xfffff80049c2c188},
>     rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0,
> v_holdcnt = 9, v_usecount = 6, v_iflag = 512,
>   v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG}
> (kgdb)

flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT
lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | LK_SHARED_WAITERS |
LK_SHARE

So, here's what we have: this thread is trying to take a shared lock on the
vnode; the vnode is already locked in shared mode, but there is an exclusive
waiter (or, perhaps, multiple waiters).  This thread cannot get the lock
because a new shared request must queue behind the exclusive waiter
(otherwise writers could starve).  And I do not see an easy way to identify
that waiter.
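For reference, the decoding above can be reproduced mechanically.  A rough
sh sketch; the mask values are what I recall from FreeBSD 11's
sys/lockmgr.h (in particular, the LK_VNHELD/LK_NODDLKTREAT masks may be
transposed -- verify against your source tree):

```shell
#!/bin/sh
# decode VALUE NAME:MASK...  -- print VALUE in hex followed by the
# names of the flag masks that are set in it.
decode() {
    val=$1; shift
    printf '%#x =' "$val"
    for pair in "$@"; do
        name=${pair%%:*}; mask=${pair##*:}
        [ $((val & mask)) -ne 0 ] && printf ' %s' "$name"
    done
    printf '\n'
}

# vn_lock() request flags (masks assumed from sys/lockmgr.h)
decode 2121728 LK_SHARED:0x200000 LK_VNHELD:0x4000 LK_NODDLKTREAT:0x2000
# lk_lock word of the vnode lock (low bits; 0x10 = one sharer)
decode 23 LK_ONE_SHARER:0x10 LK_EXCLUSIVE_WAITERS:0x4 \
    LK_SHARED_WAITERS:0x2 LK_SHARE:0x1
```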

In the procstat output that you provided earlier there was no other thread in
vn_lock.  Hmm, I see this:
procstat: sysctl: kern.proc.kstack: 14789: Device busy
procstat: sysctl: kern.proc.kstack: 82034: Device busy

Could you please check what those two processes are (if they are still running)?
Perhaps try procstat for each of the pids several times.
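To sample them a few times in one go, something along these lines should do
(a rough sh sketch; 'procstat -kk' prints the per-thread kernel stacks):

```shell
#!/bin/sh
# Sample the kernel stacks of the two pids that procstat could not
# read (pids from the errors quoted above) a few times, to see
# whether they are stuck in the same place.
for pass in 1 2 3; do
    procstat -kk 14789 82034 2>/dev/null \
        || echo "pass $pass: procstat failed or pids are gone"
    sleep 1
done
```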

-- 
Andriy Gapon


