Date:      Fri, 17 Jun 2016 06:44:27 +0200
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        David Adam <zanchey@ucc.gu.uwa.edu.au>
Cc:        freebsd-fs@freebsd.org, avg@freebsd.org
Subject:   Re: Processes wedging on ZFS accesses
Message-ID:  <20160617044427.GA6575@dft-labs.eu>
In-Reply-To: <alpine.DEB.2.11.1606162047020.26473@motsugo.ucc.gu.uwa.edu.au>
References:  <alpine.DEB.2.11.1606162047020.26473@motsugo.ucc.gu.uwa.edu.au>

On Fri, Jun 17, 2016 at 11:53:19AM +0800, David Adam wrote:
> Hi all,
> 
> We're still having trouble with our 10.3-RELEASE-p3 fileserver using a ZFS 
> pool.
> 
> After a certain amount of uptime (usually a week or so), a Samba process 
> will get stuck in D-state:
> 
> max  2075  0.0  0.2 339928 26616  -  D    26May16      0:19.59 
> /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
> 
> Running find(1) over the hierarchy that the smbd process has open will 
> also wedge in a D-state.
> 
> Our backups also seem to get stuck, presumably in the same spot.
> 
> `procstat -k` on the stuck processes (smbd and our stuck python-based 
> backup program) shows:
>   PID    TID COMM             TDNAME           KSTACK
>  2075 100587 smbd             -                mi_switch+0xe1 
> sleepq_wait+0x3a _sx_slock_hard+0x31b namei+0x1c5 vn_open_cred+0x24d 
> zfs_getextattr+0x1f2 VOP_GETEXTATTR_APV+0xa7 extattr_get_vp+0x15d 
> sys_extattr_get_file+0xf4 amd64_syscall+0x40f Xfast_syscall+0xfb
>  2075 100623 smbd             -                mi_switch+0xe1 
> sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0xca0 vop_stdlock+0x3c 
> VOP_LOCK1_APV+0xab _vn_lock+0x43 knlist_remove_kq+0x24 filt_vfsdetach+0x22 
> knote_fdclose+0xef closefp+0x42 amd64_syscall+0x40f Xfast_syscall+0xfb
> 21676 101572 python2.7        -                mi_switch+0xe1 

These two smbd threads have likely deadlocked each other:
- the first one (TID 100587) holds the vnode lock and is trying to take
  the filedesc lock for reading
- the second one (TID 100623) holds the filedesc lock for writing and is
  trying to lock the vnode
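
The lock order inversion described above can be modelled in userspace.
This is only a sketch and not kernel code: plain Python mutexes stand in
for the vnode lock and the filedesc lock (the shared/exclusive
distinction is ignored, since a reader and a writer conflict anyway),
and the names simulate, getextattr and close are made up for the
illustration. A timeout replaces the indefinite sleep so that the
program terminates instead of wedging.

```python
import threading


def simulate(timeout=0.2):
    """Model two threads taking the same two locks in opposite order.

    Returns a dict mapping each (hypothetical) thread name to True if it
    could NOT obtain its second lock, i.e. it would have slept forever.
    """
    vnode_lock = threading.Lock()      # stand-in for the vnode lock
    filedesc_lock = threading.Lock()   # stand-in for the filedesc lock
    both_hold_first = threading.Barrier(2)
    both_tried = threading.Barrier(2)
    blocked = {}

    def worker(name, first, second):
        with first:
            # Make sure each thread already holds its first lock.
            both_hold_first.wait()
            got = second.acquire(timeout=timeout)
            blocked[name] = not got
            # Keep the first lock until the other thread has finished trying.
            both_tried.wait()
            if got:
                second.release()

    threads = [
        # Like the extattr path: vnode first, then filedesc.
        threading.Thread(target=worker,
                         args=("getextattr", vnode_lock, filedesc_lock)),
        # Like the close path: filedesc first, then vnode.
        threading.Thread(target=worker,
                         args=("close", filedesc_lock, vnode_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return blocked
```

In this model both entries come back True: neither thread can make
progress while the other holds its first lock, which matches the two
stacks sleeping in _sx_slock_hard and __lockmgr_args.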

The filedesc lock is going to be split, which will get rid of this
particular instance of the problem.

However, I think the real bug is that zfs_getextattr calls vn_open_cred
with the vnode locked. I don't know whether we can simply unlock it
around the call without revalidating anything afterwards (likely yes).
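
For illustration only, the "unlock around the call, then revalidate"
pattern could look roughly like the following userspace sketch. It is
not the ZFS code and not a proposed patch: the Vnode class, the
generation counter and call_unlocked are all hypothetical, and whether
any revalidation is needed in zfs_getextattr at all is exactly the open
question.

```python
import threading


class Vnode:
    """Hypothetical stand-in for a vnode: a lock plus a change counter."""

    def __init__(self):
        self.lock = threading.Lock()
        self.generation = 0  # assumed to be bumped whenever the vnode changes


def call_unlocked(vnode, blocking_call):
    """Run blocking_call without holding vnode.lock.

    Must be entered with vnode.lock held and returns with it held again.
    Returns (result, still_valid); still_valid is False if the vnode
    changed while it was unlocked, in which case the caller has to
    revalidate or retry.
    """
    seen = vnode.generation
    vnode.lock.release()
    try:
        # The call may take other locks; the vnode lock is not held here.
        result = blocking_call()
    finally:
        vnode.lock.acquire()
    return result, vnode.generation == seen
```

A caller would check the second return value and redo its lookup when
it is False. If nothing that the caller relies on can change while the
vnode is unlocked, the check can be dropped and only the unlock/relock
remains.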

Cc'ing some zfs people for comments.

-- 
Mateusz Guzik <mjguzik gmail.com>


