Date: Fri, 17 Jun 2016 06:44:27 +0200
From: Mateusz Guzik <mjguzik@gmail.com>
To: David Adam <zanchey@ucc.gu.uwa.edu.au>
Cc: freebsd-fs@freebsd.org, avg@freebsd.org
Subject: Re: Processes wedging on ZFS accesses
Message-ID: <20160617044427.GA6575@dft-labs.eu>
In-Reply-To: <alpine.DEB.2.11.1606162047020.26473@motsugo.ucc.gu.uwa.edu.au>
References: <alpine.DEB.2.11.1606162047020.26473@motsugo.ucc.gu.uwa.edu.au>
On Fri, Jun 17, 2016 at 11:53:19AM +0800, David Adam wrote:
> Hi all,
>
> We're still having trouble with our 10.3-RELEASE-p3 fileserver using a ZFS
> pool.
>
> After a certain amount of uptime (usually a week or so), a Samba process
> will get stuck in D-state:
>
> max 2075 0.0 0.2 339928 26616 - D 26May16 0:19.59
> /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
>
> Running find(1) over the hierarchy that the smbd process has open will
> also wedge in a D-state.
>
> Our backups also seem to get stuck, presumably in the same spot.
>
> `procstat -k` on the stuck processes (smbd and our stuck python-based
> backup program) shows:
> PID TID COMM TDNAME KSTACK
> 2075 100587 smbd - mi_switch+0xe1
> sleepq_wait+0x3a _sx_slock_hard+0x31b namei+0x1c5 vn_open_cred+0x24d
> zfs_getextattr+0x1f2 VOP_GETEXTATTR_APV+0xa7 extattr_get_vp+0x15d
> sys_extattr_get_file+0xf4 amd64_syscall+0x40f Xfast_syscall+0xfb
> 2075 100623 smbd - mi_switch+0xe1
> sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0xca0 vop_stdlock+0x3c
> VOP_LOCK1_APV+0xab _vn_lock+0x43 knlist_remove_kq+0x24 filt_vfsdetach+0x22
> knote_fdclose+0xef closefp+0x42 amd64_syscall+0x40f Xfast_syscall+0xfb
> 21676 101572 python2.7 - mi_switch+0xe1

These two threads have likely deadlocked each other: the first one holds
the vnode lock and is trying to take the filedesc lock for reading
(_sx_slock_hard called from namei), while the second one holds the
filedesc lock for writing and is trying to lock the vnode (_vn_lock
called from knote_fdclose under closefp).

The filedesc lock is going to be split, which will get rid of this
particular instance of the problem.

However, I think the real bug is that zfs_getextattr calls vn_open_cred
with the vnode locked. I don't know whether we can simply unlock it
without revalidating anything (likely yes).

Cc'ing some ZFS people for comments.

-- 
Mateusz Guzik <mjguzik gmail.com>
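A minimal sketch of the deadlock reasoning in this message, written in Python for brevity: build a wait-for graph from the locks each thread holds and the lock it is blocked on, then look for a cycle. The thread IDs come from the procstat -k output; the lock labels "vnode" and "filedesc" and the function find_wait_cycle are hypothetical. Which locks each thread holds is inferred from the stacks, not read from the kernel. The sketch treats every held lock as exclusive, which is enough here because thread 100623 is described as holding the filedesc lock for writing. It is an illustration of the lock cycle, not a kernel debugging tool.

```python
from typing import Dict, List, Optional


def find_wait_cycle(
    holds: Dict[int, List[str]], waits: Dict[int, str]
) -> Optional[List[int]]:
    """Return the thread IDs forming a wait-for cycle, or None if there is none.

    holds: thread ID -> labels of the locks that thread currently holds
    waits: thread ID -> label of the one lock that thread is blocked on
    Simplification: every held lock is treated as exclusive.
    """
    # Map each lock label to the thread that holds it.
    owner = {lock: tid for tid, locks in holds.items() for lock in locks}
    for start in waits:
        path: List[int] = []
        tid: Optional[int] = start
        # Follow "blocked on a lock held by" edges until we leave the graph
        # or revisit a thread already on this path.
        while tid is not None and tid not in path:
            path.append(tid)
            lock = waits.get(tid)
            tid = owner.get(lock) if lock is not None else None
        if tid is not None:
            # tid was seen before: the cycle starts at its first occurrence.
            return path[path.index(tid):]
    return None


# Thread 100587: zfs_getextattr -> vn_open_cred -> namei, vnode assumed locked,
# blocked in _sx_slock_hard on the filedesc lock.
# Thread 100623: closefp -> knote_fdclose with the filedesc lock held for
# writing, blocked in _vn_lock on the vnode.
deadlocked = find_wait_cycle(
    holds={100587: ["vnode"], 100623: ["filedesc"]},
    waits={100587: "filedesc", 100623: "vnode"},
)
print(deadlocked)  # [100587, 100623]

# Proposed fix: zfs_getextattr drops the vnode lock before vn_open_cred.
fixed = find_wait_cycle(
    holds={100587: [], 100623: ["filedesc"]},
    waits={100587: "filedesc", 100623: "vnode"},
)
print(fixed)  # None
```

This only models the lock cycle. It says nothing about whether skipping revalidation after unlocking the vnode is safe, which is the open question raised above.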