Date: Tue, 16 Jul 2013 22:32:41 +0300
From: Andriy Gapon <avg@FreeBSD.org>
To: Adrian Chadd <adrian@FreeBSD.org>, freebsd-fs@FreeBSD.org
Cc: freebsd-current <freebsd-current@FreeBSD.org>
Subject: Re: Deadlock in nullfs/zfs somewhere
Message-ID: <51E59FD9.4020103@FreeBSD.org>
In-Reply-To: <CAJ-VmokctCmV4%2By17uvqO9wXEyh0s%2BaXZ9nggvoAgP5%2BZHSgFA@mail.gmail.com>
References: <CAJ-Vmomy3MrkSwJLQUGnDuD3EC3HzrudEghSDMeDwzVdaFNpLg@mail.gmail.com> <51DCFEDA.1090901@FreeBSD.org> <CAJ-VmokctCmV4%2By17uvqO9wXEyh0s%2BaXZ9nggvoAgP5%2BZHSgFA@mail.gmail.com>

on 10/07/2013 19:50 Adrian Chadd said the following:
> On 9 July 2013 23:27, Andriy Gapon <avg@freebsd.org> wrote:
>> on 09/07/2013 16:03 Adrian Chadd said the following:
>>> Does anyone have any ideas as to what's going on?
>>
>> Please provide output of 'thread apply all bt' from kgdb, then perhaps someone
>> might be able to tell.
>
> Done - http://people.freebsd.org/~adrian/ath/20130710-vm0-zfs-hang.txt

vmcore.0 was useless for some reason - an interesting address was not
accessible. vmcore.1 seems to be very similar and is actually useful.

This problem looks like an interesting deadlock involving ZFS and VFS and vnode
shortage.

The most obvious things are that many threads could not allocate a new vnode
and are waiting in getnewvnode_reserve and also many threads are stuck waiting
on vnode locks held by the former threads.

In effect, they all wait for vnlru, which in turn is stuck in
zfs_freebsd_reclaim on z_teardown_lock.
That lock is held by a thread doing a rollback ioctl.
And that thread waits for zfs sync thread to actually perform the rollback.
The sync thread waits on zfs quiesce thread to declare the current transaction
group as quiesced.
The quiesce thread, obviously, waits for all operations running in the current
transaction group to complete.

Some of those operations are e.g. VOP_CREATE -> zfs_create. They already
started a zfs transaction (as a part of the current transaction group) and they
execute zfs_mknode which needs a new vnode. So these threads are waiting for a
new vnode and do not let the current transaction group become quiesced.

GOTO beginning.

Compressing the above description to the extreme, it boils down to: ZFS needs a
new vnode from vnlru and is waiting on it, while vnlru has to wait on ZFS.

-- 
Andriy Gapon
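
The chain of waits described above can be pictured as a small wait-for graph. The following Python sketch is only an illustration of that description, not kernel code: the node names (`vop_create`, `rollback_ioctl`, and so on) are labels chosen here for the threads mentioned in the message, and `find_cycle` is a hypothetical helper that follows "waits for" edges until it revisits a node.

```python
# Illustrative model of the wait-for chain from the message.
# Node names are labels invented for this sketch, not kernel identifiers.
WAITS_FOR = {
    # Threads blocked on vnode locks held by the VOP_CREATE threads.
    "vnode_lock_waiters": "vop_create",
    # VOP_CREATE -> zfs_create -> zfs_mknode needs a new vnode, so it
    # waits (in getnewvnode_reserve) for vnlru to free one.
    "vop_create": "vnlru",
    # vnlru is stuck in zfs_freebsd_reclaim on z_teardown_lock,
    # which is held by the thread doing the rollback ioctl.
    "vnlru": "rollback_ioctl",
    # The rollback ioctl waits for the sync thread to perform the rollback.
    "rollback_ioctl": "sync_thread",
    # The sync thread waits for the quiesce thread.
    "sync_thread": "quiesce_thread",
    # The quiesce thread waits for open transactions (the VOP_CREATE
    # threads) in the current transaction group to complete.
    "quiesce_thread": "vop_create",
}


def find_cycle(graph, start):
    """Follow single 'waits for' edges from start.

    Returns the list of nodes forming a cycle, or None if the chain ends.
    """
    path = []
    position = {}
    node = start
    while node is not None and node not in position:
        position[node] = len(path)
        path.append(node)
        node = graph.get(node)
    if node is None:
        return None
    return path[position[node]:]


if __name__ == "__main__":
    cycle = find_cycle(WAITS_FOR, "vnode_lock_waiters")
    print(" -> ".join(cycle + [cycle[0]]))
```

Starting from any blocked thread, the walk ends up in the same loop, which is the "GOTO beginning" in the description: the threads waiting on vnode locks are not part of the cycle themselves, but they can never proceed because the thread they wait on is inside it.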
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?51E59FD9.4020103>