From: Andriy Gapon
Date: Tue, 16 Jul 2013 22:32:41 +0300
To: Adrian Chadd, freebsd-fs@FreeBSD.org
Cc: freebsd-current
Subject: Re: Deadlock in nullfs/zfs somewhere

on 10/07/2013 19:50 Adrian Chadd said the following:
> On 9 July 2013 23:27, Andriy Gapon wrote:
>> on 09/07/2013 16:03 Adrian Chadd said the following:
>>> Does anyone have any ideas as to what's going on?
>>
>> Please provide output of 'thread apply all bt' from kgdb, then perhaps someone
>> might be able to tell.
>
> Done - http://people.freebsd.org/~adrian/ath/20130710-vm0-zfs-hang.txt

vmcore.0 was useless for some reason: an interesting address was not
accessible. vmcore.1 seems to be very similar and is actually useful.

This problem looks like an interesting deadlock involving ZFS, VFS and a
vnode shortage.

The most obvious symptoms are that many threads could not allocate a new
vnode and are waiting in getnewvnode_reserve, and that many other threads
are stuck waiting on vnode locks held by the former threads.

In effect, they all wait for vnlru, which in turn is stuck in
zfs_freebsd_reclaim on z_teardown_lock. That lock is held by a thread doing
a rollback ioctl, and that thread waits for the ZFS sync thread to actually
perform the rollback. The sync thread waits for the ZFS quiesce thread to
declare the current transaction group quiesced. The quiesce thread,
obviously, waits for all operations running in the current transaction
group to complete.

Some of those operations are, e.g., VOP_CREATE -> zfs_create. They have
already started a ZFS transaction (as part of the current transaction
group) and are executing zfs_mknode, which needs a new vnode. So these
threads are waiting for a new vnode and do not let the current transaction
group become quiesced.

GOTO beginning.

Compressing the above description to the extreme, it boils down to:
ZFS needs a new vnode from vnlru and is waiting on it, while vnlru has to
wait on ZFS.

-- 
Andriy Gapon
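
P.S. For illustration only, the wait chain described above can be written
down as a small wait-for graph and checked for a cycle. This is a sketch,
not kernel code: the node names ("create", "rollback", etc.) are
hypothetical labels for the groups of threads in the description, and
find_cycle is a made-up helper, not anything from kgdb or the kernel.

```python
# Each key waits for its value. Labels are illustrative, not kernel identifiers.
WAITS_FOR = {
    "create": "vnlru",     # zfs_create -> zfs_mknode, waiting in getnewvnode_reserve
    "vnlru": "rollback",   # stuck in zfs_freebsd_reclaim on z_teardown_lock
    "rollback": "sync",    # rollback ioctl thread waits for the sync thread
    "sync": "quiesce",     # sync thread waits for the txg to be quiesced
    "quiesce": "create",   # quiesce thread waits for open operations to finish
}


def find_cycle(waits_for, start):
    """Follow the wait-for edges from start; return the cycle found, or None."""
    path = []
    position = {}  # node -> index in path
    node = start
    while node is not None and node not in position:
        position[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # chain ended at a thread that waits for nothing
    # Close the loop by repeating the node we came back to.
    return path[position[node]:] + [node]


if __name__ == "__main__":
    cycle = find_cycle(WAITS_FOR, "create")
    print(" -> ".join(cycle) if cycle else "no cycle")
```

Starting from any of the five nodes leads back to the same node, which is
the "GOTO beginning" step in the description.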