Date: Wed, 10 Mar 2010 18:31:43 +0100
From: Pawel Jakub Dawidek
To: Borja Marcos
Cc: freebsd-fs@freebsd.org, FreeBSD Stable
Subject: Re: Many processes stuck in zfs

On Wed, Mar 10, 2010 at 04:12:36PM +0100, Borja Marcos wrote:
> On Mar 10, 2010, at 12:02 PM, Pawel Jakub Dawidek wrote:
>
> > Once the deadlock occurs, enter DDB and send me the output of:
> >
> > 	ps
> > 	show alllocks
> > 	show lockedvnods
> > 	show allchains
> > 	alltrace
>
> (Again, crossposted to -fs, ZFS related)
>
> Previous one was a panic when performing the test with several tar jobs
> running in parallel.
>
> Now this is a capture of the deadlock itself, instead of a panic. (I
> called panic from the debugger to generate a dump)
[...]

Hmm, interesting. Especially those two traces:

Tracing command zfs pid 1820 tid 100105 td 0xffffff0002ca4000
[...]
_cv_wait() at _cv_wait+0x17a
txg_wait_synced() at txg_wait_synced+0x98
zfsvfs_teardown() at zfsvfs_teardown+0x1f6
zfs_suspend_fs() at zfs_suspend_fs+0x2b
zfs_ioc_recv() at zfs_ioc_recv+0x28b
zfsdev_ioctl() at zfsdev_ioctl+0x8d
devfs_ioctl_f() at devfs_ioctl_f+0x76
kern_ioctl() at kern_ioctl+0xc5
ioctl() at ioctl+0xfd
[...]

Tracing command bsdtar pid 1699 tid 100093 td 0xffffff000262dae0
[...]
_sx_slock_hard() at _sx_slock_hard+0x1b7
_sx_slock() at _sx_slock+0xc1
zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0x63
VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0xb5
vgonel() at vgonel+0x119
vnlru_free() at vnlru_free+0x345
getnewvnode() at getnewvnode+0x24f
zfs_znode_cache_constructor() at zfs_znode_cache_constructor+0x43
zfs_znode_alloc() at zfs_znode_alloc+0x38
zfs_mknode() at zfs_mknode+0x259
zfs_freebsd_create() at zfs_freebsd_create+0x661
VOP_CREATE_APV() at VOP_CREATE_APV+0xb3
vn_open_cred() at vn_open_cred+0x473
kern_openat() at kern_openat+0x179
[...]

This should be impossible. If we are that deep in zfsvfs_teardown(), it
means that we hold the z_teardown_lock exclusively. And we do, as the
'show alllocks' output confirms.
But if we are holding this lock exclusively, we shouldn't be that deep
in the create code path, because there we need to hold this lock as a
reader. It isn't visible in the 'show alllocks' output, because this
lock is special (rrwlock.c).

I see three possibilities:

1. We are looking at different file systems here. But where is the
   deadlock coming from then?
2. There is a bug in rrwlock.c. Highly unlikely, I think.
3. My thinking is incorrect somewhere.

Let me do some more thinking and I'll get back to you (possibly with a
patch that will help us find the right possibility).

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
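
The exclusion argument in the message can be sketched as a toy model. This is
only an illustration of the reasoning, not the rrwlock.c implementation: the
class TeardownLock, its methods try_read and try_write, and the names fs_a and
fs_b are all hypothetical, and the model ignores re-entrancy and blocking. It
shows why a reader should not get in while the teardown path holds the lock as
writer, and why possibility 1 (two different file systems, hence two different
locks) would let both paths proceed.

```python
class TeardownLock:
    """Toy reader/writer lock: many readers or one writer, never both.

    Hypothetical model for illustration only; it is not how rrwlock.c works.
    """

    def __init__(self):
        self.readers = 0
        self.writer = False

    def try_read(self):
        # A reader may enter only while no writer holds the lock.
        if self.writer:
            return False
        self.readers += 1
        return True

    def try_write(self):
        # A writer may enter only when the lock is completely free.
        if self.writer or self.readers:
            return False
        self.writer = True
        return True


# Same file system: the teardown path takes the lock as writer first,
# so the create path must not be able to enter as reader.
fs_a = TeardownLock()
teardown_entered = fs_a.try_write()
create_entered = fs_a.try_read()

# Different file system (possibility 1): it has its own lock, so a
# create path there is unaffected by the teardown on fs_a.
fs_b = TeardownLock()
other_create_entered = fs_b.try_read()

print(teardown_entered, create_entered, other_create_entered)
```

Under this model the two traces could only coexist on one file system if the
lock misbehaved (possibility 2) or if the assumed order of events is wrong
(possibility 3).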