From: Alexander Leidinger
To: Pawel Jakub Dawidek
Cc: freebsd-fs@FreeBSD.org, Stable, FreeBSD, Borja Marcos
Date: Thu, 11 Mar 2010 08:45:27 +0100
Subject: Re: Many processes stuck in zfs

Quoting Pawel Jakub Dawidek (from Wed, 10 Mar 2010 18:31:43 +0100):

> On Wed, Mar 10, 2010 at 04:12:36PM +0100, Borja Marcos wrote:
>>
>> On Mar 10, 2010, at 12:02 PM, Pawel Jakub Dawidek wrote:
>>
>> > Once the deadlock occur, enter DDB and send me the output of:
>> >
>> > ps
>> > show alllocks
>> > show lockedvnods
>> > show allchains
>> > alltrace
>>
>> (Again, crossposted to -fs, ZFS related)
>>
>>
>> Previous one was a panic when performing the test with several tar
>> jobs running in parallel.
>>
>> Now this is a capture of the deadlock itself, instead of a panic.
>> (I called panic from the debugger to generate a dump)
> [...]
>
> Hmm, interesting. Especially those two traces:
>
> Tracing command zfs pid 1820 tid 100105 td 0xffffff0002ca4000
> [...]
> _cv_wait() at _cv_wait+0x17a
> txg_wait_synced() at txg_wait_synced+0x98
> zfsvfs_teardown() at zfsvfs_teardown+0x1f6
> zfs_suspend_fs() at zfs_suspend_fs+0x2b
> zfs_ioc_recv() at zfs_ioc_recv+0x28b
> zfsdev_ioctl() at zfsdev_ioctl+0x8d
> devfs_ioctl_f() at devfs_ioctl_f+0x76
> kern_ioctl() at kern_ioctl+0xc5
> ioctl() at ioctl+0xfd
> [...]
>
> Tracing command bsdtar pid 1699 tid 100093 td 0xffffff000262dae0
> [...]
> _sx_slock_hard() at _sx_slock_hard+0x1b7
> _sx_slock() at _sx_slock+0xc1
> zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0x63
> VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0xb5
> vgonel() at vgonel+0x119
> vnlru_free() at vnlru_free+0x345
> getnewvnode() at getnewvnode+0x24f
> zfs_znode_cache_constructor() at zfs_znode_cache_constructor+0x43
> zfs_znode_alloc() at zfs_znode_alloc+0x38
> zfs_mknode() at zfs_mknode+0x259
> zfs_freebsd_create() at zfs_freebsd_create+0x661
> VOP_CREATE_APV() at VOP_CREATE_APV+0xb3
> vn_open_cred() at vn_open_cred+0x473
> kern_openat() at kern_openat+0x179
> [...]
>
> This should be impossible. If we are that deep in zfsvfs_teardown(), it means
> that we hold the z_teardown_lock exclusively. And we do as 'show alllocks'
> output confirms. But if we are holding this lock exclusively we shouldn't be
> that deep in create code path, because we need hold this lock as reader.
> It isn't visible in 'show alllocks' output, because this lock is special
> (rrwlock.c).
>
> I see three possibilities:
> 1. We are looking at different file systems here. But where is deadlock
> coming from then?
> 2. There is a bug in rrwlock.c. Highly unlikely I think.
> 3. My thinking is incorrect somewhere.

There is a 4th possibility, if you can rule out everything else: bugs in
the CPU. I stumbled upon this with ZFS (but UFS was exposing the problem
much faster).
The problem in my case was that the BIOS did not recognize the CPU and
therefore did not load the microcode updates. Borja, can you confirm that
the CPU is correctly announced in FreeBSD? Just look at the output of
"dmesg | grep CPU:"; if it tells you it is an AMD or Intel XXX CPU, it is
correctly detected by the BIOS.

Bye,
Alexander.

-- 
Kissing a fish is like smoking a bicycle.

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137