Date:      Wed, 7 Nov 2007 12:57:11 +0100
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Peter Schuller <peter.schuller@infidyne.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS: reproducible inability to access a pool (process hangs; other pools fine)
Message-ID:  <20071107115711.GM15618@garage.freebsd.pl>
In-Reply-To: <20071022153521.GB27594@hyperion.scode.org>
References:  <20071022153521.GB27594@hyperion.scode.org>


On Mon, Oct 22, 2007 at 05:35:21PM +0200, Peter Schuller wrote:
> Hello,
> 
> On the same system I recently posted about on -stable, with RELENG_7
> from a few days ago, I am now running a raidz2 on a SiI 3114
> controller, in degraded mode with one disk missing (it is degraded by
> design because I wanted to create a 5 disk array but only had 4).
> 
> For the purpose of discovering any stability issues with the 3114
> controller I did some stress tests that have yet to reveal controller
> problems, but have triggered what appears to be a ZFS problem.
> 
> Test case:
> 
> /promraid       - root of the pool in question
> /promraid/ports - copy of /usr/ports tree from my machine
> /promraid/1     - empty directory
> /promraid/2     - empty directory
> 
> I now run concurrently in two shells:
> 
> while [ 1 ] ; do rsync -a /promraid/ports /promraid/1/pp ; rm -rf /promraid/1/pp ; done
> 
> and:
> 
> while [ 1 ] ; do rsync -a /promraid/ports /promraid/2/pp ; rm -rf /promraid/2/pp ; done
> 
> This runs fine for some hours, but eventually I end up with hung
> rsyncs in the "zfs" state according to top. Attempting to e.g. ls
> /promraid hangs as well. Yet ZFS continues working (another pool is
> entirely fine), and there are no errors in dmesg.
> 
> iostat -x does NOT indicate that it is perpetually waiting on I/O from
> a disk or something like that (0% utilization). The processes are
> unkillable, even by SIGKILL.
> 
> I should have this environment for a few more days, so I can hopefully
> reproduce this again. It has happened at least twice already (the
> first time I was in X and X hung, so I thought I had a panic and re-ran
> the tests on the console; in these two cases I did not get a panic, but
> I am unsure whether the failure mode is different).
> 
> Does anyone have suggestions for how to gather the most useful
> information, given that there are no errors, no panic, etc.?
> 
> One obvious step, I realize, is to ktrace them, if that yields anything
> (I suspect a trace started from the beginning would be prohibitively
> large). I will do that next time.

I've found a deadlock recently. Can you enter DDB, find the spa_zio_intr_X
threads, run 'tr <pid>' on their PIDs and send me the output?
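A minimal sketch for noting those PIDs before breaking into the debugger, assuming the threads show up in userland ps(1) output under a command name containing "spa_zio_intr" (an assumption; DDB's own 'ps' command lists them as well). The function name zio_trace_cmds is hypothetical. It reads "PID COMMAND" lines and prints the 'tr <pid>' commands to type at the db> prompt.

```shell
#!/bin/sh
# Hypothetical helper: turn "PID COMMAND" lines (e.g. from
# "ps -ax -o pid= -o comm=") into DDB "tr <pid>" commands for every
# process whose command name contains "spa_zio_intr".
# Any header line is ignored because its second field does not match.
zio_trace_cmds() {
    awk '$2 ~ /spa_zio_intr/ { print "tr " $1 }'
}

# Usage on the affected machine, before entering the debugger:
#   ps -ax -o pid= -o comm= | zio_trace_cmds
```

On a kernel built with options KDB and DDB, the debugger can then be entered from the console or with 'sysctl debug.kdb.enter=1', and the printed lines typed at the db> prompt.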

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

