From: Pawel Jakub Dawidek
To: Peter Schuller
Cc: freebsd-fs@freebsd.org
Date: Wed, 7 Nov 2007 12:57:11 +0100
Subject: Re: ZFS: reproducable inability to accesss a pool (process hangs; other pools fine)
Message-ID: <20071107115711.GM15618@garage.freebsd.pl>
In-Reply-To: <20071022153521.GB27594@hyperion.scode.org>

On Mon, Oct 22, 2007 at 05:35:21PM +0200, Peter Schuller wrote:
> Hello,
>
> On the same system I recently posted about on -stable, with RELENG_7
> from a few days ago, I am now running a SiL 3114 on a raidz2 in
> degraded mode with one disk missing (it is degraded by design because
> I wanted to create a 5 disk array but only had 4).
>
> For the purpose of discovering any stability issues with the 3114
> controller I did some stress tests that have yet to reveal controller
> problems, but have triggered what appears to be a ZFS problem.
>
> Test case:
>
> /promraid        - root of the pool in question
> /promraid/ports  - copy of /usr/ports tree from my machine
> /promraid/1      - empty directory
> /promraid/2      - empty directory
>
> I now run concurrently in two shells:
>
> while [ 1 ] ; do rsync -a /promraid/ports /promraid/1/pp ; rm -rf /promraid/1/pp ; done
>
> and:
>
> while [ 1 ] ; do rsync -a /promraid/ports /promraid/2/pp ; rm -rf /promraid/2/pp ; done
>
> This runs fine for some hours, but eventually I end up with hung
> rsyncs in "zfs" state according to top. Attempting to e.g. ls /promraid
> hangs as well. Yet ZFS continues working (another pool is entirely
> fine), and there are no errors in dmesg.
>
> iostat -x does NOT indicate that it is perpetually waiting on I/O from
> a disk or something like that (0% utilization). The processes are
> unkillable, even by SIGKILL.
>
> I should have this environment for a few more days, so can hopefully
> reproduce this again. It has happened at least twice already (the
> first time I was in X and X hung; I thought I had a panic so re-ran
> the tests in the console; these two times I didn't get a panic but I
> am unsure whether the failure case is different).
>
> Does anyone have suggestions for what to do to produce the best
> information possible? Given that there are no errors, no panic, etc.
>
> One obvious bit is to ktrace them, I realize, if that gives me anything
> (the size of the trace if I were to trace it from the beginning would,
> I suspect, be prohibitive). Will do that next time.

I've found a deadlock recently. Can you enter DDB, find the
spa_zio_intr_X threads, run 'tr' on their PIDs and send me the output?

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
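
For anyone trying to reproduce this, the quoted test case (two concurrent
rsync/rm loops against the same pool) could be wrapped in one script
roughly like the sketch below. Only the rsync -a and rm -rf steps come
from the report; the function names copy_tree, stress_loop and
run_stress, the optional iteration cap, and the cp -Rp fallback are
assumptions added for illustration.

```shell
#!/bin/sh
# Sketch of the quoted test case. Function names, the iteration cap and
# the cp fallback are illustrative assumptions, not part of the report.

# copy_tree SRC DEST: copy SRC into DEST with rsync -a, as in the report.
# Falls back to cp -Rp when rsync is not installed (assumption).
copy_tree() {
    if command -v rsync >/dev/null 2>&1; then
        rsync -a "$1" "$2"
    else
        mkdir -p "$2" && cp -Rp "$1" "$2"
    fi
}

# stress_loop SRC DEST [COUNT]: copy SRC into DEST, then delete DEST,
# COUNT times. COUNT=0 (the default) loops until interrupted, like the
# original `while [ 1 ]`. Prints the number of completed iterations.
stress_loop() {
    src=$1 dest=$2 count=${3:-0} i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        copy_tree "$src" "$dest" || return 1
        rm -rf "$dest"
        i=$((i + 1))
    done
    echo "$i"
}

# run_stress SRC DIR1 DIR2 [COUNT]: run two loops concurrently, one per
# directory, and wait for both. Returns non-zero if either loop failed.
run_stress() {
    stress_loop "$1" "$2/pp" "$4" >/dev/null &
    pid1=$!
    stress_loop "$1" "$3/pp" "$4" >/dev/null &
    pid2=$!
    rc=0
    wait "$pid1" || rc=1
    wait "$pid2" || rc=1
    return "$rc"
}
```

With the layout from the report, the equivalent of the two shells would be
`run_stress /promraid/ports /promraid/1 /promraid/2 0`. If the hang occurs,
the script is expected to stall in the same way the interactive loops did,
so it is only a convenience for starting the load, not a detector.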