From owner-freebsd-current@FreeBSD.ORG Thu Aug 30 18:13:48 2007
Date: Thu, 30 Aug 2007 19:13:25 +0100 (BST)
From: "Mark Powell"
To: Mark Powell
Cc: freebsd-current@freebsd.org
Subject: Re: Another ZFS kernel panic on same block on every drive in raidz
In-Reply-To: <20070830183305.X60345@rust.salford.ac.uk>
Message-ID: <20070830190328.B60345@rust.salford.ac.uk>
References: <20070830183305.X60345@rust.salford.ac.uk>

On Thu, 30 Aug 2007, Mark Powell wrote:

> I am being told that a DMA error is occurring on the same block on all 3
> drives at the same time:
>
> Just performing a scrub now to see what happens.

The scrub completed fine. The panic occurs under heavyish use: 3 simultaneous rsyncs from an XP box over Samba.
I just recalled that it panicked earlier, but I was in X and couldn't see the message. Surprisingly, it did log something:

Aug 30 17:27:48 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435298
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435298
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: FAILURE - WRITE_DMA timed out LBA=268435298
Aug 30 17:28:29 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435297
Aug 30 17:28:29 echo kernel: ad16: FAILURE - WRITE_DMA timed out LBA=268435297
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435298
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435298
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: FAILURE - WRITE_DMA timed out LBA=268435298
Aug 30 17:28:29 echo kernel: ad16: FAILURE - WRITE_DMA timed out LBA=268435297
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435425
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435426
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435425
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435425
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435426
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435425
Aug 30 17:28:29 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435425
Aug 30 17:28:29 echo kernel: ad14: FAILURE - WRITE_DMA timed out LBA=268435426

Here the blocks are different, and 4 blocks overall are reported as having problems. In hex they are all of the form FFFFFxx. They are (including the one from the previous report):

268435297 fffff61
268435298 fffff62
268435340 fffff8c
268435425 fffffe1
268435426 fffffe2

Coincidence?

This is on amd64 with all drives connected to the ICH9 ports on a Gigabyte Intel P35-based motherboard. CURRENT is from 25/8/2007.

Cheers.

-- 
Mark Powell - UNIX System Administrator - The University of Salford
Information Services Division, Clifford Whitworth Building,
Salford University, Manchester, M5 4WT, UK.
Tel: +44 161 295 4837  Fax: +44 161 295 5888  www.pgp.com for PGP key
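
For anyone wanting to repeat the decimal-to-hex check on their own logs, here is a minimal Python sketch. The sample lines are copied from the log excerpt in this message; the names `LOG`, `LINE_RE`, `lbas_by_drive`, `describe` and `LBA28_LIMIT` are made up for illustration. The script also prints how far each LBA sits below 2**28 (268435456, i.e. 0x10000000), which is simply the first value that no longer fits in 28 bits. Whether that boundary has anything to do with the timeouts is an open assumption, not a diagnosis.

```python
import re

# Sample lines copied from the log excerpt in this message.
LOG = """\
Aug 30 17:27:48 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435298
Aug 30 17:28:29 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435297
Aug 30 17:28:29 echo kernel: ad16: FAILURE - WRITE_DMA timed out LBA=268435297
Aug 30 17:28:29 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435425
Aug 30 17:28:29 echo kernel: ad14: FAILURE - WRITE_DMA timed out LBA=268435426
"""

# First sector number that does not fit in 28 bits (illustrative constant).
LBA28_LIMIT = 1 << 28  # 268435456 == 0x10000000

# Captures the drive name (e.g. "ad14") and the decimal LBA at the end.
LINE_RE = re.compile(r"kernel: (ad\d+): .*\bLBA=(\d+)")


def lbas_by_drive(log_text):
    """Return {drive: sorted list of distinct LBAs} for lines carrying an LBA."""
    found = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            found.setdefault(match.group(1), set()).add(int(match.group(2)))
    return {drive: sorted(lbas) for drive, lbas in found.items()}


def describe(lba):
    """Return (lowercase hex string, number of sectors below 2**28)."""
    return format(lba, "x"), LBA28_LIMIT - lba


if __name__ == "__main__":
    for drive, lbas in sorted(lbas_by_drive(LOG).items()):
        for lba in lbas:
            hex_str, gap = describe(lba)
            print(f"{drive}  {lba}  {hex_str}  {gap} sectors below 2**28")
```

Feeding it a full /var/log/messages instead of the `LOG` string should work the same way, as long as the lines keep the "kernel: adNN: ... LBA=N" shape seen above.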