From: "Steven Hartland"
To: "Mark Powell"
Date: Thu, 30 Aug 2007 23:48:49 +0100
Subject: Re: Another ZFS kernel panic on same block on every drive in raidz
Message-ID: <03b401c7eb57$ee714030$b6db87d4@multiplay.co.uk>
References: <20070830183305.X60345@rust.salford.ac.uk>
List-Id: Discussions about the use of FreeBSD-current

That sounds very much like an overflow error on the controller / drive.
We had a very similar issue with the Highpoint 1820a drivers, which
turned out to be a compatibility issue between the drive firmware and
the controller.

The controller was using standard (28-bit) LBA to access the drive up
until the point where 48-bit LBA was required. This caused problems
with the drives (Seagate, in this case), which would report an error
when accessed this way beyond a specific point.

The fix was for the controller to always use 48-bit addressing for the
drives which supported it.

Hope this helps.

    Regards
    Steve

----- Original Message -----
From: "Mark Powell"
To:
Sent: Thursday, August 30, 2007 6:47 PM
Subject: Another ZFS kernel panic on same block on every drive in raidz

> Hi,
>   I am testing a 3 drive raidz1 array which has been built with 3 new WD
> 500GB SATA drives /dev/ad1[468], bought from 2 different sources.
>   I am being told that a DMA error is occurring on the same block on all 3
> drives at the same time:
>
> Aug 30 18:13:15 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435340
> Aug 30 18:13:15 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435340
> Aug 30 18:13:15 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435340
> Aug 30 18:13:46 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435340
> Aug 30 18:13:46 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435340
> Aug 30 18:13:46 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435340
> Aug 30 18:13:46 echo kernel: ad14: FAILURE - WRITE_DMA timed out LBA=268435340
> Aug 30 18:13:46 echo kernel: ad16: FAILURE - WRITE_DMA timed out LBA=268435340
> Aug 30 18:13:46 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435340
> Aug 30 18:13:46 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435340
> Aug 30 18:13:46 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435340
> Aug 30 18:13:46 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435340
> Aug 30 18:13:46 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435340
> Aug 30 18:13:46 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435340
> Aug 30 18:13:46 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435340
> Aug 30 18:13:46 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435340
> Aug 30 18:13:46 echo kernel: ad14: FAILURE - WRITE_DMA timed out LBA=268435340
> Aug 30 18:13:46 echo kernel: ad16: FAILURE - WRITE_DMA timed out LBA=268435340
> Aug 30 18:13:46 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435340
> Aug 30 18:13:46 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435340
> Aug 30 18:13:46 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435340
> Aug 30 18:13:25 echo root: ZFS: vdev I/O failure, zpool=pool path=/dev/ad14s2 offset=132076011520 size=65536 error=5
> Aug 30 18:13:25 echo root: ZFS: vdev I/O failure, zpool=pool path=/dev/ad16s2 offset=132076011520 size=65536 error=5
> Aug 30 18:13:25 echo root: ZFS: vdev I/O failure, zpool=pool path=/dev/ad18s2 offset=132076011520 size=65536 error=5
> Aug 30 18:13:41 echo root: ZFS: vdev I/O failure, zpool=pool path=/dev/ad18s2 offset=132076011520 size=65536 error=5
> Aug 30 18:13:41 echo root: ZFS: vdev I/O failure, zpool=pool path=/dev/ad14s2 offset=132076011520 size=65536 error=5
> Aug 30 18:13:41 echo root: ZFS: vdev I/O failure, zpool=pool path=/dev/ad16s2 offset=132076011520 size=65536 error=5
> Aug 30 18:13:41 echo root: ZFS: vdev I/O failure, zpool=pool path= offset=396215451648 size=131072 error=5
>
> And then the kernel panics:
>
> panic: ZFS: I/O failure (write on off 0: zio 0xffffff0013b0d000
> [L0 ZFS plain file] 20000L/20000P DVA[0]=<5:5c40480000:30000> fletcher2
> uncompressed LE contiguous birth=20167 fill=1 cksum=cfcfcfcfcfcfce00:cfcfcfcfcfcfce00:8a8a8a8a8a56e700:8a8a8a8a8a56e
> cpuid = 0
>
> I think I saw someone else have a similar problem to this. They were
> told their hardware was probably flaky and to look for errors with geli.
>   Just performing a scrub now to see what happens.
>   Let me know if you need any further info.
>   Cheers.
>
> --
> Mark Powell - UNIX System Administrator - The University of Salford
> Information Services Division, Clifford Whitworth Building,
> Salford University, Manchester, M5 4WT, UK.
> Tel: +44 161 295 4837  Fax: +44 161 295 5888  www.pgp.com for PGP key
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and
the person or entity to whom it is addressed. In the event of
misdirection, the recipient is prohibited from using, copying, printing
or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission
please telephone +44 845 868 1337 or return the E.mail to
postmaster@multiplay.co.uk.
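
For anyone wanting to check the 28-bit/48-bit LBA theory against the
numbers in the quoted log, here is a rough sketch of the arithmetic. It
assumes 512-byte sectors, and it assumes that the 65536-byte ZFS I/O maps
to a single ATA request starting at the logged LBA; neither is confirmed
by the log itself. The function name crosses_lba28_boundary is made up for
this example and is not part of any driver.

```python
SECTOR_SIZE = 512        # assumption: 512-byte logical sectors
LBA28_LIMIT = 1 << 28    # 28-bit LBA can address sectors 0 .. 2**28 - 1


def crosses_lba28_boundary(start_lba, size_bytes, sector_size=SECTOR_SIZE):
    """Return True if a transfer starts below the 28-bit limit
    but its last sector lies at or beyond it."""
    sectors = -(-size_bytes // sector_size)   # round up to whole sectors
    last_lba = start_lba + sectors - 1
    return start_lba < LBA28_LIMIT <= last_lba


# Values taken from the quoted kernel and ZFS messages.
start_lba = 268435340
size_bytes = 65536

print(LBA28_LIMIT - start_lba)                        # 116 sectors left below the limit
print(size_bytes // SECTOR_SIZE)                      # 128 sectors in the transfer
print(crosses_lba28_boundary(start_lba, size_bytes))  # True
```

Under those assumptions the failing write starts 116 sectors short of the
28-bit limit but is 128 sectors long, so it would straddle the point where
48-bit addressing becomes necessary. That is consistent with the overflow
explanation, though it does not prove it.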