Date: Sun, 23 Sep 2001 09:43:25 -0700
From: Dave Hayes
To: freebsd-hackers@freebsd.org
Subject: Problems with many ATA drives

We've been attempting to set up a vinum RAID box with a bunch of IDE
drives. Each drive carries a single vinum partition, on partition a,
which covers the entire drive. Initial partitioning is done with
/stand/sysinstall so that it "fixes" my geometry; this has always
worked in the past.

I had been getting "funny" stuff from the drives, so I devised the
following simple test:

    # dd if=/dev/rad1a of=/dev/null

This eventually produces:

    ad1: READ command timeout tag=0 serv=0 - resetting
    ata0: resetting devices .. done
    ad1a: hard error reading fsbn 5068879 (ad1 bn 5068879; cn 315 tn 133 sn 25)ad1a: hard error reading fsbn 5068879 (ad1 bn 5068879; cn 315 tn 133 sn 25) status=59 error=40

I notice 3 out of 11 drives produce this error, so far one on each
controller (which rules out a problem with one specific controller).
I didn't want to just assume the failure rate of 80GB IDE drives is
that large, so I'm asking this list for its opinion:

a) Is this a bug or a consequence of the software drivers?
   (see bug kern/17592)
b) Or is it just that IDE drives are cheap and fail this much?

Relevant data from dmesg:

    atapci0: port 0xb000-0xb00f,0xb400-0xb403,0xb800-0xb807,0xd000-0xd003,0xd400-0xd407 mem 0xf5800000-0xf5803fff irq 6 at device 10.0 on pci2
    ata2: at 0xd400 on atapci0
    ata3: at 0xb800 on atapci0
    atapci1: port 0x9400-0x940f,0x9800-0x9803,0xa000-0xa007,0xa400-0xa403,0xa800-0xa807 mem 0xf5000000-0xf5003fff irq 9 at device 11.0 on pci2
    ata4: at 0xa800 on atapci1
    ata5: at 0xa000 on atapci1
    ...
    atapci2: port 0x8800-0x880f at device 31.1 on pci0
    ata0: at 0x1f0 irq 14 on atapci2
    ata1: at 0x170 irq 15 on atapci2
    ...
    ad0: 78167MB [158816/16/63] at ata0-master UDMA100
    ad1: 78167MB [158816/16/63] at ata0-slave UDMA100
    ad2: 78167MB [158816/16/63] at ata1-master UDMA100
    ad3: 78167MB [158816/16/63] at ata1-slave UDMA100
    ad4: 78167MB [158816/16/63] at ata2-master WDMA2
    ad5: 78167MB [158816/16/63] at ata2-slave WDMA2
    ad6: 78167MB [158816/16/63] at ata3-master WDMA2
    ad7: 78167MB [158816/16/63] at ata3-slave WDMA2
    ad8: 78167MB [158816/16/63] at ata4-master WDMA2
    ad9: 78167MB [158816/16/63] at ata4-slave WDMA2

Yes, we know that the "WDMA2" is happening; this state proved to be
independent of a drive failing. It has to do with 10 drives in a tower
and cable lengths... =(
------
Dave Hayes - Consultant - Altadena CA, USA - dave@jetcafe.org
>>> The opinions expressed above are entirely my own <<<

There is no distinctly native American criminal class except Congress.
                                                        -- Mark Twain
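
As a reading aid for the error line in the message, here is a minimal
Python sketch that decodes the "status=59 error=40" pair and re-derives
the cn/tn/sn values from the block number. It rests on assumptions that
the message does not state: that the driver prints both registers in
hexadecimal, that the bit names follow the classic ATA register
definitions, and that cn/tn/sn were computed with a 255-head, 63-sector
label geometry (the probe lines report 16/63 for the drive itself, but
255/63 is the pair that reproduces the numbers in the error). The helper
names decode and block_to_chs are made up for illustration.

```python
# Bit names for the ATA status and error registers (classic definitions;
# assumed here, not taken from the message).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC/BBK", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "TK0NF", 0x01: "AMNF"}


def decode(value, names):
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if value & bit]


def block_to_chs(bn, heads, sectors):
    """Split a linear block number into (cylinder, track, sector).

    The sector is counted from 0, which is what matches the log line.
    """
    cylinder, rest = divmod(bn, heads * sectors)
    track, sector = divmod(rest, sectors)
    return cylinder, track, sector


# Values copied from the error line; hexadecimal is an assumption.
status = int("59", 16)
error = int("40", 16)

print(decode(status, STATUS_BITS))
print(decode(error, ERROR_BITS))

# 255 heads / 63 sectors is an inferred geometry, not one the message gives.
print(block_to_chs(5068879, 255, 63))
```

If those assumptions hold, error=40 is the UNC (uncorrectable data) bit,
which the drive sets when it cannot read a sector. That would lean
toward (b) for this particular block, though it does not by itself rule
out a driver or cabling problem, and it says nothing about the READ
command timeout that precedes it.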