Date: Mon, 23 Jul 2007 21:42:08 -0700
From: Jeremy Chadwick <koitsu@FreeBSD.org>
To: Bill Swingle <unfurl@dub.net>
Cc: freebsd-stable@freebsd.org
Subject: Re: problems with Hitachi 1TB SATA drives
Message-ID: <20070724044208.GA79101@eos.sc1.parodius.com>
In-Reply-To: <46A56695.1000001@dub.net>
References: <46A54B6F.9010100@dub.net> <200707241128.19418.doconnor@gsoft.com.au> <46A56695.1000001@dub.net>

On Mon, Jul 23, 2007 at 07:40:21PM -0700, Bill Swingle wrote:
> Doh, I knew I forgot something in my original email.
> Here's the full dmesg: http://dub.net/rum.dub.net.dmesg

Actually you did include this in your original email. I think Daniel
overlooked it. :-)

After looking at your dmesg and your claim, I got confused because your
initial statement included the use of a 3ware card. A verbose
description of your configuration:

* ad0: 43979MB <IBM DTLA-307045 TX6OA50C> at ata0-master UDMA100
  -- hooked to:
  atapci0: <Intel ICH5 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0
  ata0: <ATA channel 0> on atapci0
  ata1: <ATA channel 1> on atapci0

* ad4: 953869MB <Hitachi HDS721010KLA330 GKAOA70F> at ata2-master SATA150
* ad6: 953869MB <Hitachi HDS721010KLA330 GKAOA70F> at ata3-master SATA150
  -- both hooked to:
  atapci1: <Intel ICH5 SATA150 controller> port 0xec00-0xec07,0xe800-0xe803,0xe400-0xe407,0xe000-0xe003,0xdc00-0xdc0f irq 18 at device 31.2 on pci0
  ata2: <ATA channel 0> on atapci1
  ata3: <ATA channel 1> on atapci1

* twed0: <Unit 0, RAID5, Normal> on twe0
  twed0: 583440MB (1194885120 sectors)
  -- hooked to:
  twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0xb800-0xb80f mem 0xfeaffc00-0xfeaffc0f,0xfe000000-0xfe7fffff irq 17 at device 2.0 on pci3
  twe0: [GIANT-LOCKED]
  twe0: 4 ports, Firmware FE7X 1.05.00.063, BIOS BE7X 1.08.00.048

I have to assume that atapci0 is actually using IRQ 14 even though it's
not shown (weird...).

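If you want to check the IRQ assignments without eyeballing the whole
dmesg, something like the following rough sketch would do it. It only
looks at controller attach lines (the ones containing "at device") and
files any line without an "irq N" field under None, which is how
atapci0 shows up here. The function name and the regexes are mine, not
part of any FreeBSD tool, and the sample text is just the three
controller lines from the dmesg above.

```python
import re
from collections import defaultdict

# The three controller attach lines from the dmesg discussed above.
DMESG = """\
atapci0: <Intel ICH5 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0
atapci1: <Intel ICH5 SATA150 controller> port 0xec00-0xec07,0xe800-0xe803,0xe400-0xe407,0xe000-0xe003,0xdc00-0xdc0f irq 18 at device 31.2 on pci0
twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0xb800-0xb80f mem 0xfeaffc00-0xfeaffc0f,0xfe000000-0xfe7fffff irq 17 at device 2.0 on pci3
"""

# Assumed line shapes: "name0: <description> ... [irq N] at device X.Y on pciZ".
DEVICE_RE = re.compile(r"^([a-z]+\d+): <")
IRQ_RE = re.compile(r"\birq (\d+)\b")


def group_by_irq(dmesg_text):
    """Map each IRQ number (or None if no irq field) to the devices using it."""
    groups = defaultdict(list)
    for line in dmesg_text.splitlines():
        dev = DEVICE_RE.match(line)
        # Skip child lines such as "ata0: <ATA channel 0> on atapci0".
        if dev is None or " at device " not in line:
            continue
        irq = IRQ_RE.search(line)
        key = int(irq.group(1)) if irq else None
        groups[key].append(dev.group(1))
    return dict(groups)


if __name__ == "__main__":
    for irq, devices in group_by_irq(DMESG).items():
        label = "no irq shown" if irq is None else "irq %d" % irq
        print("%s: %s" % (label, ", ".join(devices)))
```

Feed it the full dmesg text instead of the sample and any IRQ with more
than one device listed is a shared one.
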
Additionally your ICH5 SATA controller is sharing an IRQ with a couple
other devices on the PCI bus; this isn't bad, but I'm noting it here in
case this turns out to be some weird interrupt problem:

em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0xac00-0xac1f mem 0xfd9e0000-0xfd9fffff irq 18 at device 1.0 on pci2
uhci2: <Intel 82801EB (ICH5) USB controller USB-C> port 0xd400-0xd41f irq 18 at device 29.2 on pci0

On to this:

> Jul 21 00:21:45 rum kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54194911
> Jul 21 00:22:20 rum kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=107260543
> Jul 21 00:22:57 rum kernel: ad4: FAILURE - device detached
> Jul 21 00:22:57 rum kernel: subdisk4: detached
> Jul 21 00:22:57 rum kernel: ad4: detached
> Jul 21 00:24:19 rum kernel: ad6: FAILURE - device detached
> Jul 21 00:24:19 rum kernel: subdisk6: detached
> Jul 21 00:24:19 rum kernel: ad6: detached
>
> ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=1456106111
> ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=1456106111
> ad4: FAILURE - WRITE_DMA48 timed out LBA=1456106111
> ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54194911
> ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=461407775
> ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=461407775
> ad4: FAILURE - WRITE_DMA48 timed out LBA=461407775

But then:

> When trying to newfs them both eventually failed with DMA READ or
> WRITE timeouts.

Now I'm confused. :-) I only see evidence of a failure on ad4. The ad6
disk disconnecting from the bus could be caused by the controller
getting wedged while waiting for certain transactions sent to ad4
(which are failing). I've seen this scenario happen many times. The
panic you got is probably also induced by the same issue.

Does the WRITE_DMA/DMA48 problem happen for you when newfs'ing a slice
on ad6?

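To make the ad4-versus-ad6 comparison less of a judgment call, you
could tally the ata(4) messages per disk. Below is a minimal sketch
that counts timeouts, I/O failures, and detaches for each adN device.
The regex is an assumption based only on the message shapes quoted in
this thread, the label names are made up for illustration, and the
sample log is an abridged copy of your lines.

```python
import re
from collections import Counter

# Abridged copy of the kernel messages quoted earlier in this thread.
LOG = """\
Jul 21 00:21:45 rum kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54194911
Jul 21 00:22:20 rum kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=107260543
Jul 21 00:22:57 rum kernel: ad4: FAILURE - device detached
Jul 21 00:24:19 rum kernel: ad6: FAILURE - device detached
ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=1456106111
ad4: FAILURE - WRITE_DMA48 timed out LBA=1456106111
"""

# Matches "adN: TIMEOUT - ..." and "adN: FAILURE - ...", with or without
# a syslog prefix in front and an LBA at the end.
EVENT_RE = re.compile(r"\b(ad\d+): (TIMEOUT|FAILURE) - (.+?)(?: LBA=\d+)?$")


def tally(log_text):
    """Count (device, label) pairs; labels are timeout, io_failure, detached."""
    counts = Counter()
    for line in log_text.splitlines():
        m = EVENT_RE.search(line)
        if m is None:
            continue
        dev, kind, detail = m.groups()
        if kind == "TIMEOUT":
            label = "timeout"
        elif detail.startswith("device detached"):
            label = "detached"
        else:
            label = "io_failure"
        counts[(dev, label)] += 1
    return counts


if __name__ == "__main__":
    for (dev, label), n in sorted(tally(LOG).items()):
        print("%s %-10s %d" % (dev, label, n))
```

If ad6 only ever shows up under "detached" and never under "timeout" or
"io_failure", that fits the theory that it is being dropped as a side
effect of ad4's failures rather than failing on its own.
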
> I've read that bad SATA cables could cause this, the cables I'm using
> are brand new but are probably pretty cheap.

For testing purposes, swap them out with some other cables. It may not
be the cables at all, so keep the originals around. You might also try
using canned air to blow out any dust around the SATA connector ends on
the cables, drives, and motherboard.

Remaining questions I have:

Q: Is your ICH5 controller actually ICH5R, and have you turned on some
   Intel RAID option in the BIOS? Maybe turned it on but left the disks
   in a JBOD fashion (not defining an array)? The reason I ask is that
   you said you're going to use the Hitachi drives as "a pair of 1TB
   synchronised drives", which implies RAID-1, yet I don't see use of
   gmirror or ccd or anything else. :-)

Q: What motherboard and model is this? Looks like an Intel.

Q: If an Intel, have you gone looking at Intel's site for BIOS updates
   for that board? Intel is the one company who is thorough about
   documenting BIOS changes in their Release Notes. It would not
   surprise me if this turned out to be some kind of weird BIOS bug.

Q: Some motherboards let you toggle certain "compatibility" mode stuff
   for the SATA controller in the BIOS. You might want to flip that to
   see what happens (if it's set to compatibility, try the opposite,
   and vice versa).

Q: Have you searched Google for issues others have reported (such as in
   Linux) with the HDS721010KLA330 or similar (differently-sized)
   models?

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |