Date:      Sun, 23 Sep 2001 23:12:54 +0200 (CEST)
From:      Søren Schmidt <sos@freebsd.dk>
To:        Dave Hayes <dave@jetcafe.org>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: Problems with many ATA drives
Message-ID:  <200109232112.f8NLCsE42136@freebsd.dk>
In-Reply-To: <200109231643.JAA09454@hokkshideh.jetcafe.org> "from Dave Hayes at Sep 23, 2001 09:43:25 am"

It seems Dave Hayes wrote:
> 
> ad1: READ command timeout tag=0 serv=0 - resetting
> ata0: resetting devices .. done
> ad1a: hard error reading fsbn 5068879 (ad1 bn 5068879; cn 315 tn 133 sn 25)
> ad1a: hard error reading fsbn 5068879 (ad1 bn 5068879; cn 315 tn 133 sn 25) status=59 error=40
> 
> I notice 3 out of 11 drives produce this error, so far one on each
> controller (ruling out a specific controller issue). I didn't want to
> just assume the failure rate of 80GB IDE drives is that large, so
> I'm asking this list for its opinion:
> 
> a) Is this a bug or consequence of software drivers? (see
> bug kern/17592)
> 
> b) Or is it just that IDE drives are cheap and fail this much?
> 
> Relevant data from dmesg:
> 
> atapci0: <Promise ATA100 controller> port 0xb000-0xb00f,0xb400-0xb403,0xb800-0xb807,0xd000-0xd003,0xd400-0xd407 mem 0xf5800000-0xf5803fff irq 6 at device 10.0 on pci2
> ata2: at 0xd400 on atapci0
> ata3: at 0xb800 on atapci0
> atapci1: <Promise ATA100 controller> port 0x9400-0x940f,0x9800-0x9803,0xa000-0xa007,0xa400-0xa403,0xa800-0xa807 mem 0xf5000000-0xf5003fff irq 9 at device 11.0 on pci2
> ata4: at 0xa800 on atapci1
> ata5: at 0xa000 on atapci1
> ...
> atapci2: <Intel ICH2 ATA100 controller> port 0x8800-0x880f at device 31.1 on pci0
> ata0: at 0x1f0 irq 14 on atapci2
> ata1: at 0x170 irq 15 on atapci2
> ...
> ad0: 78167MB <Maxtor 4W080H6> [158816/16/63] at ata0-master UDMA100
> ad1: 78167MB <Maxtor 4W080H6> [158816/16/63] at ata0-slave UDMA100
> ad2: 78167MB <Maxtor 4W080H6> [158816/16/63] at ata1-master UDMA100
> ad3: 78167MB <Maxtor 4W080H6> [158816/16/63] at ata1-slave UDMA100
> ad4: 78167MB <Maxtor 4W080H6> [158816/16/63] at ata2-master WDMA2
> ad5: 78167MB <Maxtor 4W080H6> [158816/16/63] at ata2-slave WDMA2
> ad6: 78167MB <Maxtor 4W080H6> [158816/16/63] at ata3-master WDMA2
> ad7: 78167MB <Maxtor 4W080H6> [158816/16/63] at ata3-slave WDMA2
> ad8: 78167MB <Maxtor 4W080H6> [158816/16/63] at ata4-master WDMA2
> ad9: 78167MB <Maxtor 4W080H6> [158816/16/63] at ata4-slave WDMA2
> 
> Yes, we know that the "WDMA2" is happening; this state proved to be
> independent of a drive failing. It has to do with 10 drives in a tower
> and cable lengths... =(

Hmm, first off, the error above looks very much like a genuine media
error on the disks. Is the bad spot always the same, or is it random?
Also, do the 3 bad ones produce the error regardless of which
controller they are put on? I assume that it's always the same 3
drives that are failing, right?
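
As a cross-check on the "media error" reading, the status=59 and
error=40 values in the quoted log can be decoded against the standard
ATA status and error register bits. The sketch below is illustrative
only: it assumes the driver prints both values in hexadecimal, and the
names `decode`, `bad_blocks`, and `HARD_ERROR` are made up for this
example. It also tallies the failing block numbers per drive, which is
one way to see whether the bad spot is always the same or random.

```python
import re
from collections import Counter

# ATA status register bits.
STATUS_BITS = {
    0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
    0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR",
}

# ATA error register bits (meaningful when ERR is set in the status register).
ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error on Ultra DMA transfers
    0x40: "UNC",   # uncorrectable data error
    0x20: "MC", 0x10: "IDNF", 0x08: "MCR",
    0x04: "ABRT", 0x02: "NM", 0x01: "AMNF",
}

# Assumed log format, taken from the lines quoted in this thread.
HARD_ERROR = re.compile(r"hard error reading fsbn \d+ \((ad\d+) bn (\d+);")


def decode(value, table):
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


def bad_blocks(lines):
    """Count hard read errors per (drive, absolute block number)."""
    seen = Counter()
    for line in lines:
        match = HARD_ERROR.search(line)
        if match:
            seen[(match.group(1), int(match.group(2)))] += 1
    return seen


if __name__ == "__main__":
    print(decode(0x59, STATUS_BITS))  # ['DRDY', 'DSC', 'DRQ', 'ERR']
    print(decode(0x40, ERROR_BITS))   # ['UNC']
```

Under those assumptions, error=40 decodes to UNC, which is what a bad
sector would report; a cabling problem in a UDMA mode would be expected
to show up as ICRC (0x80) instead. If `bad_blocks` keeps returning the
same few block numbers for a drive, that also points at the media.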

Oh, and you should take cable length seriously. Remember that you only
get ICRC errors (which the ATA driver retries) on UDMA33 and above;
at WDMA2 speed there is *NO* CRC check at all (the HW doesn't
support that), so you won't know when your data has been corrupted :)
So thinking that you solved the problem by going to WDMA2 mode is
extremely dangerous: you are just hiding the problem, as data
corruption will very likely still happen when you use off-spec
cabling.
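
To see which drives are currently running without that CRC protection,
the probe lines from dmesg can be scanned for the negotiated transfer
mode. This is a minimal sketch, assuming the probe line format shown in
the quoted dmesg output (device, size, model, geometry, channel, mode);
`has_crc` and `unprotected_drives` are hypothetical helper names, not
part of any FreeBSD tool.

```python
import re

# Matches probe lines such as:
#   ad4: 78167MB <Maxtor 4W080H6> [158816/16/63] at ata2-master WDMA2
DRIVE_LINE = re.compile(
    r"^(ad\d+): .* at (ata\d+-(?:master|slave)) (\S+)$"
)


def has_crc(mode):
    """Only Ultra DMA modes carry a CRC on the data transfer."""
    return mode.upper().startswith("UDMA")


def unprotected_drives(dmesg_lines):
    """Return (device, channel, mode) for drives in a non-UDMA mode."""
    found = []
    for line in dmesg_lines:
        match = DRIVE_LINE.match(line.strip())
        if match and not has_crc(match.group(3)):
            found.append(match.groups())
    return found


if __name__ == "__main__":
    import sys
    # Usage (assumed): dmesg | python this_script.py
    for device, channel, mode in unprotected_drives(sys.stdin):
        print(f"{device} on {channel} runs in {mode}: no CRC on transfers")
```

Every drive this reports is one where cable-induced corruption would go
unnoticed, so the list is a to-do list for fixing the cabling rather
than a sign that those drives are healthy.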

-Søren
