Date: Tue, 16 Sep 2008 13:15:30 -0700
From: Jeremy Chadwick <koitsu@FreeBSD.org>
To: Mike Tancsa <mike@sentex.net>
Cc: stable@freebsd.org, Clint Olsen <clint.olsen@gmail.com>
Subject: Re: Help debugging DMA_READ errors
Message-ID: <20080916201530.GA72912@icarus.home.lan>
In-Reply-To: <200809161934.m8GJY9oe039218@lava.sentex.ca>
References: <20080916170452.GB4861@0lsen.net>
 <20080916175858.GA70396@icarus.home.lan>
 <20080916181903.GC7540@0lsen.net>
 <20080916185401.GA71275@icarus.home.lan>
 <200809161934.m8GJY9oe039218@lava.sentex.ca>

On Tue, Sep 16, 2008 at 03:34:07PM -0400, Mike Tancsa wrote:
> At 02:54 PM 9/16/2008, Jeremy Chadwick wrote:
>
>> However, there's no sign of DMA errors in the SMART log.  I'm not sure
>> what to make of that; I really would expect there to be some.
>
> Would not bad cables (or trays) be consistent with symptoms like that ?
> i.e. the OS sees errors, but when we ask the drive, it says, "what
> errors".  I am sure there are other things that could cause this, but in
> the past I would start with the cables and or trays.

My official answer is: "I'm not sure".  :-)  Anything is possible.

I'd expect carrier/tray problems to manifest themselves as constant data
corruption, or disks falling off the bus (loose signal cable or losing
power).  I'd expect "detach" messages for the SATA channels.  But
remember, ICH5 lacks AHCI, and I don't know if the FreeBSD ata(4) driver
would report detach/attach in that case.  I guess a disk falling off the
bus or disappearing could in fact lock the controller up in this
scenario.

I'd expect cable problems to show constant data errors or loss, and
regular DMA errors.  FreeBSD would be quite chatty about this, I assume.
He just started getting these, and they're only "every couple days".
I'd also expect the attribute counters to be much higher -- a bad cable
would eventually get noticed by both the controller and the disk, maybe
just not consistently.  ZFS could help with detecting this (checksum
errors), but that's a different beast.

I have doubts about the cables being bad because he's seeing issues on
a SATA disk and a PATA disk.  It seems very unlikely that separate SATA
and PATA cables would go bad within a day or two of one another.

Another possibility is that the firmware on his drives lacks UDMA error
logging in SMART.  I've seen some drives do this (increase the attribute
but not stick anything in the SMART log), but they were old Maxtors.
UDMA CRCs were sky-high (to the point where the general drive health was
FAIL, REPLACE NOW), but nothing in the SMART log.

The acd0 thing bothers me the most, I think -- not because of the
oddity, but because it tried to read the TOC of a disc that wasn't even
there.  A specific ATAPI command induces that, if I remember right.

All that said: there is absolutely no harm in replacing the cables!  By
doing so you can rule those out.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |