Date: Mon, 02 Oct 2006 22:54:48 -0700
From: Rich Wales <richw@richw.org>
To: freebsd-hardware@freebsd.org
Subject: Re: SATA-hdd or SATA-controller trouble.
Message-ID: <20061003055448.B3E2B3C36B@whodunit.richw.org>

"Anton" wrote:

>> Aug 21 18:46:27 nrr kernel: ad4: TIMEOUT - READ_DMA retrying (2 retries left) LBA=344654303
>> Aug 21 18:46:32 nrr kernel: ad4: FAILURE - ATA_IDENTIFY timed out
>> Aug 21 18:46:37 nrr kernel: ad4: FAILURE - ATA_IDENTIFY timed out
>> Aug 21 18:46:37 nrr kernel: ad4: WARNING - removed from configuration
>> Aug 21 18:46:37 nrr kernel: ata2-master: FAILURE - READ_DMA timed out

"Veronica" replied:

> I have had similar messages when my ATA cable was damaged. So I suggest
> replacing your cable.

I've been seeing similar problems to Anton, with brand-new SATA cables
that are definitely not damaged. (Note that Anton was talking about a
SATA disk, with a completely different kind of data cable from old ATA
drives.)

Veronica continued:

> Also you might want to check the temperature of the disk using the
> "smartmontools" utility from freebsd-ports. Harddrives should always
> be kept very cool < 40 degrees if possible. A higher risk of data loss
> and/or lower lifespan could be the result of a higher temperature.
> Smartmontools can also run self-tests (short or long ones) to check for
> problems with your drive.

Although it's possible that Anton could be having hardware problems due
to overheating or other drive flakiness, there have been lots of reports
of timeout problems with SATA drives on Promise controllers under heavy
I/O load, from many people, for quite some time now, and I would be
surprised if they were all due to overheating.

I'm currently running a "dd if=/dev/adXXX of=/dev/null bs=64k
conv=noerror" command on each of my two Seagate 300GB SATA drives
simultaneously (with "adXXX" replaced by the real drive device name in
each case). I've got the case open, with a large external fan blowing
air onto the drives. Running "smartctl -a /dev/adXXX" on each drive
shows the temperature in each drive to be around 35C. Earlier
self-tests on both drives finished successfully. Nevertheless, I'm
seeing a bunch of timeout problems reported on both drives.

Something is messed up -- maybe in the Promise controller, maybe in the
FreeBSD driver, or (I'll admit for the sake of completeness) maybe in
the drives or elsewhere in the system. And as I said, lots of people on
the net have reported this problem, but no one (so far) has confessed to
having a clue as to what is causing it or how to fix it.

I'm running 6.1-RELEASE-p9 on an old 800-MHz Athlon (original "Slot A"
CPU type), in a DTK VAM-0070 motherboard. I've seen other people,
though, report this problem with much newer hardware.

Rich Wales
Palo Alto, CA, USA
richw@richw.org
http://www.richw.org
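
For anyone trying to compare notes on this problem, it may help to count how often each kind of message shows up per device while a read test like the dd run above is in progress. What follows is only a rough sketch in Python, not anything from the FreeBSD tree: the function name tally_ata_messages and the regular expression are my own assumptions, modelled solely on the five kernel lines quoted at the top of this message, so other message shapes may well slip past it.

```python
import re
from collections import Counter

# Assumed message shape, taken only from the quoted log lines, e.g.
#   "... kernel: ad4: TIMEOUT - READ_DMA retrying (2 retries left) LBA=344654303"
#   "... kernel: ata2-master: FAILURE - READ_DMA timed out"
# The pattern is a guess at the general form, not a complete grammar.
ATA_MSG = re.compile(
    r"kernel: (?P<dev>[\w-]+): (?P<kind>TIMEOUT|FAILURE|WARNING) - (?P<detail>.*)$"
)


def tally_ata_messages(lines):
    """Count (device, kind) pairs across an iterable of log lines.

    Lines that do not match the assumed pattern are silently skipped.
    """
    counts = Counter()
    for line in lines:
        match = ATA_MSG.search(line.strip())
        if match:
            counts[(match.group("dev"), match.group("kind"))] += 1
    return counts
```

One way to use it would be to open the system log (commonly /var/log/messages, though that location is an assumption about the local syslog setup), pass the file object to tally_ata_messages, and print the resulting counts before and after a dd run to see which device accumulates timeouts.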
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061003055448.B3E2B3C36B>