Date:      Mon, 02 Oct 2006 22:54:48 -0700
From:      Rich Wales <richw@richw.org>
To:        freebsd-hardware@freebsd.org
Subject:   Re: SATA-hdd or SATA-controller trouble.
Message-ID:  <20061003055448.B3E2B3C36B@whodunit.richw.org>

"Anton" wrote:

>> Aug 21 18:46:27 nrr kernel: ad4: TIMEOUT - READ_DMA retrying (2 retries left) LBA=344654303
>> Aug 21 18:46:32 nrr kernel: ad4: FAILURE - ATA_IDENTIFY timed out
>> Aug 21 18:46:37 nrr kernel: ad4: FAILURE - ATA_IDENTIFY timed out
>> Aug 21 18:46:37 nrr kernel: ad4: WARNING - removed from configuration
>> Aug 21 18:46:37 nrr kernel: ata2-master: FAILURE - READ_DMA timed out

"Veronica" replied:

> I have had similar messages when my ATA cable was damaged.  So I suggest
> replacing your cable.

I've been seeing problems similar to Anton's, with brand-new SATA cables that
are definitely not damaged.  (Note that Anton was talking about a SATA disk,
which uses a completely different kind of data cable from old ATA drives.)

Veronica continued:

> Also you might want to check the temperature of the disk using the
> "smartmontools" utility from freebsd-ports.  Harddrives should always
> be kept very cool < 40 degrees if possible.  A higher risk of data loss
> and/or lower lifespan could be the result of a higher temperature.
> Smartmontools can also run self-tests (short or long ones) to check for
> problems with your drive.

Although Anton could be having hardware problems due to overheating or
other drive flakiness, many people have been reporting timeout problems
with SATA drives on Promise controllers under heavy I/O load for quite some
time now, and I would be surprised if they were all due to overheating.

I'm currently running a "dd if=/dev/adXXX of=/dev/null bs=64k conv=noerror"
command on each of my two Seagate 300GB SATA drives simultaneously (with
"adXXX" replaced by the real drive device name in each case).  I've got the
case open, with a large external fan blowing air onto the drives.  Running
"smartctl -a /dev/adXXX" on each drive shows the temperature in each drive
to be around 35C.  Earlier self-tests on both drives finished successfully.
Nevertheless, I'm seeing a bunch of timeout problems reported on both drives.
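
For anyone wanting to compare how often each drive times out during a test
like this, here is a rough Python sketch that tallies timeout messages per
device.  It assumes the kernel messages look like the ones Anton quoted
above; the function name tally_timeouts, the regular expression, and the
choice of which messages count as timeouts are illustrative assumptions on
my part, not anything taken from the FreeBSD ata driver.

```python
import re
from collections import Counter

# Assumed message shape, based only on the lines quoted in this thread, e.g.
#   "... kernel: ad4: TIMEOUT - READ_DMA retrying (2 retries left) LBA=..."
#   "... kernel: ata2-master: FAILURE - READ_DMA timed out"
ATA_MSG = re.compile(
    r"kernel: (?P<dev>[\w-]+): (?P<level>TIMEOUT|FAILURE|WARNING) - (?P<text>.*)$"
)


def tally_timeouts(lines):
    """Count timeout-related kernel messages per device name."""
    counts = Counter()
    for line in lines:
        m = ATA_MSG.search(line)
        if m is None:
            continue  # not an ATA message in the assumed format
        # Count explicit TIMEOUT lines and any message saying "timed out".
        if m.group("level") == "TIMEOUT" or "timed out" in m.group("text"):
            counts[m.group("dev")] += 1
    return counts


# The five lines Anton posted, used here as sample input.
SAMPLE = [
    "Aug 21 18:46:27 nrr kernel: ad4: TIMEOUT - READ_DMA retrying (2 retries left) LBA=344654303",
    "Aug 21 18:46:32 nrr kernel: ad4: FAILURE - ATA_IDENTIFY timed out",
    "Aug 21 18:46:37 nrr kernel: ad4: FAILURE - ATA_IDENTIFY timed out",
    "Aug 21 18:46:37 nrr kernel: ad4: WARNING - removed from configuration",
    "Aug 21 18:46:37 nrr kernel: ata2-master: FAILURE - READ_DMA timed out",
]

if __name__ == "__main__":
    for dev, n in sorted(tally_timeouts(SAMPLE).items()):
        print(dev, n)
```

To use it on a live system, pass it the lines of whatever file your syslog
configuration sends kernel messages to (I am assuming /var/log/messages)
instead of SAMPLE.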

Something is messed up -- maybe in the Promise controller, maybe in the
FreeBSD driver, or (I'll admit for the sake of completeness) maybe in the
drives or elsewhere in the system.  And as I said, lots of people on the
net have reported this problem, but no one (so far) has confessed to having
a clue as to what is causing it or how to fix it.

I'm running 6.1-RELEASE-p9 on an old 800-MHz Athlon (original "Slot A" CPU
type), in a DTK VAM-0070 motherboard.  I've seen other people, though,
report this problem with much newer hardware.

Rich Wales
Palo Alto, CA, USA
richw@richw.org
http://www.richw.org


