Date:      Mon, 5 Feb 2007 01:13:31 -0600 (CST)
From:      "Richard Lynch" <ceo@l-i-e.com>
To:        "Chuck Swiger" <cswiger@mac.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: READ_DMA48 error interpretation
Message-ID:  <2195.67.184.122.32.1170659611.squirrel@www.l-i-e.com>
In-Reply-To: <3E64E786-E7A9-4914-BF29-DE89F25597E3@mac.com>
References:  <1398.216.230.84.67.1168982036.squirrel@www.l-i-e.com> <3E64E786-E7A9-4914-BF29-DE89F25597E3@mac.com>

On Tue, January 16, 2007 3:21 pm, Chuck Swiger wrote:
> On Jan 16, 2007, at 1:13 PM, Richard Lynch wrote:
>> I know the messages below mean the hard drive or IDE cards are having
>> problems.  But is this like RED ALERT or more like YELLOW or what?
...
>> +ad1: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=404955007
>> +ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=404955007
>> +g_vfs_done():ad1s1[READ(offset=207336931328, length=16384)]error = 5
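
For what it's worth, the LBA in the ad1 lines and the byte offset in the g_vfs_done() line appear to point at the same sector. A minimal sketch of the arithmetic, assuming 512-byte sectors and assuming the slice ad1s1 starts at sector 63 (a common MBR layout, but a guess on this box; neither number is in the log):

```python
SECTOR_SIZE = 512         # bytes per sector (assumption, not from the log)
SLICE_START_SECTOR = 63   # assumed first sector of ad1s1 (not from the log)


def slice_offset_to_lba(offset_bytes,
                        slice_start=SLICE_START_SECTOR,
                        sector_size=SECTOR_SIZE):
    """Convert a byte offset inside a slice to an absolute disk LBA."""
    return slice_start + offset_bytes // sector_size


# offset taken from the g_vfs_done() line quoted above
print(slice_offset_to_lba(207336931328))  # prints 404955007
```

Under those assumptions the result matches the LBA reported in the READ_DMA48 failure, which suggests all of the quoted lines describe one failed read rather than several separate problems.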

> If you have current backups, it's a yellow alert.  Otherwise...
>
>> And what do I do about it?
>>
>> umount and fsck everything a lot?
>> swap cards/drives around until it stops?
>> Ignore it and pray?
>
> Try installing the sysutils/smartmontools port and run a drive self-
> test.  That will give you a much better assessment of the state of
> the drive and whether it is likely to completely fail in the next 24
> hours...

I ran the short test on the problem drives, and it said everything was
fine.

I'll try the long test at a later date.

Meanwhile, I turned on the smartd daemon, and am seeing two issues in
the logs...

#1. The drive temperatures seem ridiculously high to this naive
reader, but what do I know?...
110 to 190 Celsius?  Yikes...  Or maybe that's normal?
How hot is too hot?

#2. Sequences like this show up a fair amount:
Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 152 to 153
Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 153 to 152
Device: /dev/ad0, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 251 to 250
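
To make sequences like these easier to skim, here is a rough sketch that pulls the attribute changes out of the log text. The regular expression is based only on the lines quoted above; the function name parse_changes is made up for illustration, and other smartd message formats are simply ignored.

```python
import re

# Pattern inferred from the log lines quoted above; not an official smartd format spec.
CHANGE_RE = re.compile(
    r"Device: (?P<device>[^,\s]+), SMART (?P<kind>\w+) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_changes(text):
    """Return (device, attribute id, attribute name, old, new) tuples."""
    # Collapse all whitespace so entries wrapped across lines still match.
    joined = " ".join(text.split())
    return [
        (m["device"], int(m["id"]), m["name"], int(m["old"]), int(m["new"]))
        for m in CHANGE_RE.finditer(joined)
    ]


if __name__ == "__main__":
    sample = """Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed
from 152 to 153
Device: /dev/ad0, SMART Prefailure Attribute: 8 Seek_Time_Performance
changed from 251 to 250"""
    for change in parse_changes(sample):
        print(change)
```

Grouping the tuples by device and attribute name would show whether a value is steadily drifting in one direction or just flipping back and forth by one, as Spin_Up_Time does above.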

So is the real "problem" just that the drives are spun down and can't
spin up fast enough? I can probably live with the consequences of
that, and just go on with life -- The occasional HTTP request for an
audio file will fail the first time, and they have to hit reload.

This box is the fail-safe roll-over server for audio files that are
all up online somewhere else managed by a professional (not me), so
it's no surprise that the rare time-out on the real server also ends
up with a drive spin up and failed request on the "backup".  Kind of
annoying, I guess, to an end user, but forcing the drives to always be
spinning is probably not a Good Idea.

Oh, here's a rather long excerpt of the log in case there's minutiae
within it that I've failed to include:
http://l-i-e.com/smartd.log

Any help in interpreting these results is most appreciated!

THANKS!!!

-- 
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some starving artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?



