Date:      Thu, 08 Feb 2007 06:47:50 -0800
From:      Garrett Cooper <youshi10@u.washington.edu>
To:        freebsd-questions@freebsd.org
Subject:   Re: READ_DMA48 error interpretation
Message-ID:  <45CB3816.5050301@u.washington.edu>
In-Reply-To: <Pine.BSF.3.96.1070208215816.20114A-100000@gaia.nimnet.asn.au>
References:  <Pine.BSF.3.96.1070208215816.20114A-100000@gaia.nimnet.asn.au>

Ian Smith wrote:
> On Wed, 7 Feb 2007, Richard Lynch wrote:
>  > [I've tried to snip away a lot of stuff, without losing any context...]
> 
> I'll prune a bit too, but will backtrack to earlier context, so thanks.
> 
>  > On Tue, February 6, 2007 2:50 am, Ian Smith wrote:
>  > > On Mon, 5 Feb 2007 01:13:31 -0600 (CST) Richard Lynch <ceo@l-i-e.com>
>  > > wrote:
>  > >  > On Tue, January 16, 2007 3:21 pm, Chuck Swiger wrote:
>  > >  > > On Jan 16, 2007, at 1:13 PM, Richard Lynch wrote:
>  > >  > ...
>  > >  > >> +ad1: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=404955007
>  > >  > >> +ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR>
>  > >  > >> error=10<NID_NOT_FOUND>
>  > >  > >> LBA=404955007
>  > >  > >> +g_vfs_done():ad1s1[READ(offset=207336931328, length=16384)]error = 5
> 
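
As a cross-check on the log lines quoted above, the byte offset in the g_vfs_done() message can be converted to a sector number and compared with the LBA from the READ_DMA48 lines. This is only a sketch: the 512-byte sector size and the slice start of 63 sectors (the conventional start of a first MBR slice) are assumptions, not values taken from the thread, and the variable names are made up for illustration.

```python
# Assumptions (not stated in the thread): 512-byte sectors, and ad1s1
# starting at sector 63, the conventional start of a first MBR slice.
SECTOR_SIZE = 512
SLICE_START = 63

# Values copied from the quoted g_vfs_done() line.
offset = 207336931328   # byte offset within ad1s1
length = 16384          # bytes requested

# First absolute sector of the request, and how many sectors it spans.
first_lba = SLICE_START + offset // SECTOR_SIZE
sector_count = length // SECTOR_SIZE

print(first_lba)      # 404955007
print(sector_count)   # 32
```

Under those assumptions the result equals the LBA=404955007 in the ad1 messages, which suggests the filesystem read error and the READ_DMA48 failure describe the same request.
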
> Looks like a not ready error maybe.  The only value in your ad1.txt that
> looks like it's ever been anywhere near any error threshold is ID# 11,
> Calibration_Retry_Count, and its current value is fine.  Power glitch?
> 
> Are you getting any other hard looking errors in /var/log/messages?  Is
> fsck happy?  It never hurts to run 'fsck -n' whenever you feel the urge.
> 
>  > >  > > Try installing the sysutils/smartmontools port and run a drive
>  > > self-
>  > 
>  > >  > I ran the short test on the problem drives, and it said everything
>  > > was
>  > >  > fine.
>  > >  >
>  > >  > I'll try the long test at a later date.
> 
> Only your ad3.txt referred to below shows a (short) test having been
> completed and logged.  You might check the smartctl -a results after
> running at least short tests initially (looks like the long ones will
> take 4-5 hours for your 4 drives) as Chuck has since suggested.
> 
>  > >  > #2. Sequences like this show up a fair amount:
>  > >  > Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time
>  > > changed
>  > >  > from 152 to 153
>  > >  > Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time
>  > > changed
>  > >  > from 153 to 152
>  > >  > Device: /dev/ad0, SMART Prefailure Attribute: 8
>  > > Seek_Time_Performance
>  > >  > changed from 251 to 250
> 
> I'm not sure of the degree of logging you're having smartd use here, but
> these small changes of value, especially up and down by 1 but a long way
> from any error threshold, seem excessive and relatively trivial (perhaps
> debug-level detail?), i.e. most likely nothing of any concern.
> 
> I suggest reading man smartctl under '-A, --attributes' and then you'll
> know as much as I do about what these may mean, and maybe worry less ..
> 
>  > Here are all the smartctl -a outputs:
>  > 
>  > http://l-i-e.com/ad0.txt
>  > http://l-i-e.com/ad1.txt
>  > http://l-i-e.com/ad2.txt
>  > http://l-i-e.com/ad3.txt
>  > 
>  > ad3 is giving the most errors...
>  > ad1 gives a fair amount though
> 
> Do you mean according to that fine-detail attribute changes logging?  Or
> real read/write/seek etc errors being logged to messages?
> 
>  > And the ad0 and ad2 seem to be giving the spinup errors.
> 
> None of those reports seem to indicate any problems really, though if
> anyone else cares to peek and notices any anomalies, I'm all eyes.
> 
> As for temperatures, the readings for all 4 drives seem very cool, but
> then it is winter over there .. Temperature_Celsius for ad0 to ad3 being
> 36, 27, 22 and 18 degrees C, each present and worst value well clear of
> error thresholds .. did you interpret those values as temperatures?
> 
>  > ad0 is pretty much full
>  > ad1 is the one I'm filling up currently
>  > ad2 and ad3 have no actual content on them yet, but will "soon"
>  > 
>  > All the drives are kind of in an old PC tower (XT? AT???), except the
>  > outer casing is, errr, not there...  Just the framework.
> 
> Might be worth checking that your power supply is up to handling 4 big
> drives, but they weren't running more than mildly warm when reported.
> 
>  > ad2 and ad3 are in one of these Thermaltake iCage things:
>  > http://www.performance-pcs.com/catalog/index.php?main_page=product_info&cPath=257&products_id=3533
>  > which converts the old-school floppy drive[s] bay into an IDE bay, and
>  > puts a big honking fan blowing on them.
> 
> These too were running nice and cool, 22 and 18C, when reported.  Cf my
> 40GB laptop drive (at smartctl version 5.36 [i386-portbld-freebsd5.5],
> rather more recent than your 5.33 freebsd6.0) this afternoon:
> 
>  194 Temperature_Celsius  0x0022  100  100  000   Old_age  Always  -  40 (Lifetime Min/Max 13/49)
> 
>  > I'm not claiming it's "good enough" but I tried.
>  > 
>  > I left the iCage "bay" between them empty for airflow/cooling.
>  > 
>  > ad0 and ad1 are in the usual IDE bay of a tower.
>  > I have a fan in there, but without the cover to shape the airflow,
>  > perhaps that is not doing much useful...
> 
> Perhaps it wasn't properly warmed up when you ran those reports, but on
> the data you've provided you don't have any sort of temperature problem. 
> 
>  > I can touch the exposed front and back top (above IDE cable) and lay
>  > my finger along it.  It's "hot" but not like, "ouch hot" :-)
> 
> Over 70C or so is too hot to touch except momentarily.  You're cool :)
> 
>  > I don't think it's 100C+ hot, as that's boiling -- but perhaps the
>  > thermometer is somewhere inside or...
>  > 
>  > Seems more likely, though, that that number is Fahrenheit (sp?) and
>  > not Celsius..

Depends on the drive's speed, but until your drives get up to around 115
degrees F (about 46 C) I wouldn't be too concerned. For an enclosed area
that should be fine. Besides, some of the heat is probably transferring
from one disk to the other through the case or the iCage thing (metal
conducts heat well), so that's to be expected.

As long as your drive stays below about 130 degrees F (about 54 C) you
should be ok. I hit that once when I didn't have a fan set up in one of
my towers next to a 10krpm SCSI disk; it kept spinning down to avoid
overheating, and eventually the machine rebooted itself because the case
reached a heat threshold =\.
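
Since the thread mixes Celsius and Fahrenheit, here is a rough sketch that converts the four Temperature_Celsius readings quoted above and compares them with the two Fahrenheit figures from this message. Those figures are rules of thumb, not vendor limits, and the function and constant names are made up for illustration.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def f_to_c(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32.0) * 5.0 / 9.0


# Readings quoted earlier in the thread for ad0 to ad3, in degrees C.
readings_c = {"ad0": 36, "ad1": 27, "ad2": 22, "ad3": 18}

# Rule-of-thumb figures from this message, in degrees F (not vendor specs).
CONCERN_F = 115.0
TOO_HOT_F = 130.0

for drive, temp_c in readings_c.items():
    temp_f = c_to_f(temp_c)
    status = "ok" if temp_f < CONCERN_F else "check cooling"
    print(f"{drive}: {temp_c} C = {temp_f:.1f} F ({status})")

# The same two figures expressed in Celsius, for comparison with smartctl.
print(f"{CONCERN_F:.0f} F = {f_to_c(CONCERN_F):.1f} C")
print(f"{TOO_HOT_F:.0f} F = {f_to_c(TOO_HOT_F):.1f} C")
```

The warmest reading, 36 C on ad0, works out to 96.8 F, so all four drives sit well under both figures.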

-Garrett


