Date:      Tue, 24 Jul 2012 14:30:03 -0500
From:      dweimer <dweimer@dweimer.net>
To:        <freebsd-questions@freebsd.org>
Subject:   Re: Disk Errors
Message-ID:  <e7e1dc818414243c7df4f1c7aca5e834@dweimer.net>
In-Reply-To: <20120724180421.GF38393@dan.emsphone.com>
References:  <d65bfc394e4b31d92bb6ab9e8d220d16@dweimer.net> <20120724180421.GF38393@dan.emsphone.com>

On 2012-07-24 13:04, Dan Nelson wrote:
> In the last episode (Jul 24), dweimer said:
>> I have three 1TB disks I use for backup, two of them are Western Digital
>> drives I bought specifically for this purpose.  One is a Seagate drive
>> that came out of a barebones PC that I replaced with a couple smaller
>> drives in a stripe to gain performance.  I use the drives in an external
>> SATA dock, using geom eli encryption, the western digital drives give me
>> no problems, but the seagate drive gives me a lot of the following errors
>> under load.
>>
>> ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=817755328
>> ad4: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=837397120
>> ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=879786112
>> ad4: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=882931200
>> ad4: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=890542016
>> ad4: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=902767296
>> ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=904071296
>
> If you install the sysutils/smartmontools port, you can run "smartctl -x
> /dev/ad4" to dump the drive's SMART attribute table and error logs.  Those
> should give you an indication of whether the drive is going bad.  If the
> drive is logging those write errors in its internal log, then you know it's
> not a cabling issue.  If it's not logging errors, I suppose you might have a
> loose SATA plug on the drive itself, which would explain why the problem
> follows the drive around.
>

I'm running a long self-test on the drive now. So far nothing in the
output stands out to me as failing.

smartctl 5.43 2012-06-30 r3573 [FreeBSD 9.0-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST31000528AS
Serial Number:    5VP7ST1C
LU WWN Device Id: 5 000c50 02f7a3bb4
Firmware Version: CC46
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Jul 24 14:29:08 2012 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM level is:     208 (intermediate), recommended: 208
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 248)	Self-test routine in progress...
					80% of test remaining.
Total time to complete Offline
data collection: 		(  600) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 173) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x103f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   117   099   006    -    145191418
  3 Spin_Up_Time            PO----   095   095   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    114
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
  7 Seek_Error_Rate         POSR--   078   060   030    -    77590473
  9 Power_On_Hours          -O--CK   090   090   000    -    9156
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    46
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   098   000    -    21475164202
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   062   052   045    -    38 (Min/Max 35/38)
194 Temperature_Celsius     -O---K   038   048   000    -    38 (0 23 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   025   023   000    -    145191418
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    833
240 Head_Flying_Hours       ------   100   253   000    -    96417720837162
241 Total_LBAs_Written      ------   100   253   000    -    1480696469
242 Total_LBAs_Read         ------   100   253   000    -    922627427
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
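
As an aside on reading the attribute table: here is a minimal Python sketch,
not part of the original post, that pulls the raw counts out of rows in this
format and separates media-related attributes from link-related ones. The two
watch lists and the function names are assumptions made up for illustration;
smartctl itself defines no such grouping. The sample rows are copied from the
table in this message.

```python
import re

# One row of the "Vendor Specific SMART Attributes" table from smartctl -x:
# ID  NAME  FLAGS  VALUE  WORST  THRESH  FAIL  RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+([A-Z-]{6})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)"
)

# Hypothetical watch lists, chosen for this example only.
MEDIA_ATTRS = ["Reallocated_Sector_Ct", "Reported_Uncorrect",
               "Current_Pending_Sector", "Offline_Uncorrectable"]
LINK_ATTRS = ["UDMA_CRC_Error_Count"]


def parse_attributes(text):
    """Return {attribute_name: fields} for every table row found in text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[m.group(2)] = {
                "id": int(m.group(1)),
                "value": int(m.group(4)),
                "worst": int(m.group(5)),
                "thresh": int(m.group(6)),
                "raw": int(m.group(8)),  # leading integer of RAW_VALUE only
            }
    return attrs


def nonzero(attrs, names):
    """Names from the watch list whose raw count is above zero."""
    return [n for n in names if n in attrs and attrs[n]["raw"] > 0]


SAMPLE = """\
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    833
"""

attrs = parse_attributes(SAMPLE)
print("media:", nonzero(attrs, MEDIA_ATTRS))
print("link: ", nonzero(attrs, LINK_ATTRS))
```

For these rows the media counters are all zero while UDMA_CRC_Error_Count is
833. That fits the "UDMA ICRC error" kernel messages and would be consistent
with a problem on the link (cable, connector, or dock) rather than on the
platters, though the count alone cannot say which component is at fault.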

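The Command_Timeout raw value (21475164202) looks alarming as a single
number. The sketch below splits it up, assuming the drive packs three 16-bit
counters into the 48-bit raw field, which is the layout smartctl's
"-v 188,raw16" display format is meant for. Nothing in this message confirms
that this drive uses that layout, and split_raw16 is a name invented for the
example.

```python
def split_raw16(raw):
    """Split a 48-bit SMART raw value into three 16-bit words, high word first."""
    return [(raw >> shift) & 0xFFFF for shift in (32, 16, 0)]


# Raw value of attribute 188 (Command_Timeout) from the table in this message.
print(split_raw16(21475164202))  # prints [5, 5, 42]
```

Under that assumption the field holds three small counters (5, 5 and 42)
rather than one 11-digit count. What each word counts is vendor-specific and
is not stated anywhere in this thread.
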
ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: 48-bit ATA commands not supported
Read GP Log Directory failed.

SMART Log Directory Version 1 [multi-sector log support]
SMART Log at address 0x00 has    1 sectors [Log Directory]
SMART Log at address 0x01 has    1 sectors [Summary SMART error log]
SMART Log at address 0x02 has    5 sectors [Comprehensive SMART error log]
SMART Log at address 0x06 has    1 sectors [SMART self-test log]
SMART Log at address 0x09 has    1 sectors [Selective self-test log]
SMART Log at address 0x80 has   16 sectors [Host vendor specific log]
SMART Log at address 0x81 has   16 sectors [Host vendor specific log]
SMART Log at address 0x82 has   16 sectors [Host vendor specific log]
SMART Log at address 0x83 has   16 sectors [Host vendor specific log]
SMART Log at address 0x84 has   16 sectors [Host vendor specific log]
SMART Log at address 0x85 has   16 sectors [Host vendor specific log]
SMART Log at address 0x86 has   16 sectors [Host vendor specific log]
SMART Log at address 0x87 has   16 sectors [Host vendor specific log]
SMART Log at address 0x88 has   16 sectors [Host vendor specific log]
SMART Log at address 0x89 has   16 sectors [Host vendor specific log]
SMART Log at address 0x8a has   16 sectors [Host vendor specific log]
SMART Log at address 0x8b has   16 sectors [Host vendor specific log]
SMART Log at address 0x8c has   16 sectors [Host vendor specific log]
SMART Log at address 0x8d has   16 sectors [Host vendor specific log]
SMART Log at address 0x8e has   16 sectors [Host vendor specific log]
SMART Log at address 0x8f has   16 sectors [Host vendor specific log]
SMART Log at address 0x90 has   16 sectors [Host vendor specific log]
SMART Log at address 0x91 has   16 sectors [Host vendor specific log]
SMART Log at address 0x92 has   16 sectors [Host vendor specific log]
SMART Log at address 0x93 has   16 sectors [Host vendor specific log]
SMART Log at address 0x94 has   16 sectors [Host vendor specific log]
SMART Log at address 0x95 has   16 sectors [Host vendor specific log]
SMART Log at address 0x96 has   16 sectors [Host vendor specific log]
SMART Log at address 0x97 has   16 sectors [Host vendor specific log]
SMART Log at address 0x98 has   16 sectors [Host vendor specific log]
SMART Log at address 0x99 has   16 sectors [Host vendor specific log]
SMART Log at address 0x9a has   16 sectors [Host vendor specific log]
SMART Log at address 0x9b has   16 sectors [Host vendor specific log]
SMART Log at address 0x9c has   16 sectors [Host vendor specific log]
SMART Log at address 0x9d has   16 sectors [Host vendor specific log]
SMART Log at address 0x9e has   16 sectors [Host vendor specific log]
SMART Log at address 0x9f has   16 sectors [Host vendor specific log]
SMART Log at address 0xa1 has   20 sectors [Device vendor specific log]
SMART Log at address 0xa8 has  129 sectors [Device vendor specific log]
SMART Log at address 0xa9 has    1 sectors [Device vendor specific log]
SMART Log at address 0xc0 has    1 sectors [Device vendor specific log]
SMART Log at address 0xe0 has    1 sectors [SCT Command/Status]
SMART Log at address 0xe1 has    1 sectors [SCT Data Transfer]

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported
SMART Error Log Version: 1
No Errors Logged

SMART Extended Self-test Log (GP Log 0x07) not supported
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 80%      9156         -
# 2  Short offline       Completed without error       00%      9156         -

SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    38 Celsius
Power Cycle Min/Max Temperature:     35/38 Celsius
Lifetime    Min/Max Temperature:     23/48 Celsius
Under/Over Temperature Limit Count:   0/0
SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        59 minutes
Min/Max recommended Temperature:     14/55 Celsius
Min/Max Temperature Limit:           10/60 Celsius
Temperature History Size (Index):    128 (53)

Index    Estimated Time   Temperature Celsius
   54    2012-07-19 09:24    35  ****************
  ...    ..(  3 skipped).    ..  ****************
   58    2012-07-19 13:20    35  ****************
   59    2012-07-19 14:19    34  ***************
  ...    ..(  3 skipped).    ..  ***************
   63    2012-07-19 18:15    34  ***************
   64    2012-07-19 19:14    35  ****************
   65    2012-07-19 20:13    35  ****************
   66    2012-07-19 21:12    35  ****************
   67    2012-07-19 22:11    36  *****************
   68    2012-07-19 23:10    36  *****************
   69    2012-07-20 00:09    35  ****************
  ...    ..( 11 skipped).    ..  ****************
   81    2012-07-20 11:57    35  ****************
   82    2012-07-20 12:56    34  ***************
  ...    ..(  5 skipped).    ..  ***************
   88    2012-07-20 18:50    34  ***************
   89    2012-07-20 19:49    35  ****************
   90    2012-07-20 20:48    35  ****************
   91    2012-07-20 21:47    36  *****************
   92    2012-07-20 22:46    37  ******************
   93    2012-07-20 23:45    36  *****************
   94    2012-07-21 00:44    36  *****************
   95    2012-07-21 01:43    35  ****************
   96    2012-07-21 02:42    35  ****************
   97    2012-07-21 03:41    35  ****************
   98    2012-07-21 04:40    36  *****************
   99    2012-07-21 05:39    36  *****************
  100    2012-07-21 06:38    36  *****************
  101    2012-07-21 07:37    35  ****************
  ...    ..(  6 skipped).    ..  ****************
  108    2012-07-21 14:30    35  ****************
  109    2012-07-21 15:29    34  ***************
  110    2012-07-21 16:28    35  ****************
  ...    ..(  6 skipped).    ..  ****************
  117    2012-07-21 23:21    35  ****************
  118    2012-07-22 00:20    34  ***************
  119    2012-07-22 01:19    34  ***************
  120    2012-07-22 02:18    34  ***************
  121    2012-07-22 03:17    35  ****************
  ...    ..( 14 skipped).    ..  ****************
    8    2012-07-22 18:02    35  ****************
    9    2012-07-22 19:01     ?  -
   10    2012-07-22 20:00    35  ****************
   11    2012-07-22 20:59    35  ****************
   12    2012-07-22 21:58    38  *******************
   13    2012-07-22 22:57    38  *******************
   14    2012-07-22 23:56    38  *******************
   15    2012-07-23 00:55    39  ********************
   16    2012-07-23 01:54    38  *******************
   17    2012-07-23 02:53    38  *******************
   18    2012-07-23 03:52    39  ********************
   19    2012-07-23 04:51    39  ********************
   20    2012-07-23 05:50    38  *******************
  ...    ..( 11 skipped).    ..  *******************
   32    2012-07-23 17:38    38  *******************
   33    2012-07-23 18:37    37  ******************
  ...    ..(  3 skipped).    ..  ******************
   37    2012-07-23 22:33    37  ******************
   38    2012-07-23 23:32    38  *******************
   39    2012-07-24 00:31     ?  -
   40    2012-07-24 01:30    25  ******
   41    2012-07-24 02:29    25  ******
   42    2012-07-24 03:28    36  *****************
   43    2012-07-24 04:27    36  *****************
   44    2012-07-24 05:26     ?  -
   45    2012-07-24 06:25    36  *****************
   46    2012-07-24 07:24    36  *****************
   47    2012-07-24 08:23    35  ****************
   48    2012-07-24 09:22    36  *****************
   49    2012-07-24 10:21    35  ****************
   50    2012-07-24 11:20    36  *****************
   51    2012-07-24 12:19    36  *****************
   52    2012-07-24 13:18    35  ****************
   53    2012-07-24 14:17    38  *******************

SCT Error Recovery Control:
            Read: Disabled
           Write: Disabled

SATA Phy Event Counters (GP Log 0x11) not supported



-- 
Thanks,
    Dean E. Weimer
    http://www.dweimer.net/


