Date:      Fri, 10 Feb 2012 16:35:50 -0500
From:      Mike Tancsa <mike@sentex.net>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        Alexander Motin <mav@freebsd.org>, freebsd-stable@freebsd.org
Subject:   Re: siisch1: Error while READ LOG EXT
Message-ID:  <4F358DB6.4030203@sentex.net>
In-Reply-To: <4F34124F.9090808@sentex.net>
References:  <4F32E289.4080806@sentex.net>	<mailpost.1328736521.3202974.81071.mailing.freebsd.stable@FreeBSD.cs.nctu.edu.tw>	<4F32F5B0.2060203@FreeBSD.org>	<20120208223819.GA27488@icarus.home.lan>	<4F32FB5E.7050102@FreeBSD.org> <4F33DB75.1080202@sentex.net>	<20120209152240.GA95470@icarus.home.lan>	<4F33F056.6070300@sentex.net>	<20120209163415.GA96451@icarus.home.lan> <4F34124F.9090808@sentex.net>

On 2/9/2012 1:37 PM, Mike Tancsa wrote:
> On 2/9/2012 11:34 AM, Jeremy Chadwick wrote:
>>
>> You will probably need to "track these drives" on a regular basis.  That
>> is to say, set up some cronjob or similar that logs the above output to
>> a file (appends data to it), specifically output from smartctl -A (not
>> -a and not -x) and smartctl -l sataphy on a per-disk basis.  smartd can
>> track SMART attribute changes, but does not track GPLog changes.  Make
>> sure to put timestamps in your logs.
> 
> Thanks very much for having a look, and for the suggestions. I think this
> is the way to go to see which drive may have errors incrementing.
> Alexander, is there a better way you can suggest?
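
For anyone following along, here is a minimal sketch in Python of the kind of timestamped per-disk logging suggested above. Only the "smartctl -A" and "smartctl -l sataphy" invocations come from the suggestion itself. The function names, the header line format, the log path argument and the injectable "run" parameter are assumptions made for illustration.

```python
import subprocess
from datetime import datetime, timezone


def snapshot(device, run=subprocess.run):
    """Return one timestamped text block for a single disk.

    Collects the output of `smartctl -A` and `smartctl -l sataphy`.
    `run` is injectable so the function can be exercised without smartctl
    installed; by default it is subprocess.run.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    parts = [f"=== {stamp} {device} ==="]  # header format is an assumption
    for args in (["-A"], ["-l", "sataphy"]):
        # check=False: smartctl uses non-zero exit codes as status bits,
        # so a non-zero return does not necessarily mean "no output".
        result = run(["smartctl", *args, device],
                     capture_output=True, text=True, check=False)
        parts.append(result.stdout.rstrip())
    return "\n".join(parts) + "\n"


def append_snapshot(device, log_path, run=subprocess.run):
    """Append a snapshot for `device` to `log_path` (hypothetical location)."""
    with open(log_path, "a") as log:
        log.write(snapshot(device, run=run))
```

Calling append_snapshot() once per disk from cron, with one log file per disk, would keep each history easy to diff later. It needs to run as root for smartctl to reach the devices.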

Got a few more of the READ LOG EXT errors, so I took a snapshot of all the disks after the errors to compare with the snapshots cron took this morning. Unfortunately, some of the deltas were on the one new port multiplier and some were on the motherboard SATA ports.

Feb  9 04:34:55 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 10 16:05:53 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 10 16:06:53 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 10 16:07:06 backup3 last message repeated 3 times
Feb 10 16:18:24 backup3 last message repeated 16 times
Feb 10 16:18:24 backup3 kernel: 
Feb 10 16:18:39 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 10 16:19:10 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 10 16:20:27 backup3 last message repeated 4 times
Feb 10 16:20:27 backup3 kernel: 
Feb 10 16:20:30 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 10 16:21:33 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 10 16:23:23 backup3 last message repeated 8 times



On ada4,

-199 UDMA_CRC_Error_Count    -O--CK   200   199   000    -    13
+199 UDMA_CRC_Error_Count    -O--CK   200   199   000    -    32
 SATA Phy Event Counters (GP Log 0x11)
 ID      Size     Value  Description
-0x0001  2           13  Command failed due to ICRC error
-0x0002  2           13  R_ERR response for data FIS
-0x0003  2           13  R_ERR response for device-to-host data FIS
+0x0001  2           32  Command failed due to ICRC error
+0x0002  2           32  R_ERR response for data FIS
+0x0003  2           32  R_ERR response for device-to-host data FIS
 0x0004  2            0  R_ERR response for host-to-device data FIS
-0x0005  2            0  R_ERR response for non-data FIS
-0x0006  2            0  R_ERR response for device-to-host non-data FIS
+0x0005  2            1  R_ERR response for non-data FIS
+0x0006  2            1  R_ERR response for device-to-host non-data FIS
 0x0007  2            0  R_ERR response for host-to-device non-data FIS
 0x000a  2            0  Device-to-host register FISes sent due to a COMRESET
 0x000b  2            0  CRC errors within host-to-device FIS
-0x8000  4       744462  Vendor specific
+0x8000  4       785195  Vendor specific
 
 General Purpose Log 0x10 [NCQ Command Error log], Page 0-0 (of 1)
-0000000: 05 00 41 84 04 9a 53 40 00 00 00 00 00 00 00 00 |..A...S@........|
+0000000: 06 00 41 84 f2 39 6d 40 2d 00 00 00 00 00 00 00 |..A..9m@-.......|
-00001f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa |................|
+00001f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 25 |...............%|


ada5

-199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    11
+199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    22
-0x0001  2           11  Command failed due to ICRC error
-0x0002  2           11  R_ERR response for data FIS
-0x0003  2           11  R_ERR response for device-to-host data FIS
+0x0001  2           22  Command failed due to ICRC error
+0x0002  2           22  R_ERR response for data FIS
+0x0003  2           22  R_ERR response for device-to-host data FIS


ada6
-199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    8
+199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    25
 SATA Phy Event Counters (GP Log 0x11)
 ID      Size     Value  Description
-0x0001  2            8  Command failed due to ICRC error
-0x0002  2            8  R_ERR response for data FIS
-0x0003  2            8  R_ERR response for device-to-host data FIS
+0x0001  2           25  Command failed due to ICRC error
+0x0002  2           25  R_ERR response for data FIS
+0x0003  2           25  R_ERR response for device-to-host data FIS
 0x0004  2            0  R_ERR response for host-to-device data FIS
 0x0005  2            0  R_ERR response for non-data FIS
 0x0006  2            0  R_ERR response for device-to-host non-data FIS
 0x0007  2            0  R_ERR response for host-to-device non-data FIS
 0x000a  2            0  Device-to-host register FISes sent due to a COMRESET
 0x000b  2            0  CRC errors within host-to-device FIS
-0x8000  4       744462  Vendor specific
+0x8000  4       785195  Vendor specific


ada7
-199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    13
+199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    30
 SATA Phy Event Counters (GP Log 0x11)
 ID      Size     Value  Description
-0x0001  2           13  Command failed due to ICRC error
-0x0002  2           13  R_ERR response for data FIS
-0x0003  2           13  R_ERR response for device-to-host data FIS
+0x0001  2           30  Command failed due to ICRC error
+0x0002  2           31  R_ERR response for data FIS
+0x0003  2           31  R_ERR response for device-to-host data FIS
 0x0004  2            0  R_ERR response for host-to-device data FIS
 0x0005  2            1  R_ERR response for non-data FIS
 0x0006  2            1  R_ERR response for device-to-host non-data FIS
 0x0007  2            0  R_ERR response for host-to-device non-data FIS
 0x000a  2            0  Device-to-host register FISes sent due to a COMRESET
 0x000b  2            0  CRC errors within host-to-device FIS
-0x8000  4       744460  Vendor specific
+0x8000  4       785193  Vendor specific

 General Purpose Log 0x10 [NCQ Command Error log], Page 0-0 (of 1)
-0000000: 19 00 41 84 74 3d 4a 40 29 00 00 00 00 00 00 00 |..A.t=J@).......|
+0000000: 15 00 41 84 d7 03 1f 40 2d 00 00 00 00 00 00 00 |..A....@-.......|
 0000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 0000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 0000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
@@ -238,5 +244,5 @@
 00001c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 00001d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 00001e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
-00001f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b3 |................|
+00001f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b5 |................|


ada9

 ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
-  1 Raw_Read_Error_Rate     POSR--   115   099   006    -    91821743
+  1 Raw_Read_Error_Rate     POSR--   117   099   006    -    155365055
   3 Spin_Up_Time            PO----   093   092   000    -    0
   4 Start_Stop_Count        -O--CK   100   100   020    -    68
   5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    2
-  7 Seek_Error_Rate         POSR--   088   060   030    -    792342525
-  9 Power_On_Hours          -O--CK   074   074   000    -    22792
+  7 Seek_Error_Rate         POSR--   088   060   030    -    792482445
+  9 Power_On_Hours          -O--CK   074   074   000    -    22803
  10 Spin_Retry_Count        PO--C-   100   100   097    -    2
  12 Power_Cycle_Count       -O--CK   100   100   020    -    68
 184 End-to-End_Error        -O--CK   100   100   099    -    0
 187 Reported_Uncorrect      -O--CK   095   095   000    -    5
 188 Command_Timeout         -O--CK   100   100   000    -    0
 189 High_Fly_Writes         -O-RCK   001   001   000    -    961
-190 Airflow_Temperature_Cel -O---K   064   056   045    -    36 (Min/Max 33/37)
-194 Temperature_Celsius     -O---K   036   044   000    -    36 (0 25 0 0 0)
-195 Hardware_ECC_Recovered  -O-RC-   050   030   000    -    91821743
+190 Airflow_Temperature_Cel -O---K   066   056   045    -    34 (Min/Max 33/37)
+194 Temperature_Celsius     -O---K   034   044   000    -    34 (0 25 0 0 0)
+195 Hardware_ECC_Recovered  -O-RC-   050   030   000    -    155365055
 197 Current_Pending_Sector  -O--C-   100   100   000    -    0
 198 Offline_Uncorrectable   ----C-   100   100   000    -    0
 199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0


ada10

 SMART Attributes Data Structure revision number: 10
 Vendor Specific SMART Attributes with Thresholds:
 ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
-  1 Raw_Read_Error_Rate     POSR--   118   099   006    -    196445860
+  1 Raw_Read_Error_Rate     POSR--   107   099   006    -    13128068
   3 Spin_Up_Time            PO----   095   095   000    -    0
   4 Start_Stop_Count        -O--CK   100   100   020    -    216
   5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
-  7 Seek_Error_Rate         POSR--   087   060   030    -    586360650
-  9 Power_On_Hours          -O--CK   077   077   000    -    20319
+  7 Seek_Error_Rate         POSR--   087   060   030    -    586495516
+  9 Power_On_Hours          -O--CK   077   077   000    -    20330
  10 Spin_Retry_Count        PO--C-   100   100   097    -    0
  12 Power_Cycle_Count       -O--CK   100   100   020    -    113
 183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
@@ -69,15 +69,15 @@
 187 Reported_Uncorrect      -O--CK   100   100   000    -    0
 188 Command_Timeout         -O--CK   100   100   000    -    0
 189 High_Fly_Writes         -O-RCK   099   099   000    -    1
-190 Airflow_Temperature_Cel -O---K   067   062   045    -    33 (Min/Max 31/34)
-194 Temperature_Celsius     -O---K   033   040   000    -    33 (0 22 0 0 0)
-195 Hardware_ECC_Recovered  -O-RC-   040   018   000    -    196445860
+190 Airflow_Temperature_Cel -O---K   068   062   045    -    32 (Min/Max 31/34)
+194 Temperature_Celsius     -O---K   032   040   000    -    32 (0 22 0 0 0)
+195 Hardware_ECC_Recovered  -O-RC-   028   018   000    -    13128068
 197 Current_Pending_Sector  -O--C-   100   100   000    -    0
 198 Offline_Uncorrectable   ----C-   100   100   000    -    0
 199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
-240 Head_Flying_Hours       ------   100   253   000    -    205935091929084
-241 Total_LBAs_Written      ------   100   253   000    -    1286405353
-242 Total_LBAs_Read         ------   100   253   000    -    708601879
+240 Head_Flying_Hours       ------   100   253   000    -    221530118180872
+241 Total_LBAs_Written      ------   100   253   000    -    3323838357
+242 Total_LBAs_Read         ------   100   253   000    -    1778396343


ada11

 ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
-  1 Raw_Read_Error_Rate     POSR--   120   097   006    -    242285977
+  1 Raw_Read_Error_Rate     POSR--   113   097   006    -    58229866
   3 Spin_Up_Time            PO----   092   091   000    -    0
   4 Start_Stop_Count        -O--CK   100   100   020    -    69
   5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
-  7 Seek_Error_Rate         POSR--   073   060   030    -    133894632808
-  9 Power_On_Hours          -O--CK   072   072   000    -    25283
+  7 Seek_Error_Rate         POSR--   073   060   030    -    133894764364
+  9 Power_On_Hours          -O--CK   072   072   000    -    25294
  10 Spin_Retry_Count        PO--C-   100   100   097    -    3
  12 Power_Cycle_Count       -O--CK   100   100   020    -    82
 184 End-to-End_Error        -O--CK   100   100   099    -    0
 187 Reported_Uncorrect      -O--CK   100   100   000    -    0
 188 Command_Timeout         -O--CK   100   089   000    -    124555952157
 189 High_Fly_Writes         -O-RCK   080   080   000    -    20
-190 Airflow_Temperature_Cel -O---K   059   050   045    -    41 (Min/Max 38/42)
-194 Temperature_Celsius     -O---K   041   050   000    -    41 (0 22 0 0 0)
-195 Hardware_ECC_Recovered  -O-RC-   051   032   000    -    242285977
+190 Airflow_Temperature_Cel -O---K   061   050   045    -    39 (Min/Max 38/42)
+194 Temperature_Celsius     -O---K   039   050   000    -    39 (0 22 0 0 0)
+195 Hardware_ECC_Recovered  -O-RC-   050   032   000    -    58229866
 197 Current_Pending_Sector  -O--C-   100   100   000    -    0
 198 Offline_Uncorrectable   ----C-   100   100   000    -    0
 199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
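
The raw-value increments in the diffs above can also be pulled out mechanically. Below is a rough Python sketch, assuming unified-diff output of "smartctl -A" snapshots for one disk at a time. The function name and the regular expression are my own and only match the SMART attribute table rows, not the Phy event counter or hex dump lines.

```python
import re

# Matches removed/added SMART attribute rows such as
#   -199 UDMA_CRC_Error_Count    -O--CK   200   199   000    -    13
# Groups: diff sign, attribute ID, attribute name, leading integer of RAW_VALUE.
ATTR_RE = re.compile(
    r"^([-+])\s*(\d+)\s+(\S+)\s+\S+\s+\d+\s+\d+\s+\d+\s+\S+\s+(\d+)"
)


def raw_value_deltas(diff_text):
    """Map (id, name) -> change in raw value for attributes that changed.

    Feed the diff for a single disk; if the same attribute shows up for
    several disks in one text, later rows overwrite earlier ones.
    """
    before, after = {}, {}
    for line in diff_text.splitlines():
        match = ATTR_RE.match(line)
        if not match:
            continue  # context lines, hunk headers, other tables
        sign, attr_id, name, raw = match.groups()
        target = before if sign == "-" else after
        target[(int(attr_id), name)] = int(raw)
    return {key: after[key] - before[key] for key in before if key in after}


# The two ada4 rows from this message:
sample = """\
-199 UDMA_CRC_Error_Count    -O--CK   200   199   000    -    13
+199 UDMA_CRC_Error_Count    -O--CK   200   199   000    -    32
"""
for (attr_id, name), delta in raw_value_deltas(sample).items():
    print(f"{attr_id:>3} {name}: {delta:+d}")  # 199 UDMA_CRC_Error_Count: +19
```

Only the leading integer of the raw value is compared, so a trailing "(Min/Max 33/37)" on the temperature rows is ignored.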



-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/


