Date:      Sat, 22 Dec 2012 01:01:10 -0800
From:      Derek Kulinski <kulinski@cs.ucla.edu>
To:        Alex Povolotsky <tarkhil@webmail.sub.ru>
Cc:        freebsd-stable@freebsd.org, freebsd-hardware@freebsd.org
Subject:   Re: Strange problem with... ZFS? Disk? Controller?
Message-ID:  <1664598999.20121222010110@cs.ucla.edu>
In-Reply-To: <50D56D4B.4060709@webmail.sub.ru>
References:  <50D56D4B.4060709@webmail.sub.ru>

Hello Alex,

SMART values are collected by the disk itself (smartmontools only
reads them).

This would imply that the problem lies at the hardware level, between
the disk and the controller.

Since you have a huge Hardware_ECC_Recovered count and a
UDMA_CRC_Error_Count of zero, I would think that the problem is with
the disk itself rather than with the cable or controller link.
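The rule of thumb above (many ECC-recovered events, no interface CRC errors, so suspect the disk rather than the link) can be sketched in Python. This is a hypothetical helper, not part of smartmontools: it assumes `smartctl -A` prints one attribute per line in the usual column order, and the `ecc_threshold` of 1000 is an arbitrary placeholder, since raw values of vendor-specific attributes are not comparable across drive models.

```python
# Hypothetical helper: parse the attribute table printed by `smartctl -A`
# and apply the "ECC recovered vs. UDMA CRC" rule of thumb.

def parse_smart_attributes(text):
    """Map ATTRIBUTE_NAME -> RAW_VALUE string for each attribute row.

    Assumes one attribute per line, in smartctl's column order:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; the header line does not.
        if len(fields) >= 10 and fields[0].isdigit():
            # The raw value may contain spaces, e.g. "32 (Min/Max 31/32)".
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def leading_int(raw):
    """Return the leading integer of a raw value ("32 (Min/Max 31/32)" -> 32)."""
    return int(raw.split()[0])


def suspect_component(attrs, ecc_threshold=1000):
    """Guess where errors originate: "link", "disk" or "unclear".

    ecc_threshold is an arbitrary placeholder (an assumption), not a vendor limit.
    """
    crc = leading_int(attrs.get("UDMA_CRC_Error_Count", "0"))
    ecc = leading_int(attrs.get("Hardware_ECC_Recovered", "0"))
    if crc > 0:
        return "link"  # transfer CRC errors: cable, backplane or controller
    if ecc > ecc_threshold:
        return "disk"  # many errors the drive had to correct with ECC
    return "unclear"


if __name__ == "__main__":
    # Two rows from the smartctl output quoted in this thread, one per line.
    sample = (
        "195 Hardware_ECC_Recovered  0x001a   103   099   000    Old_age   Always       -       6218970\n"
        "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0\n"
    )
    print(suspect_component(parse_smart_attributes(sample)))  # disk
```

A zero CRC count only says the link looks clean; it does not prove the disk is at fault, so treat the result as a hint for where to look next.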

I think the long waits are caused by the disk trying to re-read a
given sector multiple times.

Your drive is 2 TB, and according to this thread, the bigger the
drive, the more likely you are to run into problems like these:
http://forums.storagereview.com/index.php/topic/27994-smart-hardware-ecc-recovered-values/

I don't know how serious it is, but if you keep anything important
there, I would recommend making a backup.

You should also run the SMART self-tests.
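As a sketch of running those self-tests, the Python below only builds the smartctl command lines: `smartctl -t <type> DEVICE` starts a self-test that the drive runs in the background, and `smartctl -l selftest DEVICE` prints the self-test log once it has finished. The device path `/dev/da0` is a hypothetical placeholder (use whatever path `smartctl -A` worked with on this machine), and `run()` is a hypothetical convenience wrapper that needs smartctl installed and root privileges.

```python
import subprocess

# Self-test types accepted by `smartctl -t`; "long" scans the whole surface
# and can take hours on a 2 TB drive.
VALID_TESTS = ("short", "long", "conveyance")


def selftest_commands(device, test="long"):
    """Return (start_cmd, log_cmd) argument lists for smartctl."""
    if test not in VALID_TESTS:
        raise ValueError(f"unsupported self-test type: {test!r}")
    return (["smartctl", "-t", test, device],
            ["smartctl", "-l", "selftest", device])


def run(cmd):
    """Run a command and return its stdout (requires smartctl and root)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


if __name__ == "__main__":
    # "/dev/da0" is a hypothetical device node, not taken from the thread.
    start, log = selftest_commands("/dev/da0")
    print(" ".join(start))  # smartctl -t long /dev/da0
    print(" ".join(log))    # smartctl -l selftest /dev/da0
    # To execute instead of printing: run(start), then later run(log).
```

Printing rather than executing by default keeps the sketch harmless on a machine without smartctl; the self-test log should be read only after the completion time that `smartctl -t` reports.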

Best regards,
Derek

Saturday, December 22, 2012, 12:20:27 AM, you wrote:

> Hello,

> I'm running FreeBSD 9.0/amd64 with a pure ZFS setup: one Seagate
> ST2000NM0011 SN02 disk on an LSI Logic (mpt) controller.

> Yes, I know that running a single disk on a RAID controller is a bit
> weird; I have yet to find out whether the disk can be connected to the
> internal SATA controller.

> About two days ago, the system became SLOW. Disk usage is constantly
> at 100%, and sometimes I get a "swap_pager: indefinite wait buffer"
> error. I have had to reset the computer twice in two days.

> mptutil does not show any errors, and smartctl shows:

> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   067   063   044    Pre-fail  Always       -       6218970
>   3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       14
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       21
>   7 Seek_Error_Rate         0x000f   091   060   030    Pre-fail  Always       -       1433294073
>   9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       8825
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       16
> 184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
> 188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       12885098499
> 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
> 190 Airflow_Temperature_Cel 0x0022   068   047   045    Old_age   Always       -       32 (Min/Max 31/32)
> 191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       859
> 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       15
> 193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       26
> 194 Temperature_Celsius     0x0022   032   053   000    Old_age   Always       -       32 (0 21 0 0 0)
> 195 Hardware_ECC_Recovered  0x001a   103   099   000    Old_age   Always       -       6218970
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

> SMART Error Log Version: 1
> No Errors Logged

> I have removed most of the snapshots; it did not help.

> I have stopped all active processes; the disk load did not decrease
> and stays at 100%.

> What can I check and/or replace to get the problem fixed? Any ideas?

> Alex


