Date:      Tue, 14 Feb 2012 14:54:35 +0100
From:      Victor Balada Diaz <victor@bsdes.net>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        stable@FreeBSD.org
Subject:   Re: problems with AHCI on FreeBSD 8.2
Message-ID:  <20120214135435.GQ2010@equilibrium.bsdes.net>
In-Reply-To: <20120214100513.GA94501@icarus.home.lan>
References:  <20120214091909.GP2010@equilibrium.bsdes.net> <20120214100513.GA94501@icarus.home.lan>

On Tue, Feb 14, 2012 at 02:05:13AM -0800, Jeremy Chadwick wrote:
> On Tue, Feb 14, 2012 at 10:19:09AM +0100, Victor Balada Diaz wrote:
> > We're having some trouble with AHCI under FreeBSD 8.2 and 8-STABLE. The error is:
> > 
> > ahcich0: Timeout on slot 8
> > ahcich0: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd c0 serr 00000000
> > ahcich0: AHCI reset...
> > ahcich0: SATA connect time=0ms status=00000123
> > ahcich0: ready wait time=18ms
> > ahcich0: AHCI reset done: device found
> > (ada0:ahcich0:0:0:0): Request requeued
> > (ada0:ahcich0:0:0:0): Retrying command
> > (ada0:ahcich0:0:0:0): Command timed out
> > (ada0:ahcich0:0:0:0): Retrying command
> > ahcich0: Timeout on slot 8
> > ahcich0: is 00000000 cs 007ff000 ss 007fff00 rs 007fff00 tfd c0 serr 00000000
> > ahcich0: AHCI reset...
> > ahcich0: SATA connect time=0ms status=00000123
> > ahcich0: ready wait time=84ms
> > ahcich0: AHCI reset done: device found
> > (ada0:ahcich0:0:0:0): Request requeued
> > (ada0:ahcich0:0:0:0): Retrying command
> > (ada0:ahcich0:0:0:0): Command timed out
> > (ada0:ahcich0:0:0:0): Retrying command
> > (ada0:ahcich0:0:0:0): Request requeued
> > [...]
> > 
> > If we use the old ATA driver, we have no problems. If we use only the first disk (ada0) with ahci,
> > there are no problems either. If we use both disks (ada0 and ada1) in a gmirror setup with ahci, we
> > get the above error. If we use both disks in gmirror with the old ata driver, there are no problems.
> 
> Please provide SMART statistics for both disks by installing
> ports/sysutils/smartmontools (5.42 or newer please) and running
> "smartctl -a" against both disks (ada0/ada1, or ad4/ad10 -- doesn't
> matter which driver you're using).  I will review the output.

I forgot to mention that from time to time, after the system hangs and I need
to reboot, one of the disks is lost. It doesn't show up even after a few reboots,
nor on a Linux live system.

You can see smartctl output here:

ada0:

# smartctl -a /dev/ada0
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F2 EG
Device Model:     SAMSUNG HD154UI
Serial Number:    S24EJ9BB200080
LU WWN Device Id: 5 0024e9 2047cb78f
Firmware Version: 1AG01118
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Tue Feb 14 13:51:18 2012 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (18863) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (  33) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   072   072   011    Pre-fail  Always       -       9330
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       22
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   100   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       13677
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       4688
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   067   000    Old_age   Always       -       31 (Min/Max 31/31)
194 Temperature_Celsius     0x0022   068   067   000    Old_age   Always       -       32 (Min/Max 31/32)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       113154
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       28
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4430         -
# 2  Extended offline    Completed without error       10%      4410         -
# 3  Extended offline    Completed without error       00%        27         -
# 4  Short offline       Completed without error       00%        14         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ada1:

# smartctl -a /dev/ada1
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F2 EG
Device Model:     SAMSUNG HD154UI
Serial Number:    S24EJ9BB200082
LU WWN Device Id: 5 0024e9 2047cb7a5
Firmware Version: 1AG01118
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Tue Feb 14 13:52:09 2012 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (19064) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (  33) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   071   071   011    Pre-fail  Always       -       9360
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   100   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       12804
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       4583
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       21
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   069   000    Old_age   Always       -       29 (Min/Max 29/29)
194 Temperature_Celsius     0x0022   070   068   000    Old_age   Always       -       30 (Min/Max 29/30)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       1870564
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       2
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4430         -
# 2  Extended offline    Completed without error       10%      4409         -
# 3  Extended offline    Completed without error       00%        28         -
# 4  Short offline       Completed without error       00%        14         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
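To compare the two reports above attribute by attribute, here is a small sketch. It is not part of smartmontools, and the helper name `smart_raw_values` and the file names are hypothetical. It assumes the attribute-table layout printed by smartctl 5.42 as shown above, so other versions or output modes may need the pattern adjusted. It pulls RAW_VALUE for each attribute out of saved `smartctl -a` text and prints the values for each disk side by side.

```python
import re
import sys

# Matches attribute rows in the layout shown by smartctl 5.42 above:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# (assumption: other smartctl versions/output modes may differ)
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(.+?)\s*$"
)


def smart_raw_values(output):
    """Hypothetical helper: map ATTRIBUTE_NAME -> RAW_VALUE text."""
    values = {}
    for line in output.splitlines():
        m = ATTR_ROW.match(line)
        if m:
            # RAW_VALUE may carry extra text, e.g. "31 (Min/Max 31/31)"
            values[m.group(2)] = m.group(3)
    return values


if __name__ == "__main__":
    # Usage: python smart_compare.py ada0.txt ada1.txt
    reports = {}
    for path in sys.argv[1:]:
        with open(path) as f:
            reports[path] = smart_raw_values(f.read())
    names = sorted({name for r in reports.values() for name in r})
    for name in names:
        # "-" marks an attribute missing from one of the reports
        print(name, *(r.get(name, "-") for r in reports.values()))
```

The input files could be produced with `smartctl -a /dev/ada0 > ada0.txt` and `smartctl -a /dev/ada1 > ada1.txt`.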



-- 
The most convincing proof that intelligent life exists on other
planets is that they haven't tried to contact us.


