Date:      Sun, 6 May 2012 01:50:49 +0200
From:      Leon Meßner <l.messner@physik.tu-berlin.de>
To:        freebsd-stable@freebsd.org
Subject:   Probable drive failure not recognized by ZFS on mps(4)
Message-ID:  <20120505235049.GH20333@emmi.physik-pool.tu-berlin.de>

Hi,

Running 9-STABLE from two weeks ago, I'm having a problem where ZFS does
not recognize a failing SATA disk on an LSI SAS2x36 expander. The gnop(8)
device in the zpool status output is there for testing purposes; ZFS
fails those just fine. What can I do to check whether the SCSI sense
code actually makes sense for this drive?

Thanks,
Leon

uname:
FreeBSD fred.physik-pool.tu-berlin.de 9.0-STABLE FreeBSD 9.0-STABLE #0: Wed Apr 18 20:05:08 CEST 2012 
master@fred.physik-pool.tu-berlin.de:/usr/obj/usr/src/sys/GENERIC  amd64

/var/log/messages (many lines like these and similar):
May  6 01:32:53 fred kernel: (da17:mps0:0:26:0): READ(6). CDB: 8 e ab a3 1 0 length 512 SMID 809 terminated ioc 804b scsi 0 state 0 xfer 0
May  6 01:32:53 fred kernel: (da17:mps0:0:26:0): READ(6). CDB: 8 e ab a4 1 0 length 512 SMID 633 terminated ioc 804b scsi 0 state 0 xfer 0
May  6 01:32:53 fred kernel: (da17:mps0:0:26:0): READ(6). CDB: 8 e af 31 1 0 length 512 SMID 253 terminated ioc 804b scsi 0 state 0 xfer 0
May  6 01:32:53 fred kernel: (da17:mps0:0:26:0): READ(10). CDB: 28 0 5 79 c2 a6 0 0 1 0
May  6 01:32:53 fred kernel: (da17:mps0:0:26:0): CAM status: SCSI Status Error
May  6 01:32:53 fred kernel: (da17:mps0:0:26:0): SCSI status: Check Condition
May  6 01:32:53 fred kernel: (da17:mps0:0:26:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
May  6 01:32:53 fred kernel: (da17:mps0:0:26:0): Info: 0x579c2a6
May  6 01:32:58 fred kernel: (da17:mps0:0:26:0): READ(6). CDB: 8 e ab ee 1 0 length 512 SMID 344 terminated ioc 804b scsi 0 state 0 xfer 0
May  6 01:32:58 fred kernel: (da17:mps0:0:26:0): READ(10). CDB: 28 0 3a 38 3c 10 0 0 10 0 length 8192 SMID 304 terminated ioc 804b scsi 0 state 0 xfer 0
May  6 01:32:58 fred kernel: (da17:mps0:0:26:0): READ(10). CDB: 28 0 3a 38 3a 10 0 0 10 0 length 8192 SMID 712 terminated ioc 804b scsi 0 state 0 xfer 0
May  6 01:32:58 fred kernel: (da17:mps0:0:26:0): READ(10). CDB: 28 0 5 79 c2 56 0 0 46 0
May  6 01:32:58 fred kernel: (da17:mps0:0:26:0): CAM status: SCSI Status Error
May  6 01:32:58 fred kernel: (da17:mps0:0:26:0): SCSI status: Check Condition
May  6 01:32:58 fred kernel: (da17:mps0:0:26:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
May  6 01:32:58 fred kernel: (da17:mps0:0:26:0): Info: 0x579c298
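One way to sanity-check the sense data is to decode the CDBs and confirm that the "Info" value (for a MEDIUM ERROR on a disk, the LBA of the failing block) falls inside the LBA range of the command that got the Check Condition. The sketch below is a minimal, hypothetical helper (the names `decode_cdb` and `check_log` are made up for illustration), assuming 512-byte blocks and assuming, from the layout of this particular log, that the failed command is the READ printed without a "length ... terminated" tail. Decoding by hand gives the same answer: READ(10) `28 0 5 79 c2 a6 0 0 1 0` starts at LBA 0x579c2a6 for 1 block, matching Info 0x579c2a6, and `28 0 5 79 c2 56 0 0 46 0` covers 0x46 blocks from 0x579c256, which contains Info 0x579c298.

```python
import re

# Hypothetical helpers (not part of any FreeBSD tool): decode the CDBs that
# CAM prints and check that the sense "Info" LBA lies inside the request.

# "(da17:mps0:0:26:0): READ(10). CDB: 28 0 5 79 c2 a6 0 0 1 0 [length ...]"
CDB_RE = re.compile(
    r"\((\w+):[^)]*\): READ\((?:6|10)\)\. CDB: ([0-9a-f ]+?)\s*(length .*)?$")
# "(da17:mps0:0:26:0): Info: 0x579c2a6"
INFO_RE = re.compile(r"\((\w+):[^)]*\): Info: 0x([0-9a-f]+)\s*$")


def decode_cdb(cdb_hex):
    """Return (command, starting LBA, transfer length in blocks)."""
    b = [int(x, 16) for x in cdb_hex.split()]
    if b[0] == 0x08:  # READ(6): 21-bit LBA in bytes 1-3, length in byte 4
        lba = ((b[1] & 0x1F) << 16) | (b[2] << 8) | b[3]
        return "READ(6)", lba, b[4] or 256  # a length of 0 means 256 blocks
    if b[0] == 0x28:  # READ(10): 32-bit LBA in bytes 2-5, length in bytes 7-8
        lba = int.from_bytes(bytes(b[2:6]), "big")
        return "READ(10)", lba, int.from_bytes(bytes(b[7:9]), "big")
    raise ValueError(f"unsupported opcode 0x{b[0]:02x}")


def check_log(lines):
    """Pair each failed READ with the next Info line for the same device."""
    pending = {}  # device -> CDB of the last READ printed without "length ..."
    results = []
    for line in lines:
        line = line.strip()
        m = CDB_RE.search(line)
        if m:
            dev, cdb, tail = m.groups()
            # Assumption: in this log, terminated commands carry a
            # "length ... SMID ... terminated" tail; the failed one does not.
            if tail is None:
                pending[dev] = cdb
            continue
        m = INFO_RE.search(line)
        if m and m.group(1) in pending:
            dev = m.group(1)
            info = int(m.group(2), 16)
            name, lba, count = decode_cdb(pending.pop(dev))
            # Last element: True if the reported LBA lies within the request.
            results.append((dev, name, lba, count, info, lba <= info < lba + count))
    return results
```

Usage: `check_log(open("/var/log/messages"))` returns one tuple per Check Condition; a `False` in the last position would mean the reported LBA lies outside the failed request and the sense data deserves a closer look.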

smartctl -a /dev/da17 (excerpt):
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   163   163   051    Pre-fail  Always       -       929442
  3 Spin_Up_Time            0x0027   238   238   021    Pre-fail  Always       -       1083
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       40
  5 Reallocated_Sector_Ct   0x0033   174   174   140    Pre-fail  Always       -       207
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4077
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       38
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       33
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       40
194 Temperature_Celsius     0x0022   118   104   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       207
197 Current_Pending_Sector  0x0032   184   183   000    Old_age   Always       -       1342
198 Offline_Uncorrectable   0x0030   186   183   000    Old_age   Offline      -       1168
199 UDMA_CRC_Error_Count    0x0032   200   199   000    Old_age   Always       -       9
200 Multi_Zone_Error_Rate   0x0008   001   001   000    Old_age   Offline      -       397969
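In this excerpt no normalized VALUE has dropped to its THRESH (for example Reallocated_Sector_Ct is at 174 against a threshold of 140), so WHEN_FAILED shows "-" everywhere, even though the raw counts (207 reallocated, 1342 pending, 1168 offline-uncorrectable sectors) point to a dying drive. The sketch below, a hypothetical helper rather than part of smartmontools, parses the table and reports both conditions separately. The choice of attributes 5, 197 and 198 as the raw counters to watch is an assumption based on a common heuristic, not the vendor's failure criterion.

```python
import re

# Hypothetical helper: parse the attribute table printed by `smartctl -a`
# and flag attributes that commonly indicate media problems.

# Assumption: these raw counters are worth watching (a common heuristic):
# 5 Reallocated_Sector_Ct, 197 Current_Pending_Sector, 198 Offline_Uncorrectable
WATCH_IDS = {5, 197, 198}


def _leading_int(token):
    """RAW_VALUE may carry extra text on some drives; keep the leading number."""
    m = re.match(r"\d+", token)
    return int(m.group()) if m else 0


def parse_attributes(text):
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": _leading_int(fields[9]),
        }
    return attrs


def assess(attrs):
    findings = []
    for attr_id, a in sorted(attrs.items()):
        if a["thresh"] > 0 and a["value"] <= a["thresh"]:
            # Normalized value at or below threshold: SMART considers it failed.
            findings.append(f"{attr_id} {a['name']}: value {a['value']} <= threshold {a['thresh']}")
        elif attr_id in WATCH_IDS and a["raw"] > 0:
            # Not failed by SMART's own rule, but the raw count is non-zero.
            findings.append(f"{attr_id} {a['name']}: raw count {a['raw']}")
    return findings
```

Fed the excerpt above, this reports the three raw counters and no threshold crossing, which matches a drive that SMART still calls healthy while it is returning unrecovered read errors.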

zpool status:
# zpool status
  pool: POOL
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat May  5 23:55:44 2012
        606G scanned out of 3.22T at 104M/s, 7h23m to go
        2.26G resilvered, 18.38% done
config:

        NAME                       STATE     READ WRITE CKSUM
        POOL                       DEGRADED     0     0     0
          raidz2-0                 ONLINE       0     0     0
            gpt/port0-2035c2485    ONLINE       0     0     0
            gpt/port2-0565e5416    ONLINE       0     0     0
            gpt/port4-200162460    ONLINE       0     0     0
            gpt/port6-2556b79f8    ONLINE       0     0     0
            gpt/port8-2aac22cb4    ONLINE       0     0     0
            gpt/port10-2aac226d2   ONLINE       0     0     0
            gpt/port12-0ad6e26d8   ONLINE       0     0     0
            gpt/port14-2b0024fed   ONLINE       0     0    10  (resilvering)
            gpt/port16-2afc39a37   ONLINE       0     0     0
            gpt/port18-2556b7770   ONLINE       0     0     0
          raidz2-1                 DEGRADED     0     0     0
            gpt/port1-2acfb0988    ONLINE       0     0     0
            gpt/port3-202b5e684    ONLINE       0     0     0
            gpt/port5-2025090a1    ONLINE       0     0     0
            gpt/port7-2557e4c7a    ONLINE       0     0     0
            gpt/port9-2adcaf4a5    ONLINE       0     0     0
            gpt/port11-2acfb6ab3   ONLINE       0     0     0
            gpt/port13-2afc67e75   ONLINE       0     0     0
            gpt/port15-25aaca07f   ONLINE       0     0     0
            gpt/port17-2ad60c96d   ONLINE       0     0    40  (resilvering)
            replacing-9            OFFLINE      0     0     0
              2488369476163776260  OFFLINE      0     0     0  was /dev/da19p1
              da19p1.nop           ONLINE       0     0     0  (resilvering)

errors: No known data errors
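Note that the READ column is 0 for every vdev, although the kernel logged unrecovered read errors on da17; only the CKSUM counters of two resilvering members are non-zero (the post does not show which gpt/ label corresponds to da17). To compare the counters against the kernel log quickly, a minimal sketch like the following (a hypothetical helper, not a ZFS tool) lists config rows with any non-zero READ/WRITE/CKSUM count. It assumes zpool may abbreviate large counts with a suffix such as "K", so the counter pattern accepts one.

```python
import re

# Hypothetical helper: list vdev rows in `zpool status` output whose
# READ, WRITE or CKSUM counter is non-zero.

# Assumption: counters are plain integers, possibly abbreviated like "1.2K".
COUNTER = re.compile(r"^\d+(\.\d+)?[KMGTP]?$")


def vdevs_with_errors(status_text):
    rows = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:  # header of the config table
            in_config = True
            continue
        if not in_config:
            continue
        if line.lstrip().startswith("errors:"):  # end of the config table
            break
        if len(fields) >= 5 and all(COUNTER.match(f) for f in fields[2:5]):
            if any(f != "0" for f in fields[2:5]):
                rows.append(tuple(fields[:5]))  # (name, state, read, write, cksum)
    return rows
```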


