From owner-freebsd-stable@FreeBSD.ORG Sat May 5 23:50:58 2012
Date: Sun, 6 May 2012 01:50:49 +0200
From: Leon Meßner
To: freebsd-stable@freebsd.org
Message-ID: <20120505235049.GH20333@emmi.physik-pool.tu-berlin.de>
Mail-Followup-To: freebsd-stable@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Subject: Probable drive failure not recognized by ZFS on mps(4)
List-Id: Production branch of FreeBSD source code

Hi,

Running 9-STABLE from two weeks ago, I'm having a problem where ZFS is not recognizing a failing
SATA disk on an LSI SAS2x36 expander. The gnop(8) device in the zpool status output is there for testing purposes only; ZFS does fail those correctly. What could I do to check whether the SCSI sense code actually makes sense for this drive?

Thanks,
Leon

uname:

FreeBSD fred.physik-pool.tu-berlin.de 9.0-STABLE FreeBSD 9.0-STABLE #0: Wed Apr 18 20:05:08 CEST 2012 master@fred.physik-pool.tu-berlin.de:/usr/obj/usr/src/sys/GENERIC amd64

/var/log/messages (a lot of this and similar):

May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): READ(6). CDB: 8 e ab a3 1 0 length 512 SMID 809 terminated ioc 804b scsi 0 state 0 xfer 0
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): READ(6). CDB: 8 e ab a4 1 0 length 512 SMID 633 terminated ioc 804b scsi 0 state 0 xfer 0
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): READ(6). CDB: 8 e af 31 1 0 length 512 SMID 253 terminated ioc 804b scsi 0 state 0 xfer 0
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): READ(10). CDB: 28 0 5 79 c2 a6 0 0 1 0
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): CAM status: SCSI Status Error
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): SCSI status: Check Condition
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): Info: 0x579c2a6
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): READ(6). CDB: 8 e ab ee 1 0 length 512 SMID 344 terminated ioc 804b scsi 0 state 0 xfer 0
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): READ(10). CDB: 28 0 3a 38 3c 10 0 0 10 0 length 8192 SMID 304 terminated ioc 804b scsi 0 state 0 xfer 0
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): READ(10). CDB: 28 0 3a 38 3a 10 0 0 10 0 length 8192 SMID 712 terminated ioc 804b scsi 0 state 0 xfer 0
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): READ(10). CDB: 28 0 5 79 c2 56 0 0 46 0
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): CAM status: SCSI Status Error
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): SCSI status: Check Condition
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): Info: 0x579c298

smartctl -a /dev/da17 (excerpt):

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   163   163   051    Pre-fail  Always       -       929442
  3 Spin_Up_Time            0x0027   238   238   021    Pre-fail  Always       -       1083
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       40
  5 Reallocated_Sector_Ct   0x0033   174   174   140    Pre-fail  Always       -       207
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4077
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       38
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       33
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       40
194 Temperature_Celsius     0x0022   118   104   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       207
197 Current_Pending_Sector  0x0032   184   183   000    Old_age   Always       -       1342
198 Offline_Uncorrectable   0x0030   186   183   000    Old_age   Offline      -       1168
199 UDMA_CRC_Error_Count    0x0032   200   199   000    Old_age   Always       -       9
200 Multi_Zone_Error_Rate   0x0008   001   001   000    Old_age   Offline      -       397969

zpool status:

# zpool status
  pool: POOL
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat May 5 23:55:44 2012
        606G scanned out of 3.22T at 104M/s, 7h23m to go
        2.26G resilvered, 18.38% done
config:

        NAME                         STATE     READ WRITE CKSUM
        POOL                         DEGRADED     0     0     0
          raidz2-0                   ONLINE       0     0     0
            gpt/port0-2035c2485      ONLINE       0     0     0
            gpt/port2-0565e5416      ONLINE       0     0     0
            gpt/port4-200162460      ONLINE       0     0     0
            gpt/port6-2556b79f8      ONLINE       0     0     0
            gpt/port8-2aac22cb4      ONLINE       0     0     0
            gpt/port10-2aac226d2     ONLINE       0     0     0
            gpt/port12-0ad6e26d8     ONLINE       0     0     0
            gpt/port14-2b0024fed     ONLINE       0     0    10  (resilvering)
            gpt/port16-2afc39a37     ONLINE       0     0     0
            gpt/port18-2556b7770     ONLINE       0     0     0
          raidz2-1                   DEGRADED     0     0     0
            gpt/port1-2acfb0988      ONLINE       0     0     0
            gpt/port3-202b5e684      ONLINE       0     0     0
            gpt/port5-2025090a1      ONLINE       0     0     0
            gpt/port7-2557e4c7a      ONLINE       0     0     0
            gpt/port9-2adcaf4a5      ONLINE       0     0     0
            gpt/port11-2acfb6ab3     ONLINE       0     0     0
            gpt/port13-2afc67e75     ONLINE       0     0     0
            gpt/port15-25aaca07f     ONLINE       0     0     0
            gpt/port17-2ad60c96d     ONLINE       0     0    40  (resilvering)
            replacing-9              OFFLINE      0     0     0
              2488369476163776260    OFFLINE      0     0     0  was /dev/da19p1
              da19p1.nop             ONLINE       0     0     0  (resilvering)

errors: No known data errors
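
One possible way to sanity-check the sense data is to decode the failing READ(10) CDBs from the kernel messages and see whether the reported Info value (the LBA of the unreadable block) lies inside the range each command asked for. The following is only a minimal Python sketch, assuming the CDB bytes are printed as space-separated hex as in the log; the helper names decode_read10 and info_in_range are made up for illustration and are not part of any FreeBSD tool.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Layout: byte 0 = opcode 0x28, bytes 2-5 = LBA (big-endian),
    bytes 7-8 = transfer length in blocks (big-endian).
    """
    cdb = [int(b, 16) for b in cdb_hex.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")
    return lba, blocks


def info_in_range(cdb_hex, info):
    """Return True if the sense Info LBA lies inside the requested range."""
    lba, blocks = decode_read10(cdb_hex)
    return lba <= info < lba + blocks


# The two failing READ(10) commands and their Info values from the log.
failures = [
    ("28 0 5 79 c2 a6 0 0 1 0", 0x579C2A6),
    ("28 0 5 79 c2 56 0 0 46 0", 0x579C298),
]

for cdb_hex, info in failures:
    lba, blocks = decode_read10(cdb_hex)
    print(f"LBA {lba:#x}, {blocks} blocks, Info {info:#x}, "
          f"inside request: {info_in_range(cdb_hex, info)}")
```

For both commands the Info value falls within the requested block range, which would be consistent with the drive reporting a genuinely unreadable sector there rather than a bogus sense code. That is only a plausibility check, not proof; the non-zero Current_Pending_Sector and Offline_Uncorrectable counts in the smartctl excerpt point the same way.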