From owner-freebsd-scsi@freebsd.org Thu Jun 23 14:55:13 2016
Return-Path:
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
    by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6A49CB72577
    for ; Thu, 23 Jun 2016 14:55:13 +0000 (UTC)
    (envelope-from dan@langille.org)
Received: from clavin1.langille.org (clavin.langille.org [162.208.116.86])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (Client CN "clavin.langille.org", Issuer "StartCom Class 2 Primary Intermediate Server CA" (verified OK))
    by mx1.freebsd.org (Postfix) with ESMTPS id 1633D1009
    for ; Thu, 23 Jun 2016 14:55:12 +0000 (UTC)
    (envelope-from dan@langille.org)
Received: from (clavin1.int.langille.org (clavin1.int.unixathome.org [10.4.7.7])
    (Authenticated sender: hidden)
    with ESMTPSA id 79DED3B04
    for ; Thu, 23 Jun 2016 14:54:58 +0000 (UTC)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Subject: Re: terminated ioc 804b scsi 0 state c xfer 0
From: Dan Langille
In-Reply-To:
Date: Thu, 23 Jun 2016 10:54:57 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id:
References: <2E8752E5-76AF-4042-86D9-8C6733658A80@langille.org> <5EEF0794-B06E-4A72-89DA-7DCD94AE1FC6@langille.org> <072CEC8B-9392-4378-8DF5-63D05901850B@langille.org> <0d7401d19f10$ee329300$ca97b900$@broadcom.com> <068601d1bb57$f675f710$e361e530$@broadcom.com>
To: freebsd-scsi@freebsd.org
X-Mailer: Apple Mail (2.3124)
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 23 Jun 2016 14:55:13 -0000

> On May 31, 2016, at 12:20 PM, Dan Langille wrote:
>
>> On May 31, 2016, at 12:17 PM, Stephen McConnell wrote:
>>
>>
>>
>>> -----Original Message-----
>>> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of Dan Langille
>>> Sent: Monday, May 30, 2016 12:28 PM
>>> To: freebsd-scsi@freebsd.org
>>> Subject: Re: terminated ioc 804b scsi 0 state c xfer 0
>>>
>>>> On Apr 25, 2016, at 12:38 PM, Stephen McConnell wrote:
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of Dan Langille
>>>>> Sent: Monday, April 25, 2016 9:40 AM
>>>>> To: freebsd-scsi@freebsd.org
>>>>> Subject: Re: terminated ioc 804b scsi 0 state c xfer 0
>>>>>
>>>>>> On Apr 25, 2016, at 8:17 AM, Dan Langille wrote:
>>>>>>
>>>>>>>
>>>>>>> On Apr 24, 2016, at 9:35 AM, Dan Langille wrote:
>>>>>>>
>>>>>>> More of the pasted output is also at https://gist.github.com/dlangille/1fa3135334089c6603e2ec5da946d9ae and added smartctl output.
>>>>>>>
>>>>>>> I have a FreeBSD 10.2-RELEASE-p14 box in which there is an LSI SAS2008 card. It's running a zfs root system.
>>>>>>>
>>>>>>> This morning the system was unresponsive via ssh. Attempts to log in at the console did not yield a password prompt.
>>>>>>>
>>>>>>> A power cycle brought the system online. Inspecting /var/log/messages, I found about 63,000 entries similar to those which appear below.
>>>>>>>
>>>>>>> zpool status of all are OK. A scrub is in progress for one pool (since before this issue arose). da7 is in that pool.
>>>>>>>
>>>>>>>
>>>>>>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8d 90 c6 18 00 00 10 00 length 8192 SMID 774 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 length 16384 SMID 614 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 50 00 00 20 00 length 16384 SMID 792 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 08 00 00 20 00 length 16384 SMID 974 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b 6f ef 50 00 00 08 00 length 4096 SMID 674 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 8b 0f a2 48 00 00 18 00 length 12288 SMID 177 terminated ioc 804b scsi 0 state c xfer 12288
>>>>>>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 ab 8f a1 38 00 00 08 00 length 4096 SMID 908 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> Apr 24 11:25:56 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 length 16384 SMID 376 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> Apr 24 11:25:56 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 50 00 00 20 00 length 16384 SMID 172 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>>
>>>>>>> Is this a cabling issue? The drive is a SATA device (smartctl output in the URL above). Anyone familiar with these errors?
>>>>>>
>>>>>> This morning:
>>>>>>
>>>>>> 13410079654596185797 REMOVED 0 0 0 was /dev/da7p3
>>>>>>
>>>>>> At least I know i'm looking for Serial Number: 13Q8PNBYS
>>>>>>
>>>>>> From the logs:
>>>>>>
>>>>>> Apr 25 05:34:50 knew kernel: da7 at mps1 bus 0 scbus1 target 17 lun 0
>>>>>> Apr 25 05:34:50 knew kernel: da7: s/n 13Q8PNBYS detached
>>>
>>> Just for the record, this happened again this morning. Fixed by power cycle.
>>>
>>> May 30 03:22:08 knew kernel: mps1: mpssas_prepare_remove: Sending reset for target ID 17
>>> May 30 03:22:10 knew kernel: da7 at mps1 bus 0 scbus1 target 17 lun 0
>>> May 30 03:22:10 knew kernel: da7: s/n 13Q8PNBYS detached
>>> May 30 03:22:10 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8c 5c 91 c0 00 00 08 00 length 4096 SMID 179 terminated ioc 804b scsi 0 state c xfer 0
>>> May 30 03:22:10 knew kernel: (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 6b bf db a0 00 00 f0 00 length 122880 SMID 938 terminated ioc 804b scsi 0 state c xf(da7:mps1:0:17:0): READ(10). CDB: 28 00 8c 5c 91 c0 00 00 08 00
>>> May 30 03:22:10 knew kernel: er 122880
>>>
>> I just realized that you're using mps, not mpr. The fix went into the mpr driver, but not mps yet. It'll have to be ported over to mps.
>
> This hit me again last night. Same drive again. Power cycle cleared it.
>
> Now I'm wondering if it's heat or dud drive related.

It might be the heat. It recurred three times today. I replaced the SATA cable after the third incident.
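
For anyone reading the "terminated ioc 804b" entries in this thread, here is a minimal sketch (not part of the mps driver or any FreeBSD tool) that pulls the fields out of one such log line and decodes the 10-byte READ(10)/WRITE(10) CDB into a starting LBA and a block count. The regular expression is an assumption based only on the lines quoted in this thread, and the names `ENTRY_RE`, `decode_rw10`, and `parse_entry` are hypothetical. The CDB layout used (opcode in byte 0, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8) is the standard 10-byte read/write format.

```python
import re

# Assumed shape of one entry, inferred only from the lines quoted in this thread.
ENTRY_RE = re.compile(
    r"\((?P<dev>\w+):(?P<ctrl>\w+):\d+:(?P<target>\d+):(?P<lun>\d+)\): "
    r"(?P<op>READ|WRITE)\(10\)\. CDB: (?P<cdb>(?:[0-9a-f]{2} ){9}[0-9a-f]{2}) "
    r"length (?P<length>\d+) SMID (?P<smid>\d+) "
    r"terminated ioc (?P<ioc>[0-9a-f]+) scsi (?P<scsi>[0-9a-f]+) "
    r"state (?P<state>[0-9a-f]+) xfer (?P<xfer>\d+)"
)

# Opcodes for the two 10-byte commands that appear in the log.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_rw10(cdb_hex):
    """Decode a space-separated 10-byte READ(10)/WRITE(10) CDB."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] not in OPCODES:
        raise ValueError("not a READ(10)/WRITE(10) CDB: %r" % cdb_hex)
    return {
        "opcode": OPCODES[cdb[0]],
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2-5: starting LBA
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8: transfer length
    }


def parse_entry(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = ENTRY_RE.search(line)
    if m is None:
        return None
    entry = m.groupdict()
    for key in ("target", "lun", "length", "smid", "xfer"):
        entry[key] = int(entry[key])
    entry.update(decode_rw10(entry["cdb"]))
    # Bytes per block implied by the logged byte length and the CDB block count.
    blocks = entry["blocks"]
    entry["bytes_per_block"] = entry["length"] // blocks if blocks else None
    return entry


if __name__ == "__main__":
    sample = (
        "Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). "
        "CDB: 28 00 8d 90 c6 18 00 00 10 00 length 8192 SMID 774 "
        "terminated ioc 804b scsi 0 state c xfer 0"
    )
    print(parse_entry(sample))
```

For the first entry in the April 24 log, the transfer length field is 0x0010, i.e. 16 blocks, and 8192 / 16 gives 512 bytes per block. The interleaved May 30 line ("xf(da7:...") deliberately fails to match, so `parse_entry` returns None for it.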