From owner-freebsd-scsi@freebsd.org Tue May 10 10:12:52 2016 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 77C1FB34DB1 for ; Tue, 10 May 2016 10:12:52 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6888D19B3 for ; Tue, 10 May 2016 10:12:52 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u4AACq6B023459 for ; Tue, 10 May 2016 10:12:52 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-scsi@FreeBSD.org Subject: [Bug 208798] [iscsi] MaxRecvDataSegmentLength and sockbufsize ignored Date: Tue, 10 May 2016 10:12:52 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.3-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: trasz@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: trasz@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 May 2016 10:12:52 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D208798 Edward Tomasz Napierala changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |trasz@FreeBSD.org Assignee|freebsd-scsi@FreeBSD.org |trasz@FreeBSD.org --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-scsi@freebsd.org Tue May 10 11:10:08 2016 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4D8A1B3505A for ; Tue, 10 May 2016 11:10:08 +0000 (UTC) (envelope-from josh@tcbug.org) Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com [66.111.4.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 263551A95 for ; Tue, 10 May 2016 11:10:07 +0000 (UTC) (envelope-from josh@tcbug.org) Received: from compute2.internal (compute2.nyi.internal [10.202.2.42]) by mailout.nyi.internal (Postfix) with ESMTP id 3EBFE213F2 for ; Tue, 10 May 2016 07:10:01 -0400 (EDT) Received: from frontend1 ([10.202.2.160]) by compute2.internal (MEProxy); Tue, 10 May 2016 07:10:01 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-sasl-enc:x-sasl-enc; s=smtpout; bh=CWMzQwFwTju9E/4 m9s4dtJtesuE=; b=noPHjs2c78RdoGVqPdEaAMRcU8w2xqQfDe9VrEKxiWeEwbT yZK9e9EuTdw2K3B7OiXuZvIq2BBcfoy9h7BdcIS96IMoIJR/Fc1aECd8lC0JaSY5 VA73cNvrSZDnqkxzniVzZmqUbBLRYdsDC8djgFS2bgpKxZSYAvL6D77O5T8M= X-Sasl-enc: JcZ0q3OJWMIfUgQCZh5kUAIGu7c1qqFulInJ9iN0zDtO 1462878600 Received: from [10.0.0.5] (c-73-88-28-163.hsd1.mn.comcast.net [73.88.28.163]) by mail.messagingengine.com (Postfix) with ESMTPA id E2DDBC0001B; Tue, 10 May 2016 07:10:00 -0400 (EDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (1.0) Subject: Re: terminated ioc 804b scsi 0 state c xfer 0 From: Josh Paetzel X-Mailer: iPhone Mail (13E238) In-Reply-To: <3D06CE25-4159-4F30-A8C5-8A188144681B@yahoo.com> Date: Tue, 10 May 2016 06:10:00 -0500 Cc: freebsd-scsi@freebsd.org, Scott Long Content-Transfer-Encoding: quoted-printable Message-Id: <0BA358CB-779B-460E-9FFC-FA033C136825@tcbug.org> References: <2E8752E5-76AF-4042-86D9-8C6733658A80@langille.org> <5EEF0794-B06E-4A72-89DA-7DCD94AE1FC6@langille.org> <072CEC8B-9392-4378-8DF5-63D05901850B@langille.org> <0d7401d19f10$ee329300$ca97b900$@broadcom.com> <3D06CE25-4159-4F30-A8C5-8A188144681B@yahoo.com> To: Stephen McConnell X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 May 2016 11:10:08 -0000 > On Apr 25, 2016, at 12:30 PM, Scott Long via freebsd-scsi wrote: >=20 >=20 >> On Apr 25, 2016, at 10:38 AM, Stephen McConnell via freebsd-scsi wrote: >>=20 >>=20 >>=20 >>> -----Original Message----- >>> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd- >>> scsi@freebsd.org] On Behalf Of Dan Langille >>> Sent: Monday, April 25, 2016 9:40 AM >>> To: freebsd-scsi@freebsd.org >>> Subject: Re: terminated ioc 804b scsi 0 state c xfer 0 >>>=20 >>>>> On Apr 25, 2016, at 8:17 AM, Dan Langille wrote: >>>>>=20 >>>>>=20 >>>>> On Apr 24, 2016, at 9:35 AM, Dan Langille wrote: >>>>>=20 >>>>> More of the pasted output is also at >>> https://gist.github.com/dlangille/1fa3135334089c6603e2ec5da946d9ae >>> >>> and added smartctl output. >>>>>=20 >>>>> I have a FreeBSD 10.2-RELEASE-p14 box in which there is an LSI SAS2008= >>> card. It's running a zfs root system. >>>>>=20 >>>>> This morning the system was unresponsive via ssh. Attempts to log in a= t >>> the console did not yield a password prompt. >>>>>=20 >>>>> A power cycle brought the system online. Inspecting /var/log/messages= , >> I >>> found about 63,000 entries similar to those which appear below. >>>>>=20 >>>>> zpool status of all are OK. A scrub is in progress for one pool (since= >> before >>> this issue arose). da7 is in that pool. >>>>>=20 >>>>>=20 >>>>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 >>>>> 8d 90 c6 18 00 00 10 00 length 8192 SMID 774 terminated ioc 804b scsi >>>>> 0 state c xfer 0 Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): >>>>> READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 length 16384 SMID 614 >>>>> terminated ioc 804b scsi 0 state c xfer 0 Apr 24 11:25:55 knew >>>>> kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 50 00 00 20 >>>>> 00 length 16384 SMID 792 terminated ioc 804b scsi 0 state c xfer 0 >>>>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 >>>>> 8b d9 97 08 00 00 20 00 length 16384 SMID 974 terminated ioc 804b >>>>> scsi 0 state c xfer 0 Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): >>>>> READ(10). CDB: 28 00 8b 6f ef 50 00 00 08 00 length 4096 SMID 674 >>>>> terminated ioc 804b scsi 0 state c xfer 0 Apr 24 11:25:55 knew >>>>> kernel: (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 8b 0f a2 48 00 00 18 >>>>> 00 length 12288 SMID 177 terminated ioc 804b scsi 0 state c xfer >>>>> 12288 Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: >>>>> 28 00 ab 8f a1 38 00 00 08 00 length 4096 SMID 908 terminated ioc >>>>> 804b scsi 0 state c xfer 0 Apr 24 11:25:56 knew kernel: >>>>> (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 >>>>> length 16384 SMID 376 terminated ioc 804b scsi 0 state c xfer 0 Apr >>>>> 24 11:25:56 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b >>>>> d9 97 50 00 00 20 00 length 16384 SMID 172 terminated ioc 804b scsi 0 >>>>> state c xfer 0 >>>>>=20 >>>>> Is this a cabling issue? The drive is a SATA device (smartctl output >> in the >>> URL above). Anyone familiar with these errors? >>>>=20 >>>> This morning: >>>>=20 >>>> 13410079654596185797 REMOVED 0 0 0 was /dev/da7p3 >>>>=20 >>>> At least I know i'm looking for Serial Number: 13Q8PNBYS >>>>=20 >>>> =46rom the logs: >>>>=20 >>>> Apr 25 05:34:50 knew kernel: da7 at mps1 bus 0 scbus1 target 17 lun 0 >>>> Apr 25 05:34:50 knew kernel: da7: s/n >>> 13Q8PNBYS detached >>>> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 >>>> d8 33 53 e0 00 00 08 00 length 4096 SMID 88 terminated ioc 804b scsi 0 >>>> state c xfer 0 Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): >>>> READ(10). CDB: 28 00 d8 33 26 f8 00 00 20 00 length 16384 SMID 204 >>>> terminated ioc 804b scsi 0 state c xfer(da7:mps1:0:17:0): READ(10). CDB= : >> 28 >>> 00 d8 33 53 e0 00 00 08 00 Apr 25 05:34:51 knew kernel: 0 Apr 25 05:34:5= 1 >>> knew kernel: (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue >>> Request Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 2= 8 >>> 00 d8 33 26 d8 00 00 20 00 length 16384 SMID 260 terminated ioc 804b scs= i >> 0 >>> state c xfer(da7: 0 >>>> Apr 25 05:34:51 knew kernel: mps1:0: (da7:mps1:0:17:0): READ(10). >> CDB: >>> 28 00 e6 6c 42 40 00 00 10 00 length 8192 SMID 484 terminated ioc 804b >> scsi 0 >>> state c xfer 17:0 >>>> Apr 25 05:34:51 knew kernel: 0): (da7:mps1:0:17:0): WRITE(10). >> CDB: 2a >>> 00 e4 d8 2a 90 00 00 90 00 length 73728 SMID 548 terminated ioc 804b scs= i >> 0 >>> state c xfeError 5, Periph was invalidated >>>> Apr 25 05:34:51 knew kernel: r 0 >>>> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 >>>> d8 33 26 f8 00 00 20 00 Apr 25 05:34:51 knew kernel: >>>> (da7:mps1:0:17:0): READ(10). CDB: 28 00 4d ac ed b8 00 00 08 00 length >>>> 4096 SMID 435 terminated ioc 804b scsi 0 state c xfer >>>> (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue Request Apr 25 >>>> 05:34:51 knew kernel: 0 Apr 25 05:34:51 knew kernel: (da7:mps1: >>>> mps1:0:IOCStatus =3D 0x4b while resetting device 0xa Apr 25 05:34:51 >>>> knew kernel: 17:mps1: 0): Unfreezing devq for target ID 17 Apr 25 >>>> 05:34:51 knew kernel: Error 5, Periph was invalidated Apr 25 05:34:51 >>>> knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 d8 33 26 d8 00 00 >>>> 20 00 Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: >>>> Unconditionally Re-queue Request Apr 25 05:34:51 knew kernel: >>>> (da7:mps1:0:17:0): Error 5, Periph was invalidated Apr 25 05:34:51 >>>> knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 e6 6c 42 40 00 00 >>>> 10 00 Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: >>>> Unconditionally Re-queue Request Apr 25 05:34:51 knew kernel: >>>> (da7:mps1:0:17:0): Error 5, Periph was invalidated Apr 25 05:34:51 >>>> knew kernel: (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 e4 d8 2a 90 00 >>>> 00 90 00 Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: >>> Unconditionally Re-queue Request Apr 25 05:34:51 knew kernel: >>> (da7:mps1:0:17:0): Error 5, Periph was invalidated Apr 25 05:34:51 knew >>> kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 4d ac ed b8 00 00 08 00 >> Apr >>> 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: Unconditionally >> Re- >>> queue Request Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): Error 5, >>> Periph was invalidated Apr 25 05:34:51 knew kernel: GEOM_MIRROR: Device >>> swap: provider da7p2 disconnected. >>>> Apr 25 05:34:51 knew devd: Executing 'logger -p kern.notice -t ZFS 'vde= v >> is >>> removed, pool_guid=3D15378250086669402288 >>> vdev_guid=3D13410079654596185797'' >>>> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): Periph destroyed Apr >>>> 25 05:34:51 knew ZFS: vdev is removed, pool_guid=3D15378250086669402288= >>>> vdev_guid=3D13410079654596185797 >>>=20 >>> Current status: after powering off the box, reseating the cables for tha= t >> drive, I >>> powered up the system and a resilver commenced which completed in 30 >>> minutes. >>>=20 >>> Seems OK now. I am not sure if the two events are related. >>=20 >> Recently, a bug was uncovered where a device is gets 'lost'. >>=20 >> Here's what happens: >> The first message in your "failure on Monday" log is for >> 'mpssas_prepare_remove'. This message is likely logged because the FW se= nds >> an event to the driver that the device is no longer responsive (pulled, >> cable issue, or something else). When the driver gets this event, it sen= ds >> a reset to the device to clear out any pending I/O. This is where the >> 'terminated ioc' messages come from. When the reset completes, the drive= r >> is supposed to send a SAS_IO_UNIT message to FW so that the DevHandle for= >> that disk is removed from FW's list. Then, when the device comes back >> on-line, everything is fine. But, with this bug, before that SAS_IO_UNIT >> message is sent to FW, the driver exits the function where that happens >> (mpssas_remove_device). This happens where you see the log message, >> "IOCStatus - 0x4b while resetting device 0x0a". The driver logs that >> message and then exits. What the driver should do is log that message an= d >> continue on to send the SAS_IO_UNIT message to FW. The fix is to remove t= he >> two lines in the driver shown here with '>>': >>=20 >> if (le16toh(reply->IOCStatus) !=3D MPI2_IOCSTATUS_SUCCESS) { >> mps_printf(sc, "IOCStatus =3D 0x%x while resetting device >> 0x%x\n", >> le16toh(reply->IOCStatus), handle); >>>> mpssas_free_tm(sc, tm); >>>> return; >> } >>=20 >> A reboot will solve the problem, as you saw, but the real fix is to remov= e >> the DevHandle as described above. This fix will go into the driver befor= e >> the next scheduled release and then MFC'd to 10.x. >=20 > Thanks for the diagnosis, Steve. I forgot about that case. We should als= o make > this chain of events more evident in the syslog, it=E2=80=99s very confusi= ng when it happens. > I=E2=80=99m not exactly sure yet what it should look like. >=20 > Scott >=20 Steve, We are seeing this same error case for mpr and mps in FreeNAS. We are decoup= led from FreeBSD releases and have an auto-update mechanism. I'd be very happy to test patches against stable/10 on our nightly users.=20= From owner-freebsd-scsi@freebsd.org Tue May 10 15:15:22 2016 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 82B80B36C14 for ; Tue, 10 May 2016 15:15:22 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: from mail-pf0-x231.google.com (mail-pf0-x231.google.com [IPv6:2607:f8b0:400e:c00::231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5C05E1F04 for ; Tue, 10 May 2016 15:15:22 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: by mail-pf0-x231.google.com with SMTP id 77so6896076pfv.2 for ; Tue, 10 May 2016 08:15:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google; h=from:to:cc:references:in-reply-to:subject:date:message-id :mime-version:content-transfer-encoding:thread-index :content-language; bh=QMfFCAy1GXqSXxhH4l4rw+Uce76jdMqMoscICEcjIH0=; b=Zf3bp2/Q3iJfacQ5Ov6GO0vx1YBnb7rbYQDZzoKrUvSU4GH7rKacNpRey6874JMLhM e3lqLvsNoI7VnoSzwsHtdeystHDuHdEFgMBNoxtSI8t6buRQtttBL29JR4HYkWmPcX7I DJrUQPSYDib9FeFgFBvVP9w/C97np1EG4pP7s= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:references:in-reply-to:subject:date :message-id:mime-version:content-transfer-encoding:thread-index :content-language; bh=QMfFCAy1GXqSXxhH4l4rw+Uce76jdMqMoscICEcjIH0=; b=G7F2oOnomIQ+Fnl7W4YPziAPcdYvg7ml/c9adfLDfgLiyyD0D5UWOFwjFn1l9gC5ZC eqM8xKDcWVEbSyanXBUm0120jvwZjZkDFaaFYl/t/6X+XwL1gZRo1g6SGRANA+PYpAFu db6KQ+80UIzHj0a30b7XAlI9AOSE+mfkZX7lfdzzl9zd7aPQ9kExtGUTjXSCoHpmr3OH 2QdUr2CNGwHnVqmjipv3fy661lqeiLIlA3gC5EI/BQ9pPFtGKGb7nvMQFpcYq8uVuHrX G+O5vyohjeBSM7EUiKgzy4WpLmbIJjJLP7hYVGiqReWd5JvCNx3r450Xbcd6vj1nEwOW 7wkw== X-Gm-Message-State: AOPr4FX/lkFJvw2NOXi8bWK+Eq3TzFVO2+7wbnlccFNcr1DGDG+EXOgErnGv9Kn9VFy9214D X-Received: by 10.98.49.133 with SMTP id x127mr57285619pfx.36.1462893321361; Tue, 10 May 2016 08:15:21 -0700 (PDT) Received: from C5SDN12 ([192.19.220.253]) by smtp.gmail.com with ESMTPSA id 76sm5298742pfz.44.2016.05.10.08.15.19 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 10 May 2016 08:15:20 -0700 (PDT) From: "Stephen McConnell" To: "'Josh Paetzel'" Cc: , "'Scott Long'" References: <2E8752E5-76AF-4042-86D9-8C6733658A80@langille.org> <5EEF0794-B06E-4A72-89DA-7DCD94AE1FC6@langille.org> <072CEC8B-9392-4378-8DF5-63D05901850B@langille.org> <0d7401d19f10$ee329300$ca97b900$@broadcom.com> <3D06CE25-4159-4F30-A8C5-8A188144681B@yahoo.com> <0BA358CB-779B-460E-9FFC-FA033C136825@tcbug.org> In-Reply-To: <0BA358CB-779B-460E-9FFC-FA033C136825@tcbug.org> Subject: RE: terminated ioc 804b scsi 0 state c xfer 0 Date: Tue, 10 May 2016 09:15:18 -0600 Message-ID: <22c001d1aace$c41ffa70$4c5fef50$@broadcom.com> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQJ4ucc4xV6h5VXTBEGxwBGwzw4I/wFGbfNtAl4WzZcCWIVPzgJCjEJAAVcibJ2eF3OGsA== Content-Language: en-us X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 May 2016 15:15:22 -0000 > -----Original Message----- > From: Josh Paetzel [mailto:josh@tcbug.org] > Sent: Tuesday, May 10, 2016 5:10 AM > To: Stephen McConnell > Cc: freebsd-scsi@freebsd.org; Scott Long > Subject: Re: terminated ioc 804b scsi 0 state c xfer 0 >=20 >=20 >=20 > > On Apr 25, 2016, at 12:30 PM, Scott Long via freebsd-scsi scsi@freebsd.org> wrote: > > > > > >> On Apr 25, 2016, at 10:38 AM, Stephen McConnell via freebsd-scsi > wrote: > >> > >> > >> > >>> -----Original Message----- > >>> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd- > >>> scsi@freebsd.org] On Behalf Of Dan Langille > >>> Sent: Monday, April 25, 2016 9:40 AM > >>> To: freebsd-scsi@freebsd.org > >>> Subject: Re: terminated ioc 804b scsi 0 state c xfer 0 > >>> > >>>>> On Apr 25, 2016, at 8:17 AM, Dan Langille = wrote: > >>>>> > >>>>> > >>>>> On Apr 24, 2016, at 9:35 AM, Dan Langille = wrote: > >>>>> > >>>>> More of the pasted output is also at > >>> https://gist.github.com/dlangille/1fa3135334089c6603e2ec5da946d9ae > >>> > > >>> and added smartctl output. > >>>>> > >>>>> I have a FreeBSD 10.2-RELEASE-p14 box in which there is an LSI > >>>>> SAS2008 > >>> card. It's running a zfs root system. > >>>>> > >>>>> This morning the system was unresponsive via ssh. Attempts to = log > >>>>> in at > >>> the console did not yield a password prompt. > >>>>> > >>>>> A power cycle brought the system online. Inspecting > >>>>> /var/log/messages, > >> I > >>> found about 63,000 entries similar to those which appear below. > >>>>> > >>>>> zpool status of all are OK. A scrub is in progress for one pool > >>>>> (since > >> before > >>> this issue arose). da7 is in that pool. > >>>>> > >>>>> > >>>>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: = 28 > >>>>> 00 8d 90 c6 18 00 00 10 00 length 8192 SMID 774 terminated ioc > >>>>> 804b scsi > >>>>> 0 state c xfer 0 Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): > >>>>> READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 length 16384 SMID = 614 > >>>>> terminated ioc 804b scsi 0 state c xfer 0 Apr 24 11:25:55 knew > >>>>> kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 50 00 = 00 > >>>>> 20 > >>>>> 00 length 16384 SMID 792 terminated ioc 804b scsi 0 state c xfer = 0 > >>>>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: = 28 > >>>>> 00 8b d9 97 08 00 00 20 00 length 16384 SMID 974 terminated ioc > >>>>> 804b scsi 0 state c xfer 0 Apr 24 11:25:55 knew kernel: > (da7:mps1:0:17:0): > >>>>> READ(10). CDB: 28 00 8b 6f ef 50 00 00 08 00 length 4096 SMID = 674 > >>>>> terminated ioc 804b scsi 0 state c xfer 0 Apr 24 11:25:55 knew > >>>>> kernel: (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 8b 0f a2 48 00 = 00 > >>>>> 18 > >>>>> 00 length 12288 SMID 177 terminated ioc 804b scsi 0 state c xfer > >>>>> 12288 Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). = CDB: > >>>>> 28 00 ab 8f a1 38 00 00 08 00 length 4096 SMID 908 terminated = ioc > >>>>> 804b scsi 0 state c xfer 0 Apr 24 11:25:56 knew kernel: > >>>>> (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 > >>>>> length 16384 SMID 376 terminated ioc 804b scsi 0 state c xfer 0 > >>>>> Apr > >>>>> 24 11:25:56 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 > >>>>> 8b > >>>>> d9 97 50 00 00 20 00 length 16384 SMID 172 terminated ioc 804b > >>>>> scsi 0 state c xfer 0 > >>>>> > >>>>> Is this a cabling issue? The drive is a SATA device (smartctl > >>>>> output > >> in the > >>> URL above). Anyone familiar with these errors? > >>>> > >>>> This morning: > >>>> > >>>> 13410079654596185797 REMOVED 0 0 0 was /dev/da7p3 > >>>> > >>>> At least I know i'm looking for Serial Number: 13Q8PNBYS > >>>> > >>>> From the logs: > >>>> > >>>> Apr 25 05:34:50 knew kernel: da7 at mps1 bus 0 scbus1 target 17 = lun > >>>> 0 Apr 25 05:34:50 knew kernel: da7: > s/n > >>> 13Q8PNBYS detached > >>>> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 > >>>> 00 > >>>> d8 33 53 e0 00 00 08 00 length 4096 SMID 88 terminated ioc 804b > >>>> scsi 0 state c xfer 0 Apr 25 05:34:51 knew kernel: = (da7:mps1:0:17:0): > >>>> READ(10). CDB: 28 00 d8 33 26 f8 00 00 20 00 length 16384 SMID = 204 > >>>> terminated ioc 804b scsi 0 state c xfer(da7:mps1:0:17:0): = READ(10). CDB: > >> 28 > >>> 00 d8 33 53 e0 00 00 08 00 Apr 25 05:34:51 knew kernel: 0 Apr 25 > >>> 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: = Unconditionally > >>> Re-queue Request Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): > >>> READ(10). CDB: 28 > >>> 00 d8 33 26 d8 00 00 20 00 length 16384 SMID 260 terminated ioc = 804b > >>> scsi > >> 0 > >>> state c xfer(da7: 0 > >>>> Apr 25 05:34:51 knew kernel: mps1:0: (da7:mps1:0:17:0): = READ(10). > >> CDB: > >>> 28 00 e6 6c 42 40 00 00 10 00 length 8192 SMID 484 terminated ioc > >>> 804b > >> scsi 0 > >>> state c xfer 17:0 > >>>> Apr 25 05:34:51 knew kernel: 0): (da7:mps1:0:17:0): = WRITE(10). > >> CDB: 2a > >>> 00 e4 d8 2a 90 00 00 90 00 length 73728 SMID 548 terminated ioc = 804b > >>> scsi > >> 0 > >>> state c xfeError 5, Periph was invalidated > >>>> Apr 25 05:34:51 knew kernel: r 0 > >>>> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 > >>>> 00 > >>>> d8 33 26 f8 00 00 20 00 Apr 25 05:34:51 knew kernel: > >>>> (da7:mps1:0:17:0): READ(10). CDB: 28 00 4d ac ed b8 00 00 08 00 > >>>> length > >>>> 4096 SMID 435 terminated ioc 804b scsi 0 state c xfer > >>>> (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue Request = Apr > >>>> 25 > >>>> 05:34:51 knew kernel: 0 Apr 25 05:34:51 knew kernel: (da7:mps1: > >>>> mps1:0:IOCStatus =3D 0x4b while resetting device 0xa Apr 25 = 05:34:51 > >>>> knew kernel: 17:mps1: 0): Unfreezing devq for target ID 17 Apr 25 > >>>> 05:34:51 knew kernel: Error 5, Periph was invalidated Apr 25 > >>>> 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 d8 = 33 > >>>> 26 d8 00 00 > >>>> 20 00 Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: > >>>> Unconditionally Re-queue Request Apr 25 05:34:51 knew kernel: > >>>> (da7:mps1:0:17:0): Error 5, Periph was invalidated Apr 25 = 05:34:51 > >>>> knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 e6 6c 42 40 = 00 > >>>> 00 > >>>> 10 00 Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: > >>>> Unconditionally Re-queue Request Apr 25 05:34:51 knew kernel: > >>>> (da7:mps1:0:17:0): Error 5, Periph was invalidated Apr 25 = 05:34:51 > >>>> knew kernel: (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 e4 d8 2a 90 > >>>> 00 > >>>> 00 90 00 Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM = status: > >>> Unconditionally Re-queue Request Apr 25 05:34:51 knew kernel: > >>> (da7:mps1:0:17:0): Error 5, Periph was invalidated Apr 25 05:34:51 > >>> knew > >>> kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 4d ac ed b8 00 00 = 08 > >>> 00 > >> Apr > >>> 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: > >>> Unconditionally > >> Re- > >>> queue Request Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): = Error > >>> 5, Periph was invalidated Apr 25 05:34:51 knew kernel: = GEOM_MIRROR: > >>> Device > >>> swap: provider da7p2 disconnected. > >>>> Apr 25 05:34:51 knew devd: Executing 'logger -p kern.notice -t = ZFS > >>>> 'vdev > >> is > >>> removed, pool_guid=3D15378250086669402288 > >>> vdev_guid=3D13410079654596185797'' > >>>> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): Periph destroyed > >>>> Apr > >>>> 25 05:34:51 knew ZFS: vdev is removed, > >>>> pool_guid=3D15378250086669402288 > >>>> vdev_guid=3D13410079654596185797 > >>> > >>> Current status: after powering off the box, reseating the cables = for > >>> that > >> drive, I > >>> powered up the system and a resilver commenced which completed in = 30 > >>> minutes. > >>> > >>> Seems OK now. I am not sure if the two events are related. > >> > >> Recently, a bug was uncovered where a device is gets 'lost'. > >> > >> Here's what happens: > >> The first message in your "failure on Monday" log is for > >> 'mpssas_prepare_remove'. This message is likely logged because the > >> FW sends an event to the driver that the device is no longer > >> responsive (pulled, cable issue, or something else). When the = driver > >> gets this event, it sends a reset to the device to clear out any > >> pending I/O. This is where the 'terminated ioc' messages come = from. > >> When the reset completes, the driver is supposed to send a > >> SAS_IO_UNIT message to FW so that the DevHandle for that disk is > >> removed from FW's list. Then, when the device comes back on-line, > >> everything is fine. But, with this bug, before that SAS_IO_UNIT > >> message is sent to FW, the driver exits the function where that > >> happens (mpssas_remove_device). This happens where you see the log > >> message, "IOCStatus - 0x4b while resetting device 0x0a". The = driver > >> logs that message and then exits. What the driver should do is log > >> that message and continue on to send the SAS_IO_UNIT message to FW. > The fix is to remove the two lines in the driver shown here with '>>': > >> > >> if (le16toh(reply->IOCStatus) !=3D MPI2_IOCSTATUS_SUCCESS) { > >> mps_printf(sc, "IOCStatus =3D 0x%x while resetting device > >> 0x%x\n", > >> le16toh(reply->IOCStatus), handle); > >>>> mpssas_free_tm(sc, tm); > >>>> return; > >> } > >> > >> A reboot will solve the problem, as you saw, but the real fix is to > >> remove the DevHandle as described above. This fix will go into the > >> driver before the next scheduled release and then MFC'd to 10.x. > > > > Thanks for the diagnosis, Steve. I forgot about that case. We = should > > also make this chain of events more evident in the syslog, = it=E2=80=99s very confusing > when it happens. > > I=E2=80=99m not exactly sure yet what it should look like. > > > > Scott > > >=20 > Steve, >=20 > We are seeing this same error case for mpr and mps in FreeNAS. We are > decoupled from FreeBSD releases and have an auto-update mechanism. >=20 > I'd be very happy to test patches against stable/10 on our nightly = users. Hi Josh, this fix just went into head with commit r299274. I'll try to = get this into stable/10 next week. Do you need anything else? Steve From owner-freebsd-scsi@freebsd.org Tue May 10 17:34:01 2016 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A3F73B35E74 for ; Tue, 10 May 2016 17:34:01 +0000 (UTC) (envelope-from josh@tcbug.org) Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com [66.111.4.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 729131230 for ; Tue, 10 May 2016 17:34:01 +0000 (UTC) (envelope-from josh@tcbug.org) Received: from compute2.internal (compute2.nyi.internal [10.202.2.42]) by mailout.nyi.internal (Postfix) with ESMTP id C86B120641 for ; Tue, 10 May 2016 13:33:59 -0400 (EDT) Received: from web2 ([10.202.2.212]) by compute2.internal (MEProxy); Tue, 10 May 2016 13:33:59 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-sasl-enc:x-sasl-enc; s=smtpout; bh=/0Tp32REbNg71gs pyryy6WXx5So=; b=elleTMdylUoYox1A2QyZ4sh0Ii42aZBosYzWxpWMvSbwQm8 m1tXDWsYCQ5NHcsN/csJUE+51Qgdcxv98kKTh9bEiDhHJuB84KyX2tFGKSr6vRDm 5L2iifRvtdUYSo3N921BsZqIhm5K/tG8/t+CaR3DsFcLV9RPtBvUOo8mstaw= Received: by web2.nyi.internal (Postfix, from userid 99) id 96093540200; Tue, 10 May 2016 13:33:59 -0400 (EDT) Message-Id: <1462901639.1965380.603786289.12B5909E@webmail.messagingengine.com> X-Sasl-Enc: wRi0BSUy3+ZE47FVjy9Kb+haETv/0/+NbApsN4dWPXHZ 1462901639 From: Josh Paetzel To: Stephen McConnell Cc: freebsd-scsi@freebsd.org, Scott Long MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain X-Mailer: MessagingEngine.com Webmail Interface - ajax-de9b841d Subject: Re: terminated ioc 804b scsi 0 state c xfer 0 Date: Tue, 10 May 2016 12:33:59 -0500 In-Reply-To: <22c001d1aace$c41ffa70$4c5fef50$@broadcom.com> References: <2E8752E5-76AF-4042-86D9-8C6733658A80@langille.org> <5EEF0794-B06E-4A72-89DA-7DCD94AE1FC6@langille.org> <072CEC8B-9392-4378-8DF5-63D05901850B@langille.org> <0d7401d19f10$ee329300$ca97b900$@broadcom.com> <3D06CE25-4159-4F30-A8C5-8A188144681B@yahoo.com> <0BA358CB-779B-460E-9FFC-FA033C136825@tcbug.org> <22c001d1aace$c41ffa70$4c5fef50$@broadcom.com> X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 May 2016 17:34:01 -0000 > > Steve, > > > > We are seeing this same error case for mpr and mps in FreeNAS. We are > > decoupled from FreeBSD releases and have an auto-update mechanism. > > > > I'd be very happy to test patches against stable/10 on our nightly users. > > Hi Josh, this fix just went into head with commit r299274. I'll try to > get this into stable/10 next week. Do you need anything else? > > Steve > No, that's great. -- Thanks, Josh Paetzel From owner-freebsd-scsi@freebsd.org Tue May 10 21:29:51 2016 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 98DA9B352D5 for ; Tue, 10 May 2016 21:29:51 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8970F1688 for ; Tue, 10 May 2016 21:29:51 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u4ALTo0q003022 for ; Tue, 10 May 2016 21:29:51 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-scsi@FreeBSD.org Subject: [Bug 204901] GEOM doesn't see new disk capacity after VMDK resize without a reboot Date: Tue, 10 May 2016 21:29:50 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: trasz@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: trasz@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 May 2016 21:29:51 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D204901 Edward Tomasz Napierala changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |trasz@FreeBSD.org Assignee|freebsd-scsi@FreeBSD.org |trasz@FreeBSD.org --=20 You are receiving this mail because: You are the assignee for the bug.=