From owner-freebsd-bugs@freebsd.org Tue Sep 29 06:51:44 2015
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 191348] [mps] LSI2308 with WD3000FYYZ drives disappears after hotswapping
Date: Tue, 29 Sep 2015 06:51:43 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.0-RELEASE
X-Bugzilla-Keywords: patch-ready
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: Karli.Sjoberg@slu.se
X-Bugzilla-Status: In Progress
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
X-BeenThere: freebsd-bugs@freebsd.org
List-Id: Bug reports
X-List-Received-Date: Tue, 29 Sep 2015
06:51:44 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348

--- Comment #22 from Karli.Sjoberg@slu.se ---
No it can't, it's not really fixed. We have upgraded several of our systems to this driver and also flashed the firmware of our HBAs to P19. We also tried flashing firmware 20.00.04.00 to match the 20.00.00.00 driver, but then ZFS went nuts, displaying checksum errors all over. Reverting to P19 fixed that.

I have captured what happened the last time a drive (WD40EZRX) went bye-bye:

Sep 27 01:39:18 zfs1-1 kernel: (da9:mps0:0:16:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 368 command timeout cm 0xfffffe0000cb8300 ccb 0xfffff80302ab4800
Sep 27 01:39:18 zfs1-1 kernel: (noperiph:mps0:0:4294967295:0): SMID 1 Aborting command 0xfffffe0000cb8300
Sep 27 01:39:18 zfs1-1 kernel: mps0: Sending reset from mpssas_send_abort for target ID 16
Sep 27 01:39:18 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 3f d8 00 00 08 00 length 4096 SMID 411 command timeout cm 0xfffffe0000cbbb70 ccb 0xfffff80302355800
Sep 27 01:39:18 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4c d8 00 00 80 00 length 65536 SMID 378 command timeout cm 0xfffffe0000cb9020 ccb 0xfffff802e6b06000
Sep 27 01:39:18 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4d d8 00 00 80 00 length 65536 SMID 404 command timeout cm 0xfffffe0000cbb240 ccb 0xfffff800670db000
Sep 27 01:39:18 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4d 58 00 00 80 00 length 65536 SMID 885 command timeout cm 0xfffffe0000ce2990 ccb 0xfffff801eefbb000
Sep 27 01:39:18 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8d 38 00 00 08 00 length 4096 SMID 234 command timeout cm 0xfffffe0000cad320 ccb 0xfffff8022997c000
Sep 27 01:39:20 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10).
CDB: 2a 00 1e 5f 8e 50 00 01 00 00 length 131072 SMID 79 command timeout cm 0xfffffe0000ca07b0 ccb 0xfffff801ee18f800
Sep 27 01:39:20 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8d 50 00 01 00 00 length 131072 SMID 218 command timeout cm 0xfffffe0000cabe20 ccb 0xfffff80067d7c800
Sep 27 01:39:21 zfs1-1 kernel: mps0: mpssas_prepare_remove: Sending reset for target ID 16
Sep 27 01:39:21 zfs1-1 kernel: da9 at mps0 bus 0 scbus0 target 16 lun 0
Sep 27 01:39:21 zfs1-1 kernel: da9: s/n      WD-WCC4E4YSXNYH detached
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8e 50 00 01 00 00 length 131072 SMID 79 terminated ioc 804b scsi 0 state c xfer 0
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8d 50 00 01 00 00 length 131072 SMID 218 terminated ioc 804b scsi 0 state c xf(da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8e 50 00 01 00 00
Sep 27 01:39:22 zfs1-1 kernel: er 0
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): CAM status: Unconditionally Re-queue Request
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8d 38 00 00 08 00 length 4096 SMID 234 terminated ioc 804b scsi 0 state c xfer(da9: 0
Sep 27 01:39:22 zfs1-1 kernel: mps0:0: (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4d 58 00 00 80 00 length 65536 SMID 885 terminated ioc 804b scsi 0 state c xfer16: 0
Sep 27 01:39:22 zfs1-1 kernel: 0): (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4c d8 00 00 80 00 length 65536 SMID 378 terminated ioc 804b scsi 0 state c xferError 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: 0
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8d 50 00 01 00 00
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10).
CDB: 28 00 13 0e 4d d8 00 00 80 00 length 65536 SMID 404 terminated ioc 804b scsi 0 state c xfer(da9:mps0:0:16:0): CAM status: Unconditionally Re-queue Request
Sep 27 01:39:22 zfs1-1 kernel: 0
Sep 27 01:39:22 zfs1-1 kernel: (da9: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 3f d8 00 00 08 00 length 4096 SMID 411 terminated ioc 804b scsi 0 state c xfermps0:0: 0
Sep 27 01:39:22 zfs1-1 kernel: 16:mps0: 0): IOCStatus = 0x4b while resetting device 0x14
Sep 27 01:39:22 zfs1-1 kernel: Error 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: mps0: (da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 1e 5f 8d 38 00 00 08 00
Sep 27 01:39:22 zfs1-1 kernel: Unfreezing devq for target ID 16
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): CAM status: Unconditionally Re-queue Request
Sep 27 01:39:22 zfs1-1 kernel: mps0: (da9:Unfreezing devq for target ID 16
Sep 27 01:39:22 zfs1-1 kernel: mps0:0:16:0): Error 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4d 58 00 00 80 00
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): CAM status: Unconditionally Re-queue Request
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): Error 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4c d8 00 00 80 00
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): CAM status: Unconditionally Re-queue Request
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): Error 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10). CDB: 28 00 13 0e 4d d8 00 00 80 00
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): CAM status: Unconditionally Re-queue Request
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): Error 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): WRITE(10).
CDB: 2a 00 1e 5f 3f d8 00 00 08 00
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): CAM status: Unconditionally Re-queue Request
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): Error 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Sep 27 01:39:22 zfs1-1 kernel: ctl_datamove: tag 0x183baa00 on (0:9:0:0) aborted
Sep 27 01:39:22 zfs1-1 kernel: ctl_datamove: tag 0x433baa00 on (0:9:0:0) aborted
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): CAM status: Command timeout
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): Error 5, Periph was invalidated
Sep 27 01:39:22 zfs1-1 kernel: (da9:mps0:0:16:0): Periph destroyed
Sep 27 01:39:21 zfs1-1 devd: Executing 'logger -p kern.notice -t ZFS 'vdev is removed, pool_guid=11769113696885915207 vdev_guid=10111278074591061297''
Sep 27 01:39:21 zfs1-1 ZFS: vdev is removed, pool_guid=11769113696885915207 vdev_guid=10111278074591061297

This server is running 10.1-STABLE r281643, close to 10.2-RELEASE. When inserting a new SATA drive that has never previously been in the system, nothing is printed in the logs and that bay stays "blocked" until you reboot the server. We have also added 'dev.mps.0.spinup_wait_time="5"' to loader.conf and it hasn't made any difference. There are only 14 drives in this system, so I don't think extending the timeout makes any difference in this case.

Oddly enough, I had the opportunity to test inserting a SAS drive and it successfully showed up in the OS, so whatever happens is only affecting SATA.

Any suggestions for further debugging this serious problem would be greatly appreciated!

/K

--
You are receiving this mail because:
You are the assignee for the bug.
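
For anyone reading the log above, here is a rough sketch (not part of the original report) of how the READ(10)/WRITE(10) CDBs can be decoded into a starting LBA and a transfer length. It assumes the standard 10-byte CDB layout (bytes 2-5 are the big-endian LBA, bytes 7-8 the block count) and 512-byte logical blocks. The block size is only an assumption, although it agrees with the "length" values the kernel printed. The names `decode_cdb10` and `BLOCK_SIZE` are made up for this illustration.

```python
import re

# Assumed logical block size. The log does not state it, but 0x0080 blocks
# times 512 bytes matches the "length 65536" printed next to the CDB.
BLOCK_SIZE = 512

# Opcode name, the ten CDB bytes as printed by the kernel, and the optional
# "length N" field that follows on the timeout/terminated lines.
CDB_RE = re.compile(
    r"(READ|WRITE)\(10\)\. CDB: ((?:[0-9a-f]{2} ){9}[0-9a-f]{2})(?: length (\d+))?"
)


def decode_cdb10(line):
    """Return a dict describing the READ(10)/WRITE(10) in a log line, or None."""
    m = CDB_RE.search(line)
    if m is None:
        return None
    op, hexbytes, logged = m.groups()
    cdb = bytes.fromhex(hexbytes)  # fromhex ignores the separating spaces
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return {
        "op": op,
        "lba": int.from_bytes(cdb[2:6], "big"),  # starting logical block
        "blocks": blocks,
        "bytes": blocks * BLOCK_SIZE,
        "logged_length": int(logged) if logged else None,
    }


if __name__ == "__main__":
    sample = (
        "Sep 27 01:39:18 zfs1-1 kernel: (da9:mps0:0:16:0): READ(10). "
        "CDB: 28 00 13 0e 4c d8 00 00 80 00 length 65536 SMID 378 command timeout"
    )
    print(decode_cdb10(sample))
```

Comparing "bytes" with "logged_length" for each line is a quick sanity check that a CDB was not mangled by the interleaved kernel output.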