Date: Thu, 27 Oct 2016 08:04:44 -0400
From: Jeremy Beker <gothmog@confusticate.com>
To: freebsd-scsi@freebsd.org
Subject: FreeBSD 11.0 and LSI SAS3081E losing all devices
Message-ID: <FE314B05-A666-4B90-BEA4-534406C1D424@confusticate.com>
Good Morning!

Since upgrading my home server from 10.3 to 11.0-RELEASE-p1 about a week ago, I have twice had a serious problem in which my LSI adapter throws errors and drops all the drives out of my ZFS pool.

Hardware:
- LSI SAS3081E-R PCI-E card with the IT firmware loaded
- 6x2TB WD Black drives
- 1 SSD
- Supermicro X10SLL-F MB (not sure that is relevant)

This system has been running on this exact hardware for about a year with no problems under the 10.X versions of FreeBSD. Last weekend, I upgraded the system to 11.0-RELEASE-p1. Since then, on two occasions, all of the drives have been marked unavailable by ZFS after a stream of errors.

The problems start with a number of errors like this:

Oct 26 03:28:29 rivendell kernel: mpt0: request 0xfffffe0000f73058:57643 timed out for ccb 0xfffff803456ea000 (req->ccb 0xfffff803456ea000)
Oct 26 03:28:29 rivendell kernel: mpt0: attempting to abort req 0xfffffe0000f73058:57643 function 0
Oct 26 03:28:29 rivendell kernel: mpt0: completing timedout/aborted req 0xfffffe0000f73058:57643
Oct 26 03:28:29 rivendell kernel: (da0:mpt0:0:10:0): READ(10). CDB: 28 00 04 c4 91 c0 00 00 08 00
Oct 26 03:28:29 rivendell kernel: (da0:mpt0:0:10:0): CAM status: CCB request terminated by the host
Oct 26 03:28:29 rivendell kernel: (da0:mpt0:0:10:0): mpt0: Retrying command
Oct 26 03:28:29 rivendell kernel: abort of req 0xfffffe0000f73058:0 completed
Oct 26 03:28:49 rivendell kernel: mpt0: request 0xfffffe0000f6c3b0:57658 timed out for ccb 0xfffff803456ea000 (req->ccb 0xfffff803456ea000)
Oct 26 03:28:49 rivendell kernel: mpt0: attempting to abort req 0xfffffe0000f6c3b0:57658 function 0
Oct 26 03:28:49 rivendell kernel: mpt0: completing timedout/aborted req 0xfffffe0000f6c3b0:57658
Oct 26 03:28:49 rivendell kernel: (da0:mpt0:0:10:0): READ(10). CDB: 28 00 04 c4 91 c0 00 00 08 00
Oct 26 03:28:49 rivendell kernel: (da0:mpt0:0:10:0): CAM status: CCB request terminated by the host
Oct 26 03:28:49 rivendell kernel: (da0:mpt0:0:10:0): Retrying command
Oct 26 03:28:49 rivendell kernel: mpt0: abort of req 0xfffffe0000f6c3b0:0 completed
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): READ(10). CDB: 28 00 04 c4 91 c0 00 00 08 00
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): CAM status: SCSI Status Error
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): SCSI status: Check Condition
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): Retrying command (per sense data)

Also these:

Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): CAM status: SCSI Status Error
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): SCSI status: Check Condition
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): Error 6, Retries exhausted
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): Invalidating pack

After several rounds of the errors above, I get this:

Oct 26 03:35:17 rivendell kernel: mpt0: request 0xfffffe0000f73350:62027 timed out for ccb 0xfffff800160ce000 (req->ccb 0xfffff800160ce000)
Oct 26 03:35:17 rivendell kernel: mpt0: attempting to abort req 0xfffffe0000f73350:62027 function 0
Oct 26 03:35:18 rivendell kernel: mpt0: mpt_wait_req(1) timed out
Oct 26 03:35:18 rivendell kernel: mpt0: mpt_recover_commands: abort timed-out. Resetting controller
Oct 26 03:35:18 rivendell kernel: mpt0: mpt_cam_event: 0x0
Oct 26 03:35:18 rivendell kernel: mpt0: mpt_cam_event: 0x0
Oct 26 03:35:18 rivendell kernel: mpt0: completing timedout/aborted req 0xfffffe0000f73350:62027

After that, all the drives seem to disappear and the system detaches every one of them:

Oct 26 03:35:33 rivendell kernel: da1 at mpt0 bus 0 scbus0 target 14 lun 0
Oct 26 03:35:33 rivendell kernel: da1: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01559141 detached
Oct 26 03:35:33 rivendell kernel: da2 at mpt0 bus 0 scbus0 target 15 lun 0
Oct 26 03:35:33 rivendell kernel: da2: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01603430 detached
Oct 26 03:35:33 rivendell kernel: da5 at mpt0 bus 0 scbus0 target 18 lun 0
Oct 26 03:35:33 rivendell kernel: da5: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01159727 detached
Oct 26 03:35:33 rivendell kernel: da6 at mpt0 bus 0 scbus0 target 19 lun 0
Oct 26 03:35:33 rivendell kernel: da6: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY02971691 detached
Oct 26 03:35:33 rivendell kernel: da4 at mpt0 bus 0 scbus0 target 17 lun 0
Oct 26 03:35:33 rivendell kernel: da4: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01470856 detached
Oct 26 03:35:33 rivendell kernel: da3 at mpt0 bus 0 scbus0 target 16 lun 0
Oct 26 03:35:33 rivendell kernel: da3: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01602648 detached

At this point I have had to reboot the server each time, after which all the drives reappear.

Any help would be greatly appreciated.

-Jeremy

--
Jeremy Beker - @gothmog
http://www.confusticate.com
Condensing fact from the vapor of nuance.
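For readers triaging the log above: the READ(10) CDB that keeps timing out on da0 (`28 00 04 c4 91 c0 00 00 08 00`) can be decoded to see which block range was being read. The sketch below assumes the standard READ(10) layout from the SCSI Block Commands spec (opcode 0x28, big-endian LBA in bytes 2-5, group number in the low 5 bits of byte 6, big-endian transfer length in bytes 7-8, control in byte 9). The function name `decode_read10` is hypothetical and not part of camcontrol or any FreeBSD tool.

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    Hypothetical helper; layout assumed from the SBC READ(10) definition.
    """
    cdb = bytes.fromhex(cdb_hex)  # bytes.fromhex skips whitespace between bytes
    if len(cdb) != 10:
        raise ValueError(f"READ(10) CDB must be 10 bytes, got {len(cdb)}")
    if cdb[0] != 0x28:
        raise ValueError(f"not a READ(10) opcode: 0x{cdb[0]:02x}")
    return {
        "opcode": cdb[0],
        # Bytes 2-5: logical block address, big-endian
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Byte 6, bits 4-0: group number
        "group": cdb[6] & 0x1F,
        # Bytes 7-8: transfer length in logical blocks, big-endian
        "blocks": int.from_bytes(cdb[7:9], "big"),
        # Byte 9: control byte
        "control": cdb[9],
    }


if __name__ == "__main__":
    info = decode_read10("28 00 04 c4 91 c0 00 00 08 00")
    print(f"LBA {info['lba']}, {info['blocks']} blocks")
```

Run against the CDB from the log, this reports LBA 79991232 and a transfer length of 8 logical blocks, i.e. a small read that the adapter failed to complete rather than an unusually large request.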
