Date: Sun, 30 Oct 2016 07:52:00 -0400
From: Jeremy Beker <gothmog@confusticate.com>
To: freebsd-stable@freebsd.org
Subject: FreeBSD 11.0 and LSI SAS3081E losing all devices
Message-ID: <FF400F3A-350A-4133-BED1-78087F1657F3@confusticate.com>

Good Morning!

Since upgrading my home server from 10.3 to 11.0-RELEASE-p1 about a week ago, I have twice had a serious problem where my LSI adapter is having errors and dropping all the drives out of my ZFS pool.

Hardware:
- LSI SAS3081E-R PCI-E card with the IT firmware loaded
- 6x2TB WD Black drives
- 1 SSD
- Supermicro X10SLL-F MB (not sure that is relevant)

This system has been running with this exact hardware for about a year with no problems under the 10.X versions of FreeBSD. Last weekend, I upgraded the system to 11.0-RELEASE-p1. Since then, twice, all of the drives have been marked as unavailable to ZFS after generating a stream of errors.

The problems start with a number of errors like this:

Oct 26 03:28:29 rivendell kernel: mpt0: request 0xfffffe0000f73058:57643 timed out for ccb 0xfffff803456ea000 (req->ccb 0xfffff803456ea000)
Oct 26 03:28:29 rivendell kernel: mpt0: attempting to abort req 0xfffffe0000f73058:57643 function 0
Oct 26 03:28:29 rivendell kernel: mpt0: completing timedout/aborted req 0xfffffe0000f73058:57643
Oct 26 03:28:29 rivendell kernel: (da0:mpt0:0:10:0): READ(10). CDB: 28 00 04 c4 91 c0 00 00 08 00
Oct 26 03:28:29 rivendell kernel: (da0:mpt0:0:10:0): CAM status: CCB request terminated by the host
Oct 26 03:28:29 rivendell kernel: (da0:mpt0:0:10:0): mpt0: Retrying command
Oct 26 03:28:29 rivendell kernel: abort of req 0xfffffe0000f73058:0 completed
Oct 26 03:28:49 rivendell kernel: mpt0: request 0xfffffe0000f6c3b0:57658 timed out for ccb 0xfffff803456ea000 (req->ccb 0xfffff803456ea000)
Oct 26 03:28:49 rivendell kernel: mpt0: attempting to abort req 0xfffffe0000f6c3b0:57658 function 0
Oct 26 03:28:49 rivendell kernel: mpt0: completing timedout/aborted req 0xfffffe0000f6c3b0:57658
Oct 26 03:28:49 rivendell kernel: (da0:mpt0:0:10:0): READ(10). CDB: 28 00 04 c4 91 c0 00 00 08 00
Oct 26 03:28:49 rivendell kernel: (da0:mpt0:0:10:0): CAM status: CCB request terminated by the host
Oct 26 03:28:49 rivendell kernel: (da0:mpt0:0:10:0): Retrying command
Oct 26 03:28:49 rivendell kernel: mpt0: abort of req 0xfffffe0000f6c3b0:0 completed
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): READ(10). CDB: 28 00 04 c4 91 c0 00 00 08 00
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): CAM status: SCSI Status Error
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): SCSI status: Check Condition
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 26 03:28:51 rivendell kernel: (da0:mpt0:0:10:0): Retrying command (per sense data)

Also these:

Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): CAM status: SCSI Status Error
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): SCSI status: Check Condition
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): Error 6, Retries exhausted
Oct 26 03:29:55 rivendell kernel: (da1:mpt0:0:14:0): Invalidating pack

After a bunch of rounds of the errors above, I get this:

Oct 26 03:35:17 rivendell kernel: mpt0: request 0xfffffe0000f73350:62027 timed out for ccb 0xfffff800160ce000 (req->ccb 0xfffff800160ce000)
Oct 26 03:35:17 rivendell kernel: mpt0: attempting to abort req 0xfffffe0000f73350:62027 function 0
Oct 26 03:35:18 rivendell kernel: mpt0: mpt_wait_req(1) timed out
Oct 26 03:35:18 rivendell kernel: mpt0: mpt_recover_commands: abort timed-out. Resetting controller
Oct 26 03:35:18 rivendell kernel: mpt0: mpt_cam_event: 0x0
Oct 26 03:35:18 rivendell kernel: mpt0: mpt_cam_event: 0x0
Oct 26 03:35:18 rivendell kernel: mpt0: completing timedout/aborted req 0xfffffe0000f73350:62027

After which all the drives seem to disappear and the system detaches all of them:

Oct 26 03:35:33 rivendell kernel: da1 at mpt0 bus 0 scbus0 target 14 lun 0
Oct 26 03:35:33 rivendell kernel: da1: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01559141 detached
Oct 26 03:35:33 rivendell kernel: da2 at mpt0 bus 0 scbus0 target 15 lun 0
Oct 26 03:35:33 rivendell kernel: da2: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01603430 detached
Oct 26 03:35:33 rivendell kernel: da5 at mpt0 bus 0 scbus0 target 18 lun 0
Oct 26 03:35:33 rivendell kernel: da5: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01159727 detached
Oct 26 03:35:33 rivendell kernel: da6 at mpt0 bus 0 scbus0 target 19 lun 0
Oct 26 03:35:33 rivendell kernel: da6: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY02971691 detached
Oct 26 03:35:33 rivendell kernel: da4 at mpt0 bus 0 scbus0 target 17 lun 0
Oct 26 03:35:33 rivendell kernel: da4: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01470856 detached
Oct 26 03:35:33 rivendell kernel: da3 at mpt0 bus 0 scbus0 target 16 lun 0
Oct 26 03:35:33 rivendell kernel: da3: <ATA WDC WD2002FAEX-0 1D05> s/n WD-WMAY01602648 detached

At this point I have had to reboot the server and then all the drives magically reappear.

Any help would be greatly appreciated.

-Jeremy

-- 
Jeremy Beker - @gothmog
http://www.confusticate.com
Condensing fact from the vapor of nuance.