Date:      Sat, 2 Feb 2019 12:02:56 -0600
From:      Karl Denninger <karl@denninger.net>
To:        FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject:   9211 (LSI/SAS) issues on 11.2-STABLE
Message-ID:  <7bb25f55-fa77-f67e-11f3-b2240b01e25a@denninger.net>

I recently started having some really oddball things happening under
stress.  This coincided with the machine being updated to 11.2-STABLE
(FreeBSD 11.2-STABLE #1 r342918:) from 11.1.

Specifically, I get "errors" like this:

        (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00 length 131072 SMID 269 Aborting command 0xfffffe0001179110
mps0: Sending reset from mpssas_send_abort for target ID 37
        (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00 length 131072 SMID 924 terminated ioc 804b loginfo 31140000 scsi 0 state c xfer 0
        (da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00 length 131072 SMID 161 terminated ioc 804b loginfo 31140000 scsi 0 state c xfer 0
mps0: Unfreezing devq for target ID 37
(da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bc 08 00 01 00 00
(da12:mps0:0:37:0): CAM status: CCB request completed with an error
(da12:mps0:0:37:0): Retrying command
(da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 bb 08 00 01 00 00
(da12:mps0:0:37:0): CAM status: Command timeout
(da12:mps0:0:37:0): Retrying command
(da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00
(da12:mps0:0:37:0): CAM status: CCB request completed with an error
(da12:mps0:0:37:0): Retrying command
(da12:mps0:0:37:0): READ(10). CDB: 28 00 af 82 ba 08 00 01 00 00
(da12:mps0:0:37:0): CAM status: SCSI Status Error
(da12:mps0:0:37:0): SCSI status: Check Condition
(da12:mps0:0:37:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da12:mps0:0:37:0): Retrying command (per sense data)
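
As a cross-check on the log above, the READ(10) CDBs can be decoded by
hand or with a few lines of Python. This is only a sketch: the function
name decode_read10 is made up for illustration, and the 512-byte logical
block size is an assumption (the drives' sector size is not stated
here).

```python
import struct

def decode_read10(cdb_hex, block_size=512):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    block_size is an assumption; it is not reported in the log.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    # Big-endian layout: opcode, flags, 32-bit LBA, group number,
    # 16-bit transfer length (in blocks), control byte.
    _opcode, _flags, lba, _group, nblocks, _control = struct.unpack(">BBIBHB", cdb)
    return {"lba": lba, "blocks": nblocks, "bytes": nblocks * block_size}

print(decode_read10("28 00 af 82 bb 08 00 01 00 00"))
```

With 512-byte blocks, the 256-block transfer length works out to 131072
bytes, which agrees with the "length 131072" the driver prints. The
three CDBs in the log start at LBAs 0xaf82ba08, 0xaf82bb08 and
0xaf82bc08, i.e. 256 blocks apart, which is what back-to-back 128 KiB
sequential reads would look like.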

The "Unit Attention" implies the drive reset.  It only occurs on certain
drives under very heavy load (e.g. a scrub.)  I've managed to provoke it
on two different brands of disk across multiple firmware revisions and
capacities, however, which tends to point away from a drive firmware
problem.
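
To see which drives are affected and how often, the CAM status lines
can be tallied per device. The following is a minimal sketch, assuming
the kernel messages look like the excerpt above; the regular expression
and the function name tally_cam_status are illustrative, not part of
any FreeBSD tool.

```python
import re
from collections import Counter

# Matches lines such as
# "(da12:mps0:0:37:0): CAM status: Command timeout"
CAM_STATUS = re.compile(r"\((\w+):(\w+):\d+:\d+:\d+\): CAM status: (.+)$")

def tally_cam_status(lines):
    """Count CAM status messages per (device, status) pair."""
    counts = Counter()
    for line in lines:
        match = CAM_STATUS.search(line.rstrip())
        if match:
            device, _controller, status = match.groups()
            counts[(device, status)] += 1
    return counts

sample = [
    "(da12:mps0:0:37:0): CAM status: CCB request completed with an error",
    "(da12:mps0:0:37:0): Retrying command",
    "(da12:mps0:0:37:0): CAM status: Command timeout",
]
for (device, status), n in sorted(tally_cam_status(sample).items()):
    print(device, status, n)
```

In practice the input would be the lines of the system log (on a stock
install that is usually /var/log/messages, but treat the path as an
assumption) rather than the hard-coded sample.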

A look at the pool data shows /no/ errors (e.g. no checksum problems,
etc.) and a look at the disk itself (using smartctl) shows no problems
either -- whatever is going on here, the adapter is recovering from it
without any data corruption or loss registered on *either end*!

The configuration is an older SuperMicro Xeon board (X8DTL-IF) and shows:

mps0: <Avago Technologies (LSI) SAS2008> port 0xc000-0xc0ff mem 0xfbb3c000-0xfbb3ffff,0xfbb40000-0xfbb7ffff irq 30 at device 0.0 on pci3
mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>

There is also a SAS expander connected to that with all but the boot
drives on it (the LSI card will not boot from the expander so the boot
mirror is directly connected to the adapter.)

Thinking this might be a firmware/driver compatibility problem, I
flashed the card to 20.00.07.00, which is the latest available.  That
made the situation **MUCH** worse; now instead of getting unit attention
issues I got *controller* resets (!!), which invariably caused some
random device (and sometimes more than one) in one of the pools to get
detached, as the controller didn't come back up fast enough for ZFS and
it declared the device(s) in question "removed".

Needless to say I immediately flashed the card back to 19.00.00.00!

This configuration has been completely stable on 11.1 for upwards of a
year, and only started misbehaving when I updated the OS to 11.2.  I've
pounded the living daylights out of this box for a very long time on a
succession of FreeBSD OS builds and up to 11.1 have never seen anything
like this; if I had a bad drive, it was clearly the drive.

Looking at the commit logs for the mps driver it appears there isn't
much here that *could* be involved, unless there's an interrupt issue
with some of the MSI changes that is interacting with my specific
motherboard line.

Any ideas on running this down would be appreciated; it's not easy to
trigger it on the 19.00.00.00 firmware, but on 20.00.07.00 I can force a
controller reset and detach within a few minutes by running scrubs, so
if there are things I can try (I have a sandbox machine with the same
hardware in it that won't make me cry much if I blow it up) that would
be great.

Thanks!

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
