Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 10 Apr 2019 09:38:19 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-stable@freebsd.org
Subject:   Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
Message-ID:  <2bc8a172-6168-5ba9-056c-80455eabc82b@denninger.net>
In-Reply-To: <3d2ad225-b223-e9db-cce8-8250571b92c9@FreeBSD.org>
References:  <f87f32f2-b8c5-75d3-4105-856d9f4752ef@denninger.net> <c96e31ad-6731-332e-5d2d-7be4889716e1@FreeBSD.org> <9a96b1b5-9337-fcae-1a2a-69d7bb24a5b3@denninger.net> <CACpH0MdLNQ_dqH%2Bto=amJbUuWprx3LYrOLO0rQi7eKw-ZcqWJw@mail.gmail.com> <1866e238-e2a1-ef4e-bee5-5a2f14e35b22@denninger.net> <3d2ad225-b223-e9db-cce8-8250571b92c9@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a cryptographically signed message in MIME format.

--------------ms030608080001090807000405
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 4/10/2019 08:45, Andriy Gapon wrote:
> On 10/04/2019 04:09, Karl Denninger wrote:
>> Specifically, I *explicitly* OFFLINE the disk in question, which is a
>> controlled operation and *should* result in a cache flush out of the Z=
FS
>> code into the drive before it is OFFLINE'd.
>>
>> This should result in the "last written" TXG that the remaining online=

>> members have, and the one in the offline member, being consistent.
>>
>> Then I "camcontrol standby" the involved drive, which forces a writeba=
ck
>> cache flush and a spindown; in other words, re-ordered or not, the
>> on-platter data *should* be consistent with what the system thinks
>> happened before I yank the physical device.
> This may not be enough for a specific [RAID] controller and a specific
> configuration.  It should be enough for a dumb HBA.  But, for example, =
mrsas(9)
> can simply ignore the synchronize cache command (meaning neither the on=
-board
> cache is flushed nor the command is propagated to a disk).  So, if you =
use some
> advanced controller it would make sense to use its own management tool =
to
> offline a disk before pulling it.
>
> I do not preclude a possibility of an issue in ZFS.  But it's not the o=
nly
> possibility either.

In this specific case the adapter in question is...

mps0: <Avago Technologies (LSI) SAS2116> port 0xc000-0xc0ff mem
0xfbb3c000-0xfbb3ffff,0xfbb40000-0xfbb7ffff irq 30 at device 0.0 on pci3
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities:
1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc=
>

Which is indeed a "dumb" HBA (in IT mode), and Zeephod says he connects
his drives via dumb on-MoBo direct SATA connections.

What I don't know (yet) is if the update to firmware 20.00.07.00 in the
HBA has fixed it.=C2=A0 The 11.2 and 12.0 revs of FreeBSD through some
mechanism changed timing quite materially in the mps driver; prior to
11.2 I ran with a Lenovo SAS expander connected to SATA disks without
any problems at all, even across actual disk failures through the years,
but in 11.2 and 12.0 doing this resulted in spurious retries out of the
CAM layer that allegedly came from timeouts on individual units (which
looked very much like a lost command sent to the disk), but only on
mirrored volume sets -- yet there were no errors reported by the drive
itself, nor did either of my RaidZ2 pools (one spinning rust, one SSD)
experience problems of any sort.=C2=A0=C2=A0 Flashing the HBA forward to
20.00.07.00 with the expander in resulted in the=C2=A0 *driver* (mps) tak=
ing
disconnects and resets instead of the targets, which in turn caused
random drive fault events across all of the pools.=C2=A0 For obvious reas=
ons
that got backed out *fast*.

Without the expander 19.00.00.00 has been stable over the last few
months *except* for this circumstance, where an intentionally OFFLINE'd
disk in a mirror that is brought back online after some reasonably long
period of time (days to a week) results in a successful resilver but
then a small number of checksum errors on that drive -- always on the
one that was OFFLINEd, never on the one(s) not taken OFFLINE -- appear
and are corrected when a scrub is subsequently performed.=C2=A0 I am now =
on
20.00.07.00 and so far -- no problems.=C2=A0 But I've yet to do the backu=
p
disk swap on 20.00.07.00 (scheduled for late week or Monday) so I do not
know if the 20.00.07.00 roll-forward addresses the scrub issue or not.=C2=
=A0
I have no reason to believe it is involved, but given the previous
"iffy" nature of 11.2 and 12.0 on 19.0 with the expander it very well
might be due to what appear to be timing changes in the driver architectu=
re.

--=20
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms030608080001090807000405
Content-Type: application/pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature

MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC
DdgwggagMIIEiKADAgECAhMA5EiKghDOXrvfxYxjITXYDdhIMA0GCSqGSIb3DQEBCwUAMIGL
MQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UEBwwJTmljZXZpbGxlMRkw
FwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5c3RlbXMgQ0ExITAf
BgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQTAeFw0xNzA4MTcxNjQyMTdaFw0yNzA4
MTUxNjQyMTdaMHsxCzAJBgNVBAYTAlVTMRAwDgYDVQQIDAdGbG9yaWRhMRkwFwYDVQQKDBBD
dWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5c3RlbXMgQ0ExJTAjBgNVBAMMHEN1
ZGEgU3lzdGVtcyBMTEMgMjAxNyBJbnQgQ0EwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIK
AoICAQC1aJotNUI+W4jP7xQDO8L/b4XiF4Rss9O0B+3vMH7Njk85fZ052QhZpMVlpaaO+sCI
KqG3oNEbuOHzJB/NDJFnqh7ijBwhdWutdsq23Ux6TvxgakyMPpT6TRNEJzcBVQA0kpby1DVD
0EKSK/FrWWBiFmSxg7qUfmIq/mMzgE6epHktyRM3OGq3dbRdOUgfumWrqHXOrdJz06xE9NzY
vc9toqZnd79FUtE/nSZVm1VS3Grq7RKV65onvX3QOW4W1ldEHwggaZxgWGNiR/D4eosAGFxn
uYeWlKEC70c99Mp1giWux+7ur6hc2E+AaTGh+fGeijO5q40OGd+dNMgK8Es0nDRw81lRcl24
SWUEky9y8DArgIFlRd6d3ZYwgc1DMTWkTavx3ZpASp5TWih6yI8ACwboTvlUYeooMsPtNa9E
6UQ1nt7VEi5syjxnDltbEFoLYcXBcqhRhFETJe9CdenItAHAtOya3w5+fmC2j/xJz29og1KH
YqWHlo3Kswi9G77an+zh6nWkMuHs+03DU8DaOEWzZEav3lVD4u76bKRDTbhh0bMAk4eXriGL
h4MUoX3Imfcr6JoyheVrAdHDL/BixbMH1UUspeRuqQMQ5b2T6pabXP0oOB4FqldWiDgJBGRd
zWLgCYG8wPGJGYgHibl5rFiI5Ix3FQncipc6SdUzOQIDAQABo4IBCjCCAQYwHQYDVR0OBBYE
FF3AXsKnjdPND5+bxVECGKtc047PMIHABgNVHSMEgbgwgbWAFBu1oRhUMNEzjODolDka5k4Q
EDBioYGRpIGOMIGLMQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UEBwwJ
TmljZXZpbGxlMRkwFwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5
c3RlbXMgQ0ExITAfBgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQYIJAKxAy1WBo2kY
MBIGA1UdEwEB/wQIMAYBAf8CAQAwDgYDVR0PAQH/BAQDAgGGMA0GCSqGSIb3DQEBCwUAA4IC
AQCB5686UCBVIT52jO3sz9pKuhxuC2npi8ZvoBwt/IH9piPA15/CGF1XeXUdu2qmhOjHkVLN
gO7XB1G8CuluxofOIUce0aZGyB+vZ1ylHXlMeB0R82f5dz3/T7RQso55Y2Vog2Zb7PYTC5B9
oNy3ylsnNLzanYlcW3AAfzZcbxYuAdnuq0Im3EpGm8DoItUcf1pDezugKm/yKtNtY6sDyENj
tExZ377cYA3IdIwqn1Mh4OAT/Rmh8au2rZAo0+bMYBy9C11Ex0hQ8zWcvPZBDn4v4RtO8g+K
uQZQcJnO09LJNtw94W3d2mj4a7XrsKMnZKvm6W9BJIQ4Nmht4wXAtPQ1xA+QpxPTmsGAU0Cv
HmqVC7XC3qxFhaOrD2dsvOAK6Sn3MEpH/YrfYCX7a7cz5zW3DsJQ6o3pYfnnQz+hnwLlz4MK
17NIA0WOdAF9IbtQqarf44+PEyUbKtz1r0KGeGLs+VGdd2FLA0e7yuzxJDYcaBTVwqaHhU2/
Fna/jGU7BhrKHtJbb/XlLeFJ24yvuiYKpYWQSSyZu1R/gvZjHeGb344jGBsZdCDrdxtQQcVA
6OxsMAPSUPMrlg9LWELEEYnVulQJerWxpUecGH92O06wwmPgykkz//UmmgjVSh7ErNvL0lUY
UMfunYVO/O5hwhW+P4gviCXzBFeTtDZH259O7TCCBzAwggUYoAMCAQICEwCg0WvVwekjGFiO
62SckFwepz0wDQYJKoZIhvcNAQELBQAwezELMAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3Jp
ZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBMTEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBD
QTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExMQyAyMDE3IEludCBDQTAeFw0xNzA4MTcyMTIx
MjBaFw0yMjA4MTYyMTIxMjBaMFcxCzAJBgNVBAYTAlVTMRAwDgYDVQQIDAdGbG9yaWRhMRkw
FwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRswGQYDVQQDDBJrYXJsQGRlbm5pbmdlci5uZXQw
ggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIKAoICAQC+HVSyxVtJhy3Ohs+PAGRuO//Dha9A
16l5FPATr6wude9zjX5f2lrkRyU8vhCXTZW7WbvWZKpcZ8r0dtZmiK9uF58Ec6hhvfkxJzbg
96WHBw5Fumd5ahZzuCJDtCAWW8R7/KN+zwzQf1+B3MVLmbaXAFBuKzySKhKMcHbK3/wjUYTg
y+3UK6v2SBrowvkUBC+jxNg3Wy12GsTXcUS/8FYIXgVVPgfZZrbJJb5HWOQpvvhILpPCD3xs
YJFNKEPltXKWHT7Qtc2HNqikgNwj8oqOb+PeZGMiWapsatKm8mxuOOGOEBhAoTVTwUHlMNTg
6QUCJtuWFCK38qOCyk9Haj+86lUU8RG6FkRXWgMbNQm1mWREQhw3axgGLSntjjnznJr5vsvX
SYR6c+XKLd5KQZcS6LL8FHYNjqVKHBYM+hDnrTZMqa20JLAF1YagutDiMRURU23iWS7bA9tM
cXcqkclTSDtFtxahRifXRI7Epq2GSKuEXe/1Tfb5CE8QsbCpGsfSwv2tZ/SpqVG08MdRiXxN
5tmZiQWo15IyWoeKOXl/hKxA9KPuDHngXX022b1ly+5ZOZbxBAZZMod4y4b4FiRUhRI97r9l
CxsP/EPHuuTIZ82BYhrhbtab8HuRo2ofne2TfAWY2BlA7ExM8XShMd9bRPZrNTokPQPUCWCg
CdIATQIDAQABo4IBzzCCAcswPAYIKwYBBQUHAQEEMDAuMCwGCCsGAQUFBzABhiBodHRwOi8v
b2NzcC5jdWRhc3lzdGVtcy5uZXQ6ODg4ODAJBgNVHRMEAjAAMBEGCWCGSAGG+EIBAQQEAwIF
oDAOBgNVHQ8BAf8EBAMCBeAwHQYDVR0lBBYwFAYIKwYBBQUHAwIGCCsGAQUFBwMEMDMGCWCG
SAGG+EIBDQQmFiRPcGVuU1NMIEdlbmVyYXRlZCBDbGllbnQgQ2VydGlmaWNhdGUwHQYDVR0O
BBYEFLElmNWeVgsBPe7O8NiBzjvjYnpRMIHKBgNVHSMEgcIwgb+AFF3AXsKnjdPND5+bxVEC
GKtc047PoYGRpIGOMIGLMQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UE
BwwJTmljZXZpbGxlMRkwFwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRh
IFN5c3RlbXMgQ0ExITAfBgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQYITAORIioIQ
zl6738WMYyE12A3YSDAdBgNVHREEFjAUgRJrYXJsQGRlbm5pbmdlci5uZXQwDQYJKoZIhvcN
AQELBQADggIBAJXboPFBMLMtaiUt4KEtJCXlHO/3ZzIUIw/eobWFMdhe7M4+0u3te0sr77QR
dcPKR0UeHffvpth2Mb3h28WfN0FmJmLwJk+pOx4u6uO3O0E1jNXoKh8fVcL4KU79oEQyYkbu
2HwbXBU9HbldPOOZDnPLi0whi/sbFHdyd4/w/NmnPgzAsQNZ2BYT9uBNr+jZw4SsluQzXG1X
lFL/qCBoi1N2mqKPIepfGYF6drbr1RnXEJJsuD+NILLooTNf7PMgHPZ4VSWQXLNeFfygoOOK
FiO0qfxPKpDMA+FHa8yNjAJZAgdJX5Mm1kbqipvb+r/H1UAmrzGMbhmf1gConsT5f8KU4n3Q
IM2sOpTQe7BoVKlQM/fpQi6aBzu67M1iF1WtODpa5QUPvj1etaK+R3eYBzi4DIbCIWst8MdA
1+fEeKJFvMEZQONpkCwrJ+tJEuGQmjoQZgK1HeloepF0WDcviiho5FlgtAij+iBPtwMuuLiL
shAXA5afMX1hYM4l11JXntle12EQFP1r6wOUkpOdxceCcMVDEJBBCHW2ZmdEaXgAm1VU+fnQ
qS/wNw/S0X3RJT1qjr5uVlp2Y0auG/eG0jy6TT0KzTJeR9tLSDXprYkN2l/Qf7/nT6Q03qyE
QnnKiBXWAZXveafyU/zYa7t3PTWFQGgWoC4w6XqgPo4KV44OMYIFBzCCBQMCAQEwgZIwezEL
MAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBM
TEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExM
QyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTANBglghkgBZQMEAgMFAKCCAkUw
GAYJKoZIhvcNAQkDMQsGCSqGSIb3DQEHATAcBgkqhkiG9w0BCQUxDxcNMTkwNDEwMTQzODE5
WjBPBgkqhkiG9w0BCQQxQgRAo/0Eo3QU7U0Rt47rVYVCU/7AdRepKOmMiUfpapuV4ykDaTH4
lsdw/cXNuMNKrHT20lTAgPSSef5mCYHbL3bl0zBsBgkqhkiG9w0BCQ8xXzBdMAsGCWCGSAFl
AwQBKjALBglghkgBZQMEAQIwCgYIKoZIhvcNAwcwDgYIKoZIhvcNAwICAgCAMA0GCCqGSIb3
DQMCAgFAMAcGBSsOAwIHMA0GCCqGSIb3DQMCAgEoMIGjBgkrBgEEAYI3EAQxgZUwgZIwezEL
MAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBM
TEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExM
QyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTCBpQYLKoZIhvcNAQkQAgsxgZWg
gZIwezELMAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lz
dGVtcyBMTEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0
ZW1zIExMQyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTANBgkqhkiG9w0BAQEF
AASCAgC9r8hGG2fG7K0hTeHKGuSzH0u9PKo+7cyRclAgQs+EKwbnPCHsb9Dtjj4ABFCQ/e+1
OeUKRlV1+o7tBeeln4cIkbXleRDg+p2a7N275F74qvvdm5+x8hP1b5eFpCKHnbbQ+y9blFOd
P6+YudwmzAKJJygyZvBkE5LiP7SDp+XzTWJHTpiZ05+V82hmOeX181nu2fOLWFHlunHGlYyu
Etq2cKd7y6srf6rkykMXbgkt3tYVN3pge18sg2F3uOasZ6zXhJ3/RZx5Li/iVvkl+RYbHQNE
EsFKRRKtfSdD3LWJFLWxX0qevL7IaGF3TQ/t+trrxnHWYphEjYAGLgsYXEV80RrCd2LUmK4y
XC+Z5KH8ifYlnWyTtvDAXwJVWxmdCDYvBK5Ek6CEd0DzjslSPcluXcjMROaxJ8Z7Npcr77dh
Mo0noZFXst1iOY3BmlPRI/W2M+Li6Ewdo8gHqMm4YueDNvtCgjXxmkYcLJgUCR4F9ue3FzHL
dsw3z+J79jpExGSUeY6IBnaQQqMebr8OoKkljcuwTxOwsys66Omb6z769F1HkycQ07XGti1b
zEheSjkY1d+QOesKjI9jBpUz4Pe9po3Ga+EKlpVOgLzEqvmN2fUN/SLM2pGrkDwt/CaxxtjK
2BTwBQ5ciBSsKFd3pwNjVUtS3Q2xVow7ydE27kHBhAAAAAAAAA==
--------------ms030608080001090807000405--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2bc8a172-6168-5ba9-056c-80455eabc82b>