Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 8 May 2019 11:28:57 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-stable@freebsd.org
Subject:   Re: ZFS...
Message-ID:  <d9086e22-fa73-1b01-d455-c32e0be70783@denninger.net>
In-Reply-To: <92330c95-7348-c5a2-9c13-f4cbc99bc649@sorbs.net>
References:  <30506b3d-64fb-b327-94ae-d9da522f3a48@sorbs.net> <70fac2fe3f23f85dd442d93ffea368e1@ultra-secure.de> <70C87D93-D1F9-458E-9723-19F9777E6F12@sorbs.net> <CAGMYy3tYqvrKgk2c==WTwrH03uTN1xQifPRNxXccMsRE1spaRA@mail.gmail.com> <5ED8BADE-7B2C-4B73-93BC-70739911C5E3@sorbs.net> <d0118f7e-7cfc-8bf1-308c-823bce088039@denninger.net> <2e4941bf-999a-7f16-f4fe-1a520f2187c0@sorbs.net> <20190430102024.E84286@mulder.mintsol.com> <41FA461B-40AE-4D34-B280-214B5C5868B5@punkt.de> <20190506080804.Y87441@mulder.mintsol.com> <08E46EBF-154F-4670-B411-482DCE6F395D@sorbs.net> <33D7EFC4-5C15-4FE0-970B-E6034EF80BEF@gromit.dlib.vt.edu> <A535026E-F9F6-4BBA-8287-87EFD02CF207@sorbs.net> <26B407D8-3EED-47CA-81F6-A706CF424567@gromit.dlib.vt.edu> <42ba468a-2f87-453c-0c54-32edc98e83b8@sorbs.net> <4A485B46-1C3F-4EE0-8193-ADEB88F322E8@gromit.dlib.vt.edu> <14ed4197-7af7-f049-2834-1ae6aa3b2ae3@sorbs.net> <453BCBAC-A992-4E7D-B2F8-959B5C33510E@gromit.dlib.vt.edu> <92330c95-7348-c5a2-9c13-f4cbc99bc649@sorbs.net>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a cryptographically signed message in MIME format.

--------------ms050606090205020207040608
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 5/8/2019 10:14, Michelle Sullivan wrote:
> Paul Mather wrote:
>> On May 8, 2019, at 9:59 AM, Michelle Sullivan <michelle@sorbs.net>
>> wrote:
>>
>>>> Did you have regular pool scrubs enabled?=C2=A0 It would have picked=
 up
>>>> silent data corruption like this.=C2=A0 It does for me.
>>> Yes, every month (once a month because, (1) the data doesn't change
>>> much (new data is added, old it not touched), and (2) because to
>>> complete it took 2 weeks.)
>>
>>
>> Do you also run sysutils/smartmontools to monitor S.M.A.R.T.
>> attributes?=C2=A0 Although imperfect, it can sometimes signal trouble
>> brewing with a drive (e.g., increasing Reallocated_Sector_Ct and
>> Current_Pending_Sector counts) that can lead to proactive remediation
>> before catastrophe strikes.
> not Automatically
>>
>> Unless you have been gathering periodic drive metrics, you have no
>> way of knowing whether these hundreds of bad sectors have happened
>> suddenly or slowly over a period of time.
> no, it something i have thought about but been unable to spend the
> time on.
>
There are two issues here that would concern me greatly and IMHO you
should address.

I have a system here with about the same amount of net storage on it as
you did.=C2=A0 It runs scrubs regularly; none of them take more than 8 ho=
urs
on *any* of the pools.=C2=A0 The SSD-based pool is of course *much* faste=
r
but even the many-way RaidZ2 on spinning rust is an ~8 hour deal; it
kicks off automatically at 2:00 AM when the time comes but is complete
before noon.=C2=A0 I run them on 14 day intervals.

If you have pool(s) that are taking *two weeks* to run a scrub IMHO
either something is badly wrong or you need to rethink organization of
the pool structure -- that is, IMHO you likely either have a severe
performance problem with one or more members or an architectural problem
you *really* need to determine and fix.=C2=A0 If a scrub takes two weeks
*then a resilver could conceivably take that long as well* and that's
*extremely* bad as the window for getting screwed is at its worst when a
resilver is being run.

Second, smartmontools/smartd isn't the be-all, end-all but it *does*
sometimes catch incipient problems with specific units before they turn
into all-on death and IMHO in any installation of any material size
where one cares about the data (as opposed to "if it fails just restore
it from backup") it should be running.=C2=A0 It's very easy to set up and=

there are no real downsides to using it.=C2=A0 I have one disk that I rot=
ate
in and out that was bought as a "refurb" and has 70 permanent relocated
sectors on it.=C2=A0 It has never grown another one since I acquired it, =
but
every time it goes in the machine within minutes I get an alert on
that.=C2=A0 If I was to ever get *71*, or a *different* drive grew a new =
one
said drive would get replaced *instantly*.=C2=A0 Over the years it has
flagged two disks before they "hard failed" and both were immediately
taken out of service, replaced and then destroyed and thrown away.=C2=A0
Maybe that's me being paranoid but IMHO it's the correct approach to
such notifications.

BTW that tool will *also* tell you if something else software-wise is
going on that you *might* think is drive-related.=C2=A0 For example recen=
tly
here on the list I ran into a really oddball thing happening with SAS
expanders that showed up with 12-STABLE and was *not* present in the
same box with 11.1.=C2=A0 Smartmontools confirmed that while the driver w=
as
reporting errors from the disks *the disks themselves were not in fact
taking errors.*=C2=A0 Had I not had that information I might well have
traveled down a road that led to a catastrophic pool failure by
attempting to replace disks that weren't actually bad.=C2=A0 The SAS expa=
nder
wound up being taken out of service and replaced with an HBA that has
more ports -- the issues disappeared.

Finally, while you *think* you only have a metadata problem I'm with the
other people here in expressing disbelief that the damage is limited to
that.=C2=A0 There is enough redundancy in the metadata on ZFS that if *al=
l*
copies are destroyed or inconsistent to the degree that they're unusable
it's extremely likely that if you do get some sort of "disaster
recovery" tool working you're going to find out that what you thought
was a metadata problem is really a "you're hosed; the data is also gone"
sort of problem.

--=20
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms050606090205020207040608
Content-Type: application/pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature

MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC
DdgwggagMIIEiKADAgECAhMA5EiKghDOXrvfxYxjITXYDdhIMA0GCSqGSIb3DQEBCwUAMIGL
MQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UEBwwJTmljZXZpbGxlMRkw
FwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5c3RlbXMgQ0ExITAf
BgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQTAeFw0xNzA4MTcxNjQyMTdaFw0yNzA4
MTUxNjQyMTdaMHsxCzAJBgNVBAYTAlVTMRAwDgYDVQQIDAdGbG9yaWRhMRkwFwYDVQQKDBBD
dWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5c3RlbXMgQ0ExJTAjBgNVBAMMHEN1
ZGEgU3lzdGVtcyBMTEMgMjAxNyBJbnQgQ0EwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIK
AoICAQC1aJotNUI+W4jP7xQDO8L/b4XiF4Rss9O0B+3vMH7Njk85fZ052QhZpMVlpaaO+sCI
KqG3oNEbuOHzJB/NDJFnqh7ijBwhdWutdsq23Ux6TvxgakyMPpT6TRNEJzcBVQA0kpby1DVD
0EKSK/FrWWBiFmSxg7qUfmIq/mMzgE6epHktyRM3OGq3dbRdOUgfumWrqHXOrdJz06xE9NzY
vc9toqZnd79FUtE/nSZVm1VS3Grq7RKV65onvX3QOW4W1ldEHwggaZxgWGNiR/D4eosAGFxn
uYeWlKEC70c99Mp1giWux+7ur6hc2E+AaTGh+fGeijO5q40OGd+dNMgK8Es0nDRw81lRcl24
SWUEky9y8DArgIFlRd6d3ZYwgc1DMTWkTavx3ZpASp5TWih6yI8ACwboTvlUYeooMsPtNa9E
6UQ1nt7VEi5syjxnDltbEFoLYcXBcqhRhFETJe9CdenItAHAtOya3w5+fmC2j/xJz29og1KH
YqWHlo3Kswi9G77an+zh6nWkMuHs+03DU8DaOEWzZEav3lVD4u76bKRDTbhh0bMAk4eXriGL
h4MUoX3Imfcr6JoyheVrAdHDL/BixbMH1UUspeRuqQMQ5b2T6pabXP0oOB4FqldWiDgJBGRd
zWLgCYG8wPGJGYgHibl5rFiI5Ix3FQncipc6SdUzOQIDAQABo4IBCjCCAQYwHQYDVR0OBBYE
FF3AXsKnjdPND5+bxVECGKtc047PMIHABgNVHSMEgbgwgbWAFBu1oRhUMNEzjODolDka5k4Q
EDBioYGRpIGOMIGLMQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UEBwwJ
TmljZXZpbGxlMRkwFwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5
c3RlbXMgQ0ExITAfBgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQYIJAKxAy1WBo2kY
MBIGA1UdEwEB/wQIMAYBAf8CAQAwDgYDVR0PAQH/BAQDAgGGMA0GCSqGSIb3DQEBCwUAA4IC
AQCB5686UCBVIT52jO3sz9pKuhxuC2npi8ZvoBwt/IH9piPA15/CGF1XeXUdu2qmhOjHkVLN
gO7XB1G8CuluxofOIUce0aZGyB+vZ1ylHXlMeB0R82f5dz3/T7RQso55Y2Vog2Zb7PYTC5B9
oNy3ylsnNLzanYlcW3AAfzZcbxYuAdnuq0Im3EpGm8DoItUcf1pDezugKm/yKtNtY6sDyENj
tExZ377cYA3IdIwqn1Mh4OAT/Rmh8au2rZAo0+bMYBy9C11Ex0hQ8zWcvPZBDn4v4RtO8g+K
uQZQcJnO09LJNtw94W3d2mj4a7XrsKMnZKvm6W9BJIQ4Nmht4wXAtPQ1xA+QpxPTmsGAU0Cv
HmqVC7XC3qxFhaOrD2dsvOAK6Sn3MEpH/YrfYCX7a7cz5zW3DsJQ6o3pYfnnQz+hnwLlz4MK
17NIA0WOdAF9IbtQqarf44+PEyUbKtz1r0KGeGLs+VGdd2FLA0e7yuzxJDYcaBTVwqaHhU2/
Fna/jGU7BhrKHtJbb/XlLeFJ24yvuiYKpYWQSSyZu1R/gvZjHeGb344jGBsZdCDrdxtQQcVA
6OxsMAPSUPMrlg9LWELEEYnVulQJerWxpUecGH92O06wwmPgykkz//UmmgjVSh7ErNvL0lUY
UMfunYVO/O5hwhW+P4gviCXzBFeTtDZH259O7TCCBzAwggUYoAMCAQICEwCg0WvVwekjGFiO
62SckFwepz0wDQYJKoZIhvcNAQELBQAwezELMAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3Jp
ZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBMTEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBD
QTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExMQyAyMDE3IEludCBDQTAeFw0xNzA4MTcyMTIx
MjBaFw0yMjA4MTYyMTIxMjBaMFcxCzAJBgNVBAYTAlVTMRAwDgYDVQQIDAdGbG9yaWRhMRkw
FwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRswGQYDVQQDDBJrYXJsQGRlbm5pbmdlci5uZXQw
ggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIKAoICAQC+HVSyxVtJhy3Ohs+PAGRuO//Dha9A
16l5FPATr6wude9zjX5f2lrkRyU8vhCXTZW7WbvWZKpcZ8r0dtZmiK9uF58Ec6hhvfkxJzbg
96WHBw5Fumd5ahZzuCJDtCAWW8R7/KN+zwzQf1+B3MVLmbaXAFBuKzySKhKMcHbK3/wjUYTg
y+3UK6v2SBrowvkUBC+jxNg3Wy12GsTXcUS/8FYIXgVVPgfZZrbJJb5HWOQpvvhILpPCD3xs
YJFNKEPltXKWHT7Qtc2HNqikgNwj8oqOb+PeZGMiWapsatKm8mxuOOGOEBhAoTVTwUHlMNTg
6QUCJtuWFCK38qOCyk9Haj+86lUU8RG6FkRXWgMbNQm1mWREQhw3axgGLSntjjnznJr5vsvX
SYR6c+XKLd5KQZcS6LL8FHYNjqVKHBYM+hDnrTZMqa20JLAF1YagutDiMRURU23iWS7bA9tM
cXcqkclTSDtFtxahRifXRI7Epq2GSKuEXe/1Tfb5CE8QsbCpGsfSwv2tZ/SpqVG08MdRiXxN
5tmZiQWo15IyWoeKOXl/hKxA9KPuDHngXX022b1ly+5ZOZbxBAZZMod4y4b4FiRUhRI97r9l
CxsP/EPHuuTIZ82BYhrhbtab8HuRo2ofne2TfAWY2BlA7ExM8XShMd9bRPZrNTokPQPUCWCg
CdIATQIDAQABo4IBzzCCAcswPAYIKwYBBQUHAQEEMDAuMCwGCCsGAQUFBzABhiBodHRwOi8v
b2NzcC5jdWRhc3lzdGVtcy5uZXQ6ODg4ODAJBgNVHRMEAjAAMBEGCWCGSAGG+EIBAQQEAwIF
oDAOBgNVHQ8BAf8EBAMCBeAwHQYDVR0lBBYwFAYIKwYBBQUHAwIGCCsGAQUFBwMEMDMGCWCG
SAGG+EIBDQQmFiRPcGVuU1NMIEdlbmVyYXRlZCBDbGllbnQgQ2VydGlmaWNhdGUwHQYDVR0O
BBYEFLElmNWeVgsBPe7O8NiBzjvjYnpRMIHKBgNVHSMEgcIwgb+AFF3AXsKnjdPND5+bxVEC
GKtc047PoYGRpIGOMIGLMQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UE
BwwJTmljZXZpbGxlMRkwFwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRh
IFN5c3RlbXMgQ0ExITAfBgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQYITAORIioIQ
zl6738WMYyE12A3YSDAdBgNVHREEFjAUgRJrYXJsQGRlbm5pbmdlci5uZXQwDQYJKoZIhvcN
AQELBQADggIBAJXboPFBMLMtaiUt4KEtJCXlHO/3ZzIUIw/eobWFMdhe7M4+0u3te0sr77QR
dcPKR0UeHffvpth2Mb3h28WfN0FmJmLwJk+pOx4u6uO3O0E1jNXoKh8fVcL4KU79oEQyYkbu
2HwbXBU9HbldPOOZDnPLi0whi/sbFHdyd4/w/NmnPgzAsQNZ2BYT9uBNr+jZw4SsluQzXG1X
lFL/qCBoi1N2mqKPIepfGYF6drbr1RnXEJJsuD+NILLooTNf7PMgHPZ4VSWQXLNeFfygoOOK
FiO0qfxPKpDMA+FHa8yNjAJZAgdJX5Mm1kbqipvb+r/H1UAmrzGMbhmf1gConsT5f8KU4n3Q
IM2sOpTQe7BoVKlQM/fpQi6aBzu67M1iF1WtODpa5QUPvj1etaK+R3eYBzi4DIbCIWst8MdA
1+fEeKJFvMEZQONpkCwrJ+tJEuGQmjoQZgK1HeloepF0WDcviiho5FlgtAij+iBPtwMuuLiL
shAXA5afMX1hYM4l11JXntle12EQFP1r6wOUkpOdxceCcMVDEJBBCHW2ZmdEaXgAm1VU+fnQ
qS/wNw/S0X3RJT1qjr5uVlp2Y0auG/eG0jy6TT0KzTJeR9tLSDXprYkN2l/Qf7/nT6Q03qyE
QnnKiBXWAZXveafyU/zYa7t3PTWFQGgWoC4w6XqgPo4KV44OMYIFBzCCBQMCAQEwgZIwezEL
MAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBM
TEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExM
QyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTANBglghkgBZQMEAgMFAKCCAkUw
GAYJKoZIhvcNAQkDMQsGCSqGSIb3DQEHATAcBgkqhkiG9w0BCQUxDxcNMTkwNTA4MTYyODU3
WjBPBgkqhkiG9w0BCQQxQgRAi2hMCWk3BilFpSqf+utDZ785YVz/Z0KkBxgJAY3ESJM/Vprh
aCmknB4w+smrpZZEsspqxHkD49b1wXako+Dz7TBsBgkqhkiG9w0BCQ8xXzBdMAsGCWCGSAFl
AwQBKjALBglghkgBZQMEAQIwCgYIKoZIhvcNAwcwDgYIKoZIhvcNAwICAgCAMA0GCCqGSIb3
DQMCAgFAMAcGBSsOAwIHMA0GCCqGSIb3DQMCAgEoMIGjBgkrBgEEAYI3EAQxgZUwgZIwezEL
MAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBM
TEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExM
QyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTCBpQYLKoZIhvcNAQkQAgsxgZWg
gZIwezELMAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lz
dGVtcyBMTEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0
ZW1zIExMQyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTANBgkqhkiG9w0BAQEF
AASCAgBmARnWdtPBpqibOi9gGTyJEtf3qzIvlFp+u5f/avUs1UYJxXV4jQb0lSjvOZyXmluC
UDkf820S5QqRCZ5DoAU/4VmSsbYZ1tR0Ni6sVBtfwr8TnbzaDm1xsTEV+/i20zDTsc74Eni7
ZdnOs1v7nsPzLAk3vW2uNZAm4AV0LmJWJRhNojgL87yxqvWav9pPMYOHbYhpI8A0WGHTMXzZ
J0nh1s1gs2YlO1CrpE8Q7eU526t2Gosvh6taRKvmIIj3RnbkD39ARm0/ihgL3jFy4Z8gPtNw
wlCBBYAA2Hk8pdR8rIv7nme6wN0V7Qlx3WXx1jlkXlDHKiDCuon8aK4PkVVgHyfTT8QaB7FR
PWF/Qfpx4uVjKexSjpoDjDkgIZdyY7a9rDHENT8BIYRYn0BCQLGKuxLnquNbN/3Y7nYZyybs
ps7v+cASDusq3yAaEHGsAUTseCFzPNbBk31Px35qUGaGvF7Eigfm/v047bBSGEBhkbbx0TlV
9zOBNyHiCQ4Eb6AkTNKwHNe6cE+0GKlDpaScU/PD+jS0lkgPMxc6NPPvIpoIVjol3r23X6NS
vFqyjAWtFrj4PeSFkxBJoXksUxqECpZAIPNMIHppSbdEipbCQ9W+OioUr7ImAn+J2iRy2Yw+
eFOAzgxbUuNikLlV0h0YFLOY+s6B1x55AcK2cJ+lMAAAAAAAAA==
--------------ms050606090205020207040608--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?d9086e22-fa73-1b01-d455-c32e0be70783>