Date: Mon, 24 Oct 2016 13:54:30 -0500
From: Karl Denninger <karl@denninger.net>
To: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Perhaps odd, perhaps trouble, perhaps not? (ZFS and Mirrored Configurations)
Message-ID: <c9d7bb48-628f-9768-2f9a-3cec8a6417e4@denninger.net>

Contemplate the following:

1. Mirrored ZFS pool called "external" with *three* components (call
   them "Primary", "Sec1" and "Sec2").
2. Write data to said pool.
3. "zpool offline external Sec2"
4. Physically remove "Sec2" and place it in a vault somewhere; "Primary"
   and "Sec1" remain in the computer.
5. Run the system for some fairly long period of time (days, weeks,
   perhaps months).
6. "zpool scrub external" (make sure both drives are ok); note zero
   errors at completion.
7. "zpool offline external Sec1"
8. Remove "Sec1", exchange it with "Sec2" at the vault, and place "Sec2"
   back in the computer.
9. "zpool online external (long series of numbers that zpool status says
   was Sec2 the last time it was attached)"

Wait for the resilver to complete, which *should* only require that the
sectors *changed* on "Primary" since "Sec2" was removed be rewritten to
"Sec2".

Well, this works if the time is relatively short. However, what I have
observed is that if the time is relatively long, or some unknown
event(s) take place in the interim, then you can get this:

zpool status external
  pool: external
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Oct 24 12:32:40 2016
        487G scanned out of 2.26T at 127M/s, 4h6m to go
        96.1G resilvered, 21.05% done
config:

        NAME                     STATE     READ WRITE CKSUM
        external                 DEGRADED     0     0     0
          mirror-0               DEGRADED     0     0     0
            label/Primary.eli    ONLINE       0     0     0
            9568232509714437622  OFFLINE      0     0     0  was /dev/label/Sec1.eli
            label/Sec2.eli       ONLINE       0     0  555K  (resilvering)

Note the enormous number of checksum errors (all on the *just-attached*
disk).
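
For anyone who wants to watch for this condition across rotations, here is a minimal Python sketch that picks out devices with a non-zero CKSUM column from the config table above. The function name, the list of device states, and the embedded sample are assumptions made for illustration; the sketch only handles output laid out like the listing in this message and leaves abbreviated counts such as "555K" as strings rather than guessing at their exact value.

```python
# Hypothetical helper: flag devices in `zpool status` text whose CKSUM
# column is not "0". The set of states is an assumption for this sketch.
STATES = {"ONLINE", "DEGRADED", "OFFLINE", "FAULTED", "UNAVAIL", "REMOVED"}


def vdevs_with_checksum_errors(status_text):
    """Return (name, state, cksum) for each device row with cksum != "0"."""
    flagged = []
    for line in status_text.splitlines():
        fields = line.split()
        # Device rows look like: NAME STATE READ WRITE CKSUM [notes...]
        if len(fields) < 5 or fields[1] not in STATES:
            continue
        name, state, _read, _write, cksum = fields[:5]
        if cksum != "0":
            flagged.append((name, state, cksum))
    return flagged


# Config table copied from the status output in this message.
SAMPLE = """\
        NAME                     STATE     READ WRITE CKSUM
        external                 DEGRADED     0     0     0
          mirror-0               DEGRADED     0     0     0
            label/Primary.eli    ONLINE       0     0     0
            9568232509714437622  OFFLINE      0     0     0  was /dev/label/Sec1.eli
            label/Sec2.eli       ONLINE       0     0  555K  (resilvering)
"""

if __name__ == "__main__":
    for name, state, cksum in vdevs_with_checksum_errors(SAMPLE):
        print(f"{name} ({state}): CKSUM={cksum}")
```

In practice the text would come from running "zpool status external" and capturing its output; the sample is embedded here only so the sketch runs on its own.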

Do you think it's possible that I have *that many* actual checksum
errors on those blocks and yet *zero* I/O errors logged by the driver
*or* in the "smart" data when queried from the disk itself, *and* geli
is not complaining about corruption of the data it's reading either?
Uh, not damn likely, and further, the re-write is successful (i.e. no
write errors wind up being logged during this process).

Any idea what's going on here when this happens?

I suspect that there is some event that has cleared the log of "pending"
changes between the devices in that mirror, such that you can no longer
successfully online the device without at least part of it (but not all)
being re-written -- but I have no idea what event that would be (and
thus how to avoid it happening, if it can be avoided).

Why do something like this in the first place? Because it's a very
convenient way to take a device offsite for fire/disaster protection and
yet have the resync be very fast if there is a lot of data on that
dataset that has not changed, since (in theory) the system only needs to
rewrite the portion of that disk that has changed blocks.
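
The "only rewrite what changed" expectation can be illustrated with a toy Python model. This is a sketch under stated assumptions, not a description of how ZFS actually tracks changes: each block is assumed to carry a "birth" counter recording when it was last written, and the pool is assumed to remember the counter value at which a mirror member went offline. All names below are hypothetical.

```python
# Toy model of an incremental mirror resync (NOT ZFS's real implementation).
# block_births maps a block id to the write counter at which it was last
# written; offline_since is the counter value when the member was offlined.


def blocks_to_resync(block_births, offline_since):
    """Return the block ids that must be copied to the returning member.

    If the record of when the member went offline is lost (None), this
    model has no way to tell what changed and falls back to copying all.
    """
    if offline_since is None:
        return sorted(block_births)
    return sorted(
        block for block, birth in block_births.items() if birth > offline_since
    )


if __name__ == "__main__":
    pool = {"a": 10, "b": 25, "c": 40, "d": 41}
    print(blocks_to_resync(pool, 30))    # only blocks written after 30
    print(blocks_to_resync(pool, None))  # record lost: every block
```

The model covers only the two clean cases, a fast partial copy and a full copy. It does not attempt to explain the partial rewrite accompanied by checksum errors described above.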

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/