Date: Tue, 18 Oct 2016 15:55:37 -0500
From: Karl Denninger <karl@denninger.net>
To: freebsd-stable@freebsd.org
Subject: Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
Message-ID: <1fefed03-6062-50f9-be97-d693e25a64c9@denninger.net>
In-Reply-To: <4d4909b7-c44b-996e-90e1-ca446e8e4813@multiplay.co.uk>
References: <3d4f25c9-a262-a373-ec7e-755325f8810b@denninger.net>
 <9adecd24-6659-0da5-5c05-d0d3957a2cb3@denninger.net>
 <CANCZdfq5QCDNhLY5GOpmBoh5ONYy2VPteuaMhQ2=3v%2B0vcoM0g@mail.gmail.com>
 <0f58b11f-0bca-bc08-6f90-4e6e530f9956@denninger.net>
 <43a67287-f4f8-5d3e-6c5e-b3599c6adb4d@multiplay.co.uk>
 <76551fd6-0565-ee6c-b0f2-7d472ad6a4b3@denninger.net>
 <25ff3a3e-77a9-063b-e491-8d10a06e6ae2@multiplay.co.uk>
 <26e092b2-17c6-8744-5035-d0853d733870@denninger.net>
 <d2afc0b0-0e7f-e7ac-fb21-fa4ffd1c1003@multiplay.co.uk>
 <f9a4a12d-62df-482d-feeb-9d9f64de3e55@denninger.net>
 <4d4909b7-c44b-996e-90e1-ca446e8e4813@multiplay.co.uk>

On 10/17/2016 18:32, Steven Hartland wrote:
>
>
> On 17/10/2016 22:50, Karl Denninger wrote:
>> I will make some effort on the sandbox machine to see if I can come up
>> with a way to replicate this.  I do have plenty of spare larger drives
>> laying around that used to be in service and were obsolesced due to
>> capacity -- but what I don't know is whether the system will misbehave
>> if the source is all spinning rust.
>>
>> In other words:
>>
>> 1. Root filesystem is mirrored spinning rust (production is mirrored
>> SSDs)
>>
>> 2. Backup is mirrored spinning rust (of approx the same size)
>>
>> 3. Set up auto-snapshot exactly as the production system has now (which
>> the sandbox is NOT since I don't care about incremental recovery on that
>> machine; it's a sandbox!)
>>
>> 4. Run a bunch of build-somethings (e.g. buildworlds, cross-build for
>> the Pi2s I have here, etc) to generate a LOT of filesystem entropy
>> across lots of snapshots.
>>
>> 5. Back that up.
>>
>> 6. Export the backup pool.
>>
>> 7. Re-import it and "zfs destroy -r" the backup filesystem.
>>
>> That is what got me in a reboot loop after the *first* panic; I was
>> simply going to destroy the backup filesystem and re-run the backup, but
>> as soon as I issued that zfs destroy the machine panic'd and as soon as
>> I re-attached it after a reboot it panic'd again.  Repeat until I set
>> trim=0.
>>
>> But... if I CAN replicate it that still shouldn't be happening, and the
>> system should *certainly* survive attempting to TRIM on a vdev that
>> doesn't support TRIMs, even if the removal is for a large amount of
>> space and/or files on the target, without blowing up.
>>
>> BTW I bet it isn't that rare -- if you're taking timed snapshots on an
>> active filesystem (with lots of entropy) and then make the mistake of
>> trying to remove those snapshots (as is the case with a zfs destroy -r
>> or a zfs recv of an incremental copy that attempts to sync against a
>> source) on a pool that has been imported before the system realizes that
>> TRIM is unavailable on those vdevs.
>>
>> Noting this:
>>
>>     Yes need to find some time to have a look at it, but given how rare
>>     this is and with TRIM being re-implemented upstream in a totally
>>     different manner I'm reticent to spend any real time on it.
>>
>> What's in-process in this regard, if you happen to have a reference?
> Looks like it may be still in review: https://reviews.csiden.org/r/263/
>
>

Initial attempts to provoke the panic have failed on the sandbox machine
-- it appears that I need a materially-fragmented backup volume (which
makes sense, as that would greatly increase the number of TRIMs queued.)

Running a bunch of builds with snapshots taken between generates a
metric ton of entropy in the filesystem, but it appears that the number
of TRIMs actually issued when you bulk-remove them (with zfs destroy -r)
is small enough to not cause it -- probably because the system issues
one per area of freed disk, and since there is no interleaving with
other (non-removed) data that number is "reasonable" since there's
little fragmentation of that free space.

The TRIMs *are* attempted, and they *do* fail, however.....

I'm running with the 6 pages of kstack now on the production machine,
and we'll see if I get another panic...

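The guess above (one TRIM per contiguous area of freed disk, so the count
scales with fragmentation rather than with the amount of space freed) can
be illustrated with a toy model.  This is only a sketch under that
assumption: count_trim_extents and the block numbers are hypothetical, and
nothing here reflects how ZFS actually batches or issues TRIMs.

```python
def count_trim_extents(freed_blocks):
    """Count contiguous runs in a collection of freed block numbers.

    Toy assumption: one TRIM request is issued per contiguous freed run.
    """
    extents = 0
    previous = None
    for block in sorted(set(freed_blocks)):
        # A new extent starts whenever this block does not directly
        # follow the previous freed block.
        if previous is None or block != previous + 1:
            extents += 1
        previous = block
    return extents


# Unfragmented case: the destroyed data sat in one contiguous region.
contiguous = range(1000, 2000)

# Fragmented case: every other block belongs to data that is kept, so
# each freed block is its own isolated extent.
interleaved = range(1000, 3000, 2)

print(count_trim_extents(contiguous))   # 1
print(count_trim_extents(interleaved))  # 1000
```

Both cases free the same 1000 blocks; only the layout differs, which is
consistent with the sandbox (little interleaving) queuing far fewer TRIMs
than a long-lived, fragmented backup pool would.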
-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/