Date: Thu, 18 Jul 2019 16:17:17 -0500 From: Karl Denninger <karl@denninger.net> To: freebsd-stable@freebsd.org Subject: Re: Kernel panic in zfs code; 12-STABLE Message-ID: <6a3d5e55-6d50-3e08-63e4-dd1cd2209abe@denninger.net> In-Reply-To: <501734b7-22ec-5c01-eea5-26b458945e7e@denninger.net> References: <61e5debd-b440-16c9-2a70-0912634e52aa@denninger.net> <d6cf2edf-81f2-fb63-fa39-c310fe7258a7@grosbein.net> <501734b7-22ec-5c01-eea5-26b458945e7e@denninger.net>
next in thread | previous in thread | raw e-mail | index | archive | help
This is a cryptographically signed message in MIME format. --------------ms080701050702040803020600 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 7/18/2019 15:35, Karl Denninger wrote: > On 7/18/2019 15:19, Eugene Grosbein wrote: >> 19.07.2019 3:13, Karl Denninger wrote: >> >>> FreeBSD 12.0-STABLE #2 r349024M: Thu Jun 13 18:01:16 CDT 2019 =20 >>> karl@NewFS.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP >>> >>> Note -- no patches of any sort in the ZFS code; I am NOT running any = of >>> my former patch set. >>> >>> NewFS.denninger.net dumped core - see /var/crash/vmcore.8 >>> >>> Thu Jul 18 15:02:54 CDT 2019 >>> >>> FreeBSD NewFS.denninger.net 12.0-STABLE FreeBSD 12.0-STABLE #2 r34902= 4M: >>> Thu Jun 13 18:01:16 CDT 2019 =20 >>> karl@NewFS.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP am= d64 >>> >>> panic: double fault >> [skip] >> >>> #283 0xffffffff82748d91 in zio_vdev_io_done (zio=3D0xfffff8000b8b8000= ) >>> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:= 3376 >>> #284 0xffffffff82744eac in zio_execute (zio=3D0xfffff8000b8b8000) >>> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:= 1786 >>> #285 0xffffffff80c3b7f4 in taskqueue_run_locked (queue=3D0xfffff801a8= b35100) >>> at /usr/src/sys/kern/subr_taskqueue.c:467 >>> #286 0xffffffff80c3cb28 in taskqueue_thread_loop (arg=3D<value optimi= zed out>) >>> at /usr/src/sys/kern/subr_taskqueue.c:773 >>> #287 0xffffffff80b9ab23 in fork_exit ( >>> callout=3D0xffffffff80c3ca90 <taskqueue_thread_loop>, >>> arg=3D0xfffff801a0577520, frame=3D0xfffffe009d4edc00) >>> at /usr/src/sys/kern/kern_fork.c:1063 >>> #288 0xffffffff810b367e in fork_trampoline () >>> at /usr/src/sys/amd64/amd64/exception.S:996 >>> #289 0x0000000000000000 in ?? () >>> Current language: auto; currently minimal >>> (kgdb) >> You have "double fault" and completely insane number of stack frames i= n the trace. >> This is obviously infinite recursion resulting in kernel stack overflo= w and panic. > Yes, but.... why and how? > > What's executing at the time is this command: > > zfs send -RI $i@zfs-old $i@zfs-base | zfs receive -Fudv $BACKUP > > Which in turn results in the old snapshots on the target not on the > source being deleted, then the new ones being sent.=C2=A0 It never gets= to > the sending part; it blows up during the delete of the OLD snapshots. > > The one(s) it deletes, however, it DOES delete.=C2=A0 When the box is > rebooted those two snapshots on the target are indeed gone. > > That is, it is NOT getting "stuck" on one (which would imply there's an= > un-detected fault in the filesystem on the target in the metadata for > that snapshot, resulting in a recursive call that blows up the stack) > and it never gets to send the new snapshot, so whatever is going on is > NOT on the source filesystem.=C2=A0 Neither source or destination shows= any > errors on the filesystem; both pools are healthy with zero error counts= =2E > > Therefore the question -- is the system queueing enough work to blow th= e > stack *BUT* the work it queues is all legitimate?=C2=A0 If so there's a= > serious problem in the way the code now functions in that an "ordinary"= > operation can result in what amounts to kernel stack exhaustion. > > One note -- I haven't run this backup for the last five days, as I do i= t > manually and I've been out of town.=C2=A0 Previous running it on a dail= y > basis completed without trouble.=C2=A0 This smells like a backlog of "t= hings > to do" when the send runs that results in the allegedly-infinite > recursion (that isn't really infinite) that runs the stack out of space= > -- and THAT implies that the system is trying to queue a crazy amount o= f > work on a recursive basis for what is a perfectly-legitimate operation > -- which it should *NOT* do. Update: This looks like an OLD bug that came back. Previously the system would go absolutely insane on the first few accesses to spinning rust during a snapshot delete and ATTEMPT to send thousands of TRIM requests -- which spinning rust does not support.=C2=A0= On a system with mixed vdevs, where some pools are rust and some are SSD, this was a problem since you can't turn TRIM off because you REALLY want it on those disks. The FIX for this was to do this on the import of said pool comprised of spinning rust: # # Now try to trigger TRIM so that we don't have a storm of them # # echo "Attempting to disable TRIM on spinning rust...." mount -t zfs $BACKUP/no-trim /mnt dd if=3D/dev/random of=3D/mnt/kill-trim bs=3D128k count=3D2 echo "Performed 2 writes" sleep 2 rm /mnt/kill-trim echo "Performed delete of written file; wait" sleep 35 umount /mnt echo "Unmounted temporary filesystem" sleep 2 echo "TRIM disable theoretically done...." This would cause ZFS to "figure out" that TRIM doesn't work on the VDEVs on that pool and not send them.=C2=A0 You were then ok. When this went away I commented those out. I just un-commented them -- and the backup is now running, and appears to be ok -- it's well past where it blew up before, and no panic. So it APPEARS that ZFS (still) doesn't figure out that TRIM doesn't work on those disks on a pool import quickly enough and this leads to a CRAZY number of queued requests which blow up the stack.=C2=A0 But, if you perf= orm an operation that *should* result in a small number of TRIMs being issued and then wait a few seconds the ZFS code does correctly realize that these are spinning drives that don't do TRIM, doesn't stack them up, and it doesn't panic. This wouldn't bite you on an import and then MOST ordinary I/O sequences, because the first few blocks you delete or overwrite will generate a TRIM attempt, it will fail, and the vdev's get properly marked.=C2=A0 But if the first thing you do after the pool import is a "z= fs recv" *and* that is an incremental receive, that is, it first deletes the non-existent snapshots on the source, you get screwed. I don't know why this was "fixed" and now once again is not or if it was just luck that it didn't bite me earlier after moving to 12.0 -- but going through the backtrace I found a crap-ton of TRIMs in those frames which just flat-out don't work, of course, on spinning drives. IMHO ZFS should figure this out when the pool is imported and set the flags appropriately, but it appears that it does not until you actually attempt to TRIM, and the "attempt" in the case of a send/recv pair results in a very large number of TRIM requests all being queued up at once -- enough to cause a panic -- before the code figures out that it can't do them at all. --=20 Karl Denninger karl@denninger.net <mailto:karl@denninger.net> /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms080701050702040803020600 Content-Type: application/pkcs7-signature; name="smime.p7s" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="smime.p7s" Content-Description: S/MIME Cryptographic Signature MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC DdgwggagMIIEiKADAgECAhMA5EiKghDOXrvfxYxjITXYDdhIMA0GCSqGSIb3DQEBCwUAMIGL MQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UEBwwJTmljZXZpbGxlMRkw FwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5c3RlbXMgQ0ExITAf BgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQTAeFw0xNzA4MTcxNjQyMTdaFw0yNzA4 MTUxNjQyMTdaMHsxCzAJBgNVBAYTAlVTMRAwDgYDVQQIDAdGbG9yaWRhMRkwFwYDVQQKDBBD dWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5c3RlbXMgQ0ExJTAjBgNVBAMMHEN1 ZGEgU3lzdGVtcyBMTEMgMjAxNyBJbnQgQ0EwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIK AoICAQC1aJotNUI+W4jP7xQDO8L/b4XiF4Rss9O0B+3vMH7Njk85fZ052QhZpMVlpaaO+sCI KqG3oNEbuOHzJB/NDJFnqh7ijBwhdWutdsq23Ux6TvxgakyMPpT6TRNEJzcBVQA0kpby1DVD 0EKSK/FrWWBiFmSxg7qUfmIq/mMzgE6epHktyRM3OGq3dbRdOUgfumWrqHXOrdJz06xE9NzY vc9toqZnd79FUtE/nSZVm1VS3Grq7RKV65onvX3QOW4W1ldEHwggaZxgWGNiR/D4eosAGFxn uYeWlKEC70c99Mp1giWux+7ur6hc2E+AaTGh+fGeijO5q40OGd+dNMgK8Es0nDRw81lRcl24 SWUEky9y8DArgIFlRd6d3ZYwgc1DMTWkTavx3ZpASp5TWih6yI8ACwboTvlUYeooMsPtNa9E 6UQ1nt7VEi5syjxnDltbEFoLYcXBcqhRhFETJe9CdenItAHAtOya3w5+fmC2j/xJz29og1KH YqWHlo3Kswi9G77an+zh6nWkMuHs+03DU8DaOEWzZEav3lVD4u76bKRDTbhh0bMAk4eXriGL h4MUoX3Imfcr6JoyheVrAdHDL/BixbMH1UUspeRuqQMQ5b2T6pabXP0oOB4FqldWiDgJBGRd zWLgCYG8wPGJGYgHibl5rFiI5Ix3FQncipc6SdUzOQIDAQABo4IBCjCCAQYwHQYDVR0OBBYE FF3AXsKnjdPND5+bxVECGKtc047PMIHABgNVHSMEgbgwgbWAFBu1oRhUMNEzjODolDka5k4Q EDBioYGRpIGOMIGLMQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UEBwwJ TmljZXZpbGxlMRkwFwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRhIFN5 c3RlbXMgQ0ExITAfBgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQYIJAKxAy1WBo2kY MBIGA1UdEwEB/wQIMAYBAf8CAQAwDgYDVR0PAQH/BAQDAgGGMA0GCSqGSIb3DQEBCwUAA4IC AQCB5686UCBVIT52jO3sz9pKuhxuC2npi8ZvoBwt/IH9piPA15/CGF1XeXUdu2qmhOjHkVLN gO7XB1G8CuluxofOIUce0aZGyB+vZ1ylHXlMeB0R82f5dz3/T7RQso55Y2Vog2Zb7PYTC5B9 oNy3ylsnNLzanYlcW3AAfzZcbxYuAdnuq0Im3EpGm8DoItUcf1pDezugKm/yKtNtY6sDyENj tExZ377cYA3IdIwqn1Mh4OAT/Rmh8au2rZAo0+bMYBy9C11Ex0hQ8zWcvPZBDn4v4RtO8g+K uQZQcJnO09LJNtw94W3d2mj4a7XrsKMnZKvm6W9BJIQ4Nmht4wXAtPQ1xA+QpxPTmsGAU0Cv HmqVC7XC3qxFhaOrD2dsvOAK6Sn3MEpH/YrfYCX7a7cz5zW3DsJQ6o3pYfnnQz+hnwLlz4MK 17NIA0WOdAF9IbtQqarf44+PEyUbKtz1r0KGeGLs+VGdd2FLA0e7yuzxJDYcaBTVwqaHhU2/ Fna/jGU7BhrKHtJbb/XlLeFJ24yvuiYKpYWQSSyZu1R/gvZjHeGb344jGBsZdCDrdxtQQcVA 6OxsMAPSUPMrlg9LWELEEYnVulQJerWxpUecGH92O06wwmPgykkz//UmmgjVSh7ErNvL0lUY UMfunYVO/O5hwhW+P4gviCXzBFeTtDZH259O7TCCBzAwggUYoAMCAQICEwCg0WvVwekjGFiO 62SckFwepz0wDQYJKoZIhvcNAQELBQAwezELMAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3Jp ZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBMTEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBD QTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExMQyAyMDE3IEludCBDQTAeFw0xNzA4MTcyMTIx MjBaFw0yMjA4MTYyMTIxMjBaMFcxCzAJBgNVBAYTAlVTMRAwDgYDVQQIDAdGbG9yaWRhMRkw FwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRswGQYDVQQDDBJrYXJsQGRlbm5pbmdlci5uZXQw ggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIKAoICAQC+HVSyxVtJhy3Ohs+PAGRuO//Dha9A 16l5FPATr6wude9zjX5f2lrkRyU8vhCXTZW7WbvWZKpcZ8r0dtZmiK9uF58Ec6hhvfkxJzbg 96WHBw5Fumd5ahZzuCJDtCAWW8R7/KN+zwzQf1+B3MVLmbaXAFBuKzySKhKMcHbK3/wjUYTg y+3UK6v2SBrowvkUBC+jxNg3Wy12GsTXcUS/8FYIXgVVPgfZZrbJJb5HWOQpvvhILpPCD3xs YJFNKEPltXKWHT7Qtc2HNqikgNwj8oqOb+PeZGMiWapsatKm8mxuOOGOEBhAoTVTwUHlMNTg 6QUCJtuWFCK38qOCyk9Haj+86lUU8RG6FkRXWgMbNQm1mWREQhw3axgGLSntjjnznJr5vsvX SYR6c+XKLd5KQZcS6LL8FHYNjqVKHBYM+hDnrTZMqa20JLAF1YagutDiMRURU23iWS7bA9tM cXcqkclTSDtFtxahRifXRI7Epq2GSKuEXe/1Tfb5CE8QsbCpGsfSwv2tZ/SpqVG08MdRiXxN 5tmZiQWo15IyWoeKOXl/hKxA9KPuDHngXX022b1ly+5ZOZbxBAZZMod4y4b4FiRUhRI97r9l CxsP/EPHuuTIZ82BYhrhbtab8HuRo2ofne2TfAWY2BlA7ExM8XShMd9bRPZrNTokPQPUCWCg CdIATQIDAQABo4IBzzCCAcswPAYIKwYBBQUHAQEEMDAuMCwGCCsGAQUFBzABhiBodHRwOi8v b2NzcC5jdWRhc3lzdGVtcy5uZXQ6ODg4ODAJBgNVHRMEAjAAMBEGCWCGSAGG+EIBAQQEAwIF oDAOBgNVHQ8BAf8EBAMCBeAwHQYDVR0lBBYwFAYIKwYBBQUHAwIGCCsGAQUFBwMEMDMGCWCG SAGG+EIBDQQmFiRPcGVuU1NMIEdlbmVyYXRlZCBDbGllbnQgQ2VydGlmaWNhdGUwHQYDVR0O BBYEFLElmNWeVgsBPe7O8NiBzjvjYnpRMIHKBgNVHSMEgcIwgb+AFF3AXsKnjdPND5+bxVEC GKtc047PoYGRpIGOMIGLMQswCQYDVQQGEwJVUzEQMA4GA1UECAwHRmxvcmlkYTESMBAGA1UE BwwJTmljZXZpbGxlMRkwFwYDVQQKDBBDdWRhIFN5c3RlbXMgTExDMRgwFgYDVQQLDA9DdWRh IFN5c3RlbXMgQ0ExITAfBgNVBAMMGEN1ZGEgU3lzdGVtcyBMTEMgMjAxNyBDQYITAORIioIQ zl6738WMYyE12A3YSDAdBgNVHREEFjAUgRJrYXJsQGRlbm5pbmdlci5uZXQwDQYJKoZIhvcN AQELBQADggIBAJXboPFBMLMtaiUt4KEtJCXlHO/3ZzIUIw/eobWFMdhe7M4+0u3te0sr77QR dcPKR0UeHffvpth2Mb3h28WfN0FmJmLwJk+pOx4u6uO3O0E1jNXoKh8fVcL4KU79oEQyYkbu 2HwbXBU9HbldPOOZDnPLi0whi/sbFHdyd4/w/NmnPgzAsQNZ2BYT9uBNr+jZw4SsluQzXG1X lFL/qCBoi1N2mqKPIepfGYF6drbr1RnXEJJsuD+NILLooTNf7PMgHPZ4VSWQXLNeFfygoOOK FiO0qfxPKpDMA+FHa8yNjAJZAgdJX5Mm1kbqipvb+r/H1UAmrzGMbhmf1gConsT5f8KU4n3Q IM2sOpTQe7BoVKlQM/fpQi6aBzu67M1iF1WtODpa5QUPvj1etaK+R3eYBzi4DIbCIWst8MdA 1+fEeKJFvMEZQONpkCwrJ+tJEuGQmjoQZgK1HeloepF0WDcviiho5FlgtAij+iBPtwMuuLiL shAXA5afMX1hYM4l11JXntle12EQFP1r6wOUkpOdxceCcMVDEJBBCHW2ZmdEaXgAm1VU+fnQ qS/wNw/S0X3RJT1qjr5uVlp2Y0auG/eG0jy6TT0KzTJeR9tLSDXprYkN2l/Qf7/nT6Q03qyE QnnKiBXWAZXveafyU/zYa7t3PTWFQGgWoC4w6XqgPo4KV44OMYIFBzCCBQMCAQEwgZIwezEL MAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBM TEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExM QyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTANBglghkgBZQMEAgMFAKCCAkUw GAYJKoZIhvcNAQkDMQsGCSqGSIb3DQEHATAcBgkqhkiG9w0BCQUxDxcNMTkwNzE4MjExNzE3 WjBPBgkqhkiG9w0BCQQxQgRAUMZIK9lQq9jzx0nuhaDDTsQ5SRpTCLGZP0ppv3di1+HwP/jJ avyJlZOm3pzYWQthmoCd+4LGDEgQMosYVChKGzBsBgkqhkiG9w0BCQ8xXzBdMAsGCWCGSAFl AwQBKjALBglghkgBZQMEAQIwCgYIKoZIhvcNAwcwDgYIKoZIhvcNAwICAgCAMA0GCCqGSIb3 DQMCAgFAMAcGBSsOAwIHMA0GCCqGSIb3DQMCAgEoMIGjBgkrBgEEAYI3EAQxgZUwgZIwezEL MAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lzdGVtcyBM TEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0ZW1zIExM QyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTCBpQYLKoZIhvcNAQkQAgsxgZWg gZIwezELMAkGA1UEBhMCVVMxEDAOBgNVBAgMB0Zsb3JpZGExGTAXBgNVBAoMEEN1ZGEgU3lz dGVtcyBMTEMxGDAWBgNVBAsMD0N1ZGEgU3lzdGVtcyBDQTElMCMGA1UEAwwcQ3VkYSBTeXN0 ZW1zIExMQyAyMDE3IEludCBDQQITAKDRa9XB6SMYWI7rZJyQXB6nPTANBgkqhkiG9w0BAQEF AASCAgBgGQ0JfOZA8x521fURGxvJdmIgF5MTtcojE8NKbzSK34zzhrfDuYF4eoJWqI3izdDC 4CLxZkLhcDWHTjTfrPNwZk0/jHfJf8eIIe0x80YW2nC8k/RfAFe2tHuVd2BPobUH/XzCTeb/ RTCQHT3+IvZiXxYQisOgmaRq9z6WiwUbLvvvl2C4Pn6J0llszzsT9HMeeAavLSLkP4CT4HdJ Y4RvxhRjj6HEepsO7wvg74FLRk+y7d5NiAxwX7zAqyAp+awqz526J1pkWWQJk5rP+hdtIVas HPoyIlnPW0Kh1FoacLSNfR+sOeoCFrAObKTjlnxQAcpSeUhBsRvQkIFl/4guqZlT3VerTxgL L0/zqp6xI2YuTGxmwLUni3w+UNWRoOPVBaeT0+zVYl7EOjhpOCrMezcKEKU+pR3kGsJPkjIW es7SyktaCIoZ2NLKW2HuumJ/kBIzd5JBAFGUmVjuyyWPm6tnTHU6ck6TWfymjOWpLuUlDJ9V 7Q56dgTEG48gXZYJRWa0PgbqMLrJXsavnZO2trWiMo0JguKHFlZ30ansgRBMRoLellofAZGD vmxy6h4eBdBRRGYvEquSmp8satlS4aYbcijgK1R98Fen//WLLFS8WXDX/Q25U5NY1QZ+ERlp icdAaIeFumvuGCy7UbriVHAPXLIOYb9IjxshUkgPeAAAAAAAAA== --------------ms080701050702040803020600--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?6a3d5e55-6d50-3e08-63e4-dd1cd2209abe>