Date: Tue, 05 May 2020 01:44:39 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 246207] [geom] geli livelocks during panic
Message-ID: <bug-246207-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246207

            Bug ID: 246207
           Summary: [geom] geli livelocks during panic
           Product: Base System
           Version: 12.1-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: asomers@FreeBSD.org

Some geli-using machines I administer occasionally panic. When they do, they
sometimes dump core but often don't. When they don't, they simply hang after
printing the stack trace, but before printing the uptime.

I've traced the problem to geli's shutdown_pre_sync event handler. It tries to
destroy each geli device. We can't simply skip that step if a panic is
underway; erasing the keys is necessary to prevent warm-boot attacks. The
problem lies in the following lines.

g_eli_destroy:
        sc->sc_flags |= G_ELI_FLAG_DESTROY;
        wakeup(sc);
        /*
         * Wait for kernel threads self destruction.
         */
        while (!LIST_EMPTY(&sc->sc_workers)) {
                msleep(&sc->sc_workers, &sc->sc_queue_mtx, PRIBIO,
                    "geli:destroy", 0);
        }

_sleep:
        if (SCHEDULER_STOPPED_TD(td)) {
                if (lock != NULL && priority & PDROP)
                        class->lc_unlock(lock);
                return (0);
        }

As you can see, if the scheduler is stopped for the current thread (which it
will be during a panic), then msleep returns immediately without sleeping. The
worker threads never get a chance to run and remove themselves from
sc_workers, so g_eli_destroy loops indefinitely. The obvious solution, which I
haven't yet tested, would be to skip that section in g_eli_destroy when the
scheduler is stopped.

What I don't understand is why g_eli_destroy _ever_ works during a panic.
Perhaps it has something to do with the allocation of worker threads among
cores? Perhaps it only succeeds when all worker threads happen to be on
different cores? I find that unlikely though, because these servers have
thousands of worker threads.
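The loop behaviour described above can be illustrated with a small userspace sketch in C. This is not kernel code and not a tested patch. The names sim_softc, sim_msleep, sim_destroy_unguarded and sim_destroy_guarded are hypothetical stand-ins invented for illustration. The sketch assumes that, when the scheduler is running, each sleep lets exactly one worker exit, and it caps the loop with max_spins so that the livelock shows up as a return value instead of a hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the relevant state; not a real kernel structure. */
struct sim_softc {
	int  nworkers;          /* stands in for a non-empty sc_workers list */
	bool scheduler_stopped; /* stands in for SCHEDULER_STOPPED_TD(td) */
};

/*
 * Models the early return in _sleep(): with the scheduler stopped, the
 * call returns at once and no worker gets to run. Otherwise, assume one
 * worker exits while the caller sleeps.
 */
static int
sim_msleep(struct sim_softc *sc)
{
	if (sc->scheduler_stopped)
		return (0);
	if (sc->nworkers > 0)
		sc->nworkers--;
	return (0);
}

/*
 * Same loop shape as the wait in g_eli_destroy, with an iteration cap.
 * Returns the number of iterations, or -1 if the cap was reached, which
 * models the livelock.
 */
static int
sim_destroy_unguarded(struct sim_softc *sc, int max_spins)
{
	int spins = 0;

	assert(max_spins > 0);
	while (sc->nworkers > 0) {
		if (spins == max_spins)
			return (-1);
		sim_msleep(sc);
		spins++;
	}
	return (spins);
}

/*
 * The untested idea from the report: skip the wait entirely when the
 * scheduler is stopped. Returns 0 in that case without touching workers.
 */
static int
sim_destroy_guarded(struct sim_softc *sc, int max_spins)
{
	if (sc->scheduler_stopped)
		return (0);
	return (sim_destroy_unguarded(sc, max_spins));
}
```

With three simulated workers and the scheduler running, sim_destroy_unguarded returns 3. With the scheduler stopped it reaches the cap and returns -1, while sim_destroy_guarded returns 0 and leaves the workers in place. The sketch does not model key erasure, so it says nothing about whether skipping the wait is safe in the real driver.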