From owner-freebsd-bugs@freebsd.org Tue May 5 01:44:40 2020 Return-Path: Delivered-To: freebsd-bugs@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 5894B2CA3A5 for ; Tue, 5 May 2020 01:44:40 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from mailman.nyi.freebsd.org (mailman.nyi.freebsd.org [IPv6:2610:1c1:1:606c::50:13]) by mx1.freebsd.org (Postfix) with ESMTP id 49GMwS1m9Zz46J8 for ; Tue, 5 May 2020 01:44:40 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: by mailman.nyi.freebsd.org (Postfix) id 3C6A32CA3A4; Tue, 5 May 2020 01:44:40 +0000 (UTC) Delivered-To: bugs@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 3C2F12CA3A3 for ; Tue, 5 May 2020 01:44:40 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 49GMwS0svZz46J7 for ; Tue, 5 May 2020 01:44:40 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2610:1c1:1:606c::50:1d]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id F35445F24 for ; Tue, 5 May 2020 01:44:39 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org ([127.0.1.5]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id 0451idcu081149 for ; Tue, 5 May 2020 01:44:39 GMT (envelope-from bugzilla-noreply@freebsd.org) Received: (from www@localhost) by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id 0451idnM081147 for bugs@FreeBSD.org; Tue, 5 May 2020 01:44:39 GMT (envelope-from bugzilla-noreply@freebsd.org) X-Authentication-Warning: kenobi.freebsd.org: www set sender to bugzilla-noreply@freebsd.org using -f From: bugzilla-noreply@freebsd.org To: bugs@FreeBSD.org Subject: [Bug 246207] [geom] geli livelocks during panic Date: Tue, 05 May 2020 01:44:39 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: new X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 12.1-STABLE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: asomers@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: bugs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_id short_desc product version rep_platform op_sys bug_status bug_severity priority component assigned_to reporter Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 May 2020 01:44:40 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D246207 Bug ID: 246207 Summary: [geom] geli livelocks during panic Product: Base System Version: 12.1-STABLE Hardware: Any OS: Any Status: New Severity: Affects Some People Priority: --- Component: kern Assignee: bugs@FreeBSD.org Reporter: asomers@FreeBSD.org Some geli-using machines I administer occasionally panic. When they do, th= ey sometimes dump core but often don't. When they don't, they simply hang aft= er printing the stack trace, but before printing the uptime. I've traced the problem to geli's shutdown_pre_sync event handler. It trie= s to destroy each geli device. We can't simply skip that step if a panic is underway; erasing the keys is necessary to prevent warm-boot attacks. The problem lies in the following lines.=20=20 g_eli_destroy: sc->sc_flags |=3D G_ELI_FLAG_DESTROY; wakeup(sc); /* * Wait for kernel threads self destruction. */ while (!LIST_EMPTY(&sc->sc_workers)) { msleep(&sc->sc_workers, &sc->sc_queue_mtx, PRIBIO, "geli:destroy", 0); } _sleep: if (SCHEDULER_STOPPED_TD(td)) { if (lock !=3D NULL && priority & PDROP) class->lc_unlock(lock); return (0); } As you can see, if the scheduler is stopped for the current thread (which it will be during a panic), then msleep does nothing, cause g_eli_destroy to l= oop indefinitely. The obvious solution, which I haven't yet tested, would be to skip that section in g_eli_destroy when the scheduler is stopped. What I d= on't understand is why g_eli_destroy _ever_ works during a panic. Perhaps it has something to do with the allocation of worker threads among cores? Perhaps= it only succeeds when all worker threads happen to be on different cores? I f= ind that unlikely though, because these servers have thousands of worker thread= s. --=20 You are receiving this mail because: You are the assignee for the bug.=