Date: Mon, 10 Jan 2022 21:50:42 -0500
From: Mark Johnston
To: Mateusz Guzik
Cc: Shawn Webb, freebsd-hackers@freebsd.org
Subject: Re: Debugging a (potentially?) ZFS-related panic, and discussion about large patchsets

On Tue, Jan 11, 2022 at 12:43:06AM +0100, Mateusz Guzik wrote:
> On 1/11/22, Mark Johnston wrote:
> > On Mon, Jan 10, 2022 at 05:11:16PM -0500, Shawn Webb wrote:
> >> Hey all,
> >>
> >> So I'm getting an interesting ZFS-related kernel panic. I've uploaded
> >> the core.txt at [0]. I suspect it's related to FreeBSD commit
> >> 681ce946f33e75c590e97c53076e86dff1fe8f4a (zfs: merge
> >> openzfs/zfs@f291fa658 (master) into main).
> >>
> >> I'm able to reproduce it on a single system with some level of
> >> determinism: I'm building the security appliance firmware at ${DAYJOB}
> >> in a bhyve VM that's backed by a zvol. The host is a Dell Precision
> >> 7540 laptop with a single NVMe drive in it. The VM is configured with
> >> a single zvol, booting with UEFI.
> >>
> >> Looking at the commit email sent to dev-commits-src-all@, I see this:
> >> 146 files changed, 4933 insertions(+), 1572 deletions(-)
> >>
> >> Strangely, when I run `git show
> >> 681ce946f33e75c590e97c53076e86dff1fe8f4a`, I only see a small subset
> >> of those changes.
> >
> > That is a merge commit. You need to specify that you want a diff
> > against the first parent (the preceding FreeBSD commit), so something
> > equivalent to "git diff --stat 681ce946f^ 681ce946f". Use
> > "git log 681ce946f^2" to see the merged OpenZFS commits.
> >
> >> As a downstream consumer of 14-CURRENT, how am I supposed to even
> >> start debugging such a large patchset in any manner that respects my
> >> time?
> >>
> >> It seems to me that breaking up commits into smaller, bite-size chunks
> >> would make life easier for those experiencing bugs, especially ones
> >> that result in kernel panics.
> >
> > That's up to the upstream project, in this case OpenZFS.
> >
> >> ZFS in and of itself is a beast, and I've yet to study any of its
> >> code, so when there's a commit that large, even thinking about
> >> debugging it is a daunting task.
> >>
> >> Needless to say, I'm going to need some hand holding here for
> >> debugging this. Anyone have any idea what's going on?
> >
> > To start, you'll need to look at the stack trace for the thread with tid
> > 100061.
> >
>
> imo the kernel should be patched to obtain the trace on its own.
> As the target has interrupts disabled it will have to do it with NMI, but
> support for that got scrapped in
>
> commit 1c29da02798d968eb874b86221333a56393a94c3
> Author: Mark Johnston
> Date:   Fri Jan 31 15:43:33 2020 +0000
>
>     Reimplement stack capture of running threads on i386 and amd64.

More general and useful, to me at least, is having "acttrace" output
available in core.txt. So I propose https://reviews.freebsd.org/D33817

I don't think the NMI-based stack(9) machinery to capture stacks is
really very useful here anyway. We already raise NMIs on all CPUs
during a panic, so just reuse that and add some handler which can call
kdb_backtrace() on the target CPU in ipi_nmi_handler().
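The merge-commit advice in the thread is easy to try on a scratch repository. What follows is a minimal sketch that assumes only a POSIX shell and git. The branch name "vendor" and the file names are hypothetical stand-ins for FreeBSD main and the merged OpenZFS side; they are not the real repositories. The script builds a small merge commit. It then compares plain "git show" with a diffstat against the first parent and with the list of commits that arrived through the second parent.

```shell
#!/bin/sh
# Sketch only: a throwaway repo with one merge commit. The branch name
# "vendor" and all file names are hypothetical stand-ins for FreeBSD main
# and the merged OpenZFS branch.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.org"
git config commit.gpgsign false

# The "main" side: one base commit.
echo base > base.txt
git add base.txt
git commit -q -m "main: base"
main_branch=$(git symbolic-ref --short HEAD)

# The "vendor" (upstream) side: two commits of its own.
git checkout -q -b vendor
echo one > upstream1.txt
git add upstream1.txt
git commit -q -m "vendor: change 1"
echo two > upstream2.txt
git add upstream2.txt
git commit -q -m "vendor: change 2"

# Advance main so that the merge cannot be a fast-forward.
git checkout -q "$main_branch"
echo local > local.txt
git add local.txt
git commit -q -m "main: local change"

git merge -q --no-ff --no-edit vendor
merge=$(git rev-parse HEAD)

# Plain "git show" prints a combined diff, which omits files that match
# one of the parents, so for this clean merge it lists no file changes.
echo "== git show M =="
git show "$merge"

# Diff against the first parent: everything the merge brought into main.
echo "== git diff --stat M^ M =="
git diff --stat "$merge^" "$merge"

# Commits reachable from the second parent but not from the first,
# i.e. the upstream commits that this merge introduced.
echo "== git log M^1..M^2 =="
git log --format=%s "$merge^1..$merge^2"
```

Run it with sh. The diffstat lists only upstream1.txt and upstream2.txt, even though "git show" printed no file changes for the same commit. The range "$merge^1..$merge^2" differs slightly from the plain "git log 681ce946f^2" suggested in the thread: it also hides any commits already reachable from main.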