List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable
From: Warner Losh
Date: Fri, 3 Dec 2021 17:28:03 -0700
Subject: Re: ZFS deadlocks triggered by HDD timeouts
To: Alan Somers
Cc: FreeBSD

Hey Alan,

On Fri, Dec 3, 2021 at 5:26 PM Alan Somers wrote:

> On Fri, Dec 3, 2021 at 5:19 PM Warner Losh wrote:
> >
> > Hey Alan,
> >
> > On Fri, Dec 3, 2021 at 8:38 AM Alan Somers wrote:
> >>
> >> On Wed, Dec 1, 2021 at 3:48 PM Warner Losh wrote:
> >> >
> >> > On Wed, Dec 1, 2021, 3:36 PM Alan Somers wrote:
> >> >>
> >> >> On Wed, Dec 1, 2021 at 2:46 PM Warner Losh wrote:
> >> >> >
> >> >> > On Wed, Dec 1, 2021, 2:36 PM Alan Somers wrote:
> >> >> >>
> >> >> >> On Wed, Dec 1, 2021 at 1:56 PM Warner Losh wrote:
> >> >> >> >
> >> >> >> > On Wed, Dec 1, 2021 at 1:47 PM Alan Somers wrote:
> >> >> >> >>
> >> >> >> >> On Wed, Dec 1, 2021 at 1:37 PM Warner Losh wrote:
> >> >> >> >> >
> >> >> >> >> > On Wed, Dec 1, 2021 at 1:28 PM Alan Somers <asomers@freebsd.org> wrote:
> >> >> >> >> >>
> >> >> >> >> >> On Wed, Dec 1, 2021 at 11:25 AM Warner Losh wrote:
> >> >> >> >> >> >
> >> >> >> >> >> > On Wed, Dec 1, 2021, 11:16 AM Alan Somers <asomers@freebsd.org> wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >> On a stable/13 build from 16-Sep-2021 I see frequent ZFS deadlocks
> >> >> >> >> >> >> triggered by HDD timeouts. The timeouts are probably caused by
> >> >> >> >> >> >> genuine hardware faults, but they didn't lead to deadlocks in
> >> >> >> >> >> >> 12.2-RELEASE or 13.0-RELEASE. Unfortunately I don't have much
> >> >> >> >> >> >> additional information. ZFS's stack traces aren't very informative,
> >> >> >> >> >> >> and dmesg doesn't show anything besides the usual information about
> >> >> >> >> >> >> the disk timeout. I don't see anything obviously related in the
> >> >> >> >> >> >> commit history for that time range, either.
> >> >> >> >> >> >>
> >> >> >> >> >> >> Has anybody else observed this phenomenon? Or does anybody have a
> >> >> >> >> >> >> good way to deliberately inject timeouts? CAM makes it easy enough to
> >> >> >> >> >> >> inject an error, but not a timeout. If it did, then I could bisect
> >> >> >> >> >> >> the problem. As it is I can only reproduce it on production servers.
> >> >> >> >> >> >
> >> >> >> >> >> > What SIM? Timeouts are tricky because they have many sources, some of which are nonlocal...
> >> >> >> >> >> >
> >> >> >> >> >> > Warner
> >> >> >> >> >>
> >> >> >> >> >> mpr(4)
> >> >> >> >> >
> >> >> >> >> > Is this just a single drive that's acting up, or is the controller initialized as part of the error recovery?
> >> >> >> >>
> >> >> >> >> I'm not doing anything fancy with mprutil or sas3flash, if that's what
> >> >> >> >> you're asking.
> >> >> >> >
> >> >> >> > No. I'm asking if you've enabled debugging on the recovery messages and see that we enter any kind of
> >> >> >> > controller reset when the timeouts occur.
> >> >> >>
> >> >> >> No. My CAM setup is the default except that I enabled CAM_IO_STATS
> >> >> >> and changed the following two sysctls:
> >> >> >> kern.cam.da.retry_count=2
> >> >> >> kern.cam.da.default_timeout=10
> >> >> >>
> >> >> >> >
> >> >> >> >>
> >> >> >> >> > If a single drive,
> >> >> >> >> > are there multiple timeouts that happen at the same time such that we timeout a request while we're waiting for
> >> >> >> >> > the abort command we send to the firmware to be acknowledged?
> >> >> >> >>
> >> >> >> >> I don't know.
> >> >> >> >
> >> >> >> > OK.
> >> >> >> >
> >> >> >> >>
> >> >> >> >> > Would you be able to run a kgdb script to see
> >> >> >> >> > if you're hitting a situation that I fixed in mpr that would cause I/O to never complete in this rather odd circumstance?
> >> >> >> >> > If you can, and if it is, then there's a change I can MFC :).
> >> >> >> >>
> >> >> >> >> Possibly. When would I run this kgdb script? Before ZFS locks up,
> >> >> >> >> after, or while the problematic timeout happens?
> >> >> >> >
> >> >> >> > After the timeouts. I've been doing 'kgdb' followed by 'source mpr-hang.gdb' to run this.
> >> >> >> >
> >> >> >> > What you are looking for is anything with a qfrozen_cnt > 0. The script is imperfect and racy
> >> >> >> > with normal operations (but not in a bad way), so you may need to run it a couple of times
> >> >> >> > to get consistent data. On my systems, there'd be one or two devices with a frozen count > 1
> >> >> >> > and no I/O happened on those drives and processes hung. That might not be any different than
> >> >> >> > a deadlock :)
> >> >> >> >
> >> >> >> > Warner
> >> >> >> >
> >> >> >> > P.S. here's the mpr-hang.gdb script. Not sure if I can make an attachment survive the mailing lists :)
> >> >> >>
> >> >> >> Thanks, I'll try that. If this is the problem, do you have any idea
> >> >> >> why it wouldn't happen on 12.2-RELEASE (I haven't seen it on
> >> >> >> 13.0-RELEASE, but maybe I just don't have enough runtime on that
> >> >> >> version)?
> >> >> >
> >> >> > 9781c28c6d63 was merged to stable/13 as a996b55ab34c on Sept 2nd. I fixed a bug
> >> >> > with that version in current as a8837c77efd0, but haven't merged it. I kinda expect that
> >> >> > this might be the cause of the problem. But in Netflix's fleet we've seen this maybe a
> >> >> > couple of times a week over many thousands of machines, so I've been a little cautious
> >> >> > in merging it to make sure that it's really fixed. So far, the jury is out.
> >> >> >
> >> >> > Warner
> >> >>
> >> >> Well, I'm experiencing this error much more frequently than you then.
> >> >> I've seen it on about 10% of similarly-configured servers and they've
> >> >> only been running that release for 1 week.
> >> >
> >> > You can run my script soon then to see if it's the same thing.
> >> >
> >> > Warner
> >> >
> >> >> -Alan
> >>
> >> That confirms it. I hit the deadlock again, and qfrozen_cnt was
> >> between 1 and 3 for four devices: two da devices (we use multipath)
> >> and their accompanying pass devices. So I should try merging
> >> a8837c77efd0 next?
> >
> > Yes. I'd planned on merging it this weekend, but if you wanted a jump
> > on me, that's the next step.
> >
> > Warner
>
> It merged without conflict, and I'm testing it now. But without a way
> to inject timeouts I can't tell whether it's working.
>

You can enable, at runtime, the 'recovery' messages from the mpr driver.
From those you'll know if you are hitting the "timeout while another
timeout is active" case.

dev.mpr.0.debug_level=info,fault,recovery

is what I think I use.

Warner
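
For reference, a minimal sketch of applying the setting mentioned above at runtime. It assumes the controller is unit 0 (dev.mpr.0, as written in the message) and that the commands are run as root on the affected host. The final dmesg step is only an illustrative way to look at recent kernel messages and is not something taken from the thread.

```sh
# Show the current mpr(4) debug level for controller unit 0
# (assumption: unit 0; adjust the number for other controllers).
sysctl dev.mpr.0.debug_level

# Enable the info, fault and recovery message classes at runtime,
# using the value quoted in the message above.
sysctl dev.mpr.0.debug_level=info,fault,recovery

# After the next disk timeout, look at recent kernel messages
# for the driver's recovery output (illustrative step only).
dmesg | tail -n 50
```

A sysctl set this way does not survive a reboot, so it would need to be reapplied if the host restarts before the next timeout occurs.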