From nobody Wed Dec 1 21:45:52 2021
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable
Sender: owner-freebsd-stable@freebsd.org
MIME-Version: 1.0
From: Warner Losh
Date: Wed, 1 Dec 2021 14:45:52 -0700
Subject: Re: ZFS deadlocks triggered by HDD timeouts
To: Alan Somers
Cc: FreeBSD
Content-Type: multipart/alternative; boundary="000000000000b5ad3605d21c996f"

--000000000000b5ad3605d21c996f
Content-Type: text/plain; charset="UTF-8"

On Wed, Dec 1, 2021, 2:36 PM Alan Somers wrote:

> On Wed, Dec 1, 2021 at 1:56 PM Warner Losh wrote:
> >
> > On Wed, Dec 1, 2021 at 1:47 PM Alan Somers wrote:
> >>
> >> On Wed, Dec 1, 2021 at 1:37 PM Warner Losh wrote:
> >> >
> >> > On Wed, Dec 1, 2021 at 1:28 PM Alan Somers wrote:
> >> >>
> >> >> On Wed, Dec 1, 2021 at 11:25 AM Warner Losh wrote:
> >> >> >
> >> >> > On Wed, Dec 1, 2021, 11:16 AM Alan Somers wrote:
> >> >> >>
> >> >> >> On a stable/13 build from 16-Sep-2021 I see frequent ZFS deadlocks
> >> >> >> triggered by HDD timeouts.
> >> >> >> The timeouts are probably caused by
> >> >> >> genuine hardware faults, but they didn't lead to deadlocks in
> >> >> >> 12.2-RELEASE or 13.0-RELEASE. Unfortunately I don't have much
> >> >> >> additional information. ZFS's stack traces aren't very informative,
> >> >> >> and dmesg doesn't show anything besides the usual information about
> >> >> >> the disk timeout. I don't see anything obviously related in the
> >> >> >> commit history for that time range, either.
> >> >> >>
> >> >> >> Has anybody else observed this phenomenon? Or does anybody have a
> >> >> >> good way to deliberately inject timeouts? CAM makes it easy enough to
> >> >> >> inject an error, but not a timeout. If it did, then I could bisect
> >> >> >> the problem. As it is I can only reproduce it on production servers.
> >> >> >
> >> >> > What SIM? Timeouts are tricky because they have many sources, some of which are nonlocal...
> >> >> >
> >> >> > Warner
> >> >>
> >> >> mpr(4)
> >> >
> >> > Is this just a single drive that's acting up, or is the controller initialized as part of the error recovery?
> >>
> >> I'm not doing anything fancy with mprutil or sas3flash, if that's what
> >> you're asking.
> >
> > No. I'm asking if you've enabled debugging on the recovery messages and see that we enter any kind of
> > controller reset when the timeouts occur.
>
> No. My CAM setup is the default except that I enabled CAM_IO_STATS
> and changed the following two sysctls:
> kern.cam.da.retry_count=2
> kern.cam.da.default_timeout=10
>
> >> > If a single drive,
> >> > are there multiple timeouts that happen at the same time such that we timeout a request while we're waiting for
> >> > the abort command we send to the firmware to be acknowledged?
> >>
> >> I don't know.
> >
> > OK.
> >>
> >> > Would you be able to run a kgdb script to see
> >> > if you're hitting a situation that I fixed in mpr that would cause I/O to never complete in this rather odd circumstance?
> >> > If you can, and if it is, then there's a change I can MFC :).
> >>
> >> Possibly. When would I run this kgdb script? Before ZFS locks up,
> >> after, or while the problematic timeout happens?
> >
> > After the timeouts. I've been doing 'kgdb' followed by 'source mpr-hang.gdb' to run this.
> >
> > What you are looking for is anything with a qfrozen_cnt > 0. The script is imperfect and racy
> > with normal operations (but not in a bad way), so you may need to run it a couple of times
> > to get consistent data. On my systems, there'd be one or two devices with a frozen count > 1
> > and no I/O happened on those drives and processes hung. That might not be any different than
> > a deadlock :)
> >
> > Warner
> >
> > P.S. here's the mpr-hang.gdb script. Not sure if I can make an attachment survive the mailing lists :)
>
> Thanks, I'll try that. If this is the problem, do you have any idea
> why it wouldn't happen on 12.2-RELEASE (I haven't seen it on
> 13.0-RELEASE, but maybe I just don't have enough runtime on that
> version).
>

9781c28c6d63 was merged to stable/13 as a996b55ab34c on Sept 2nd. I fixed a
bug with that version in current as a8837c77efd0, but haven't merged it. I
kinda expect that this might be the cause of the problem. But in Netflix's
fleet we've seen this maybe a couple of times a week over many thousands of
machines, so I've been a little cautious in merging it to make sure that
it's really fixed. So far, the jury is out.

Warner

--000000000000b5ad3605d21c996f--