From nobody Fri Dec 3 15:38:37 2021
From: Alan Somers
Date: Fri, 3 Dec 2021 08:38:37 -0700
Subject: Re: ZFS deadlocks triggered by HDD timeouts
To: Warner Losh
Cc: FreeBSD <freebsd-stable@freebsd.org>

On Wed, Dec 1, 2021 at 3:48 PM Warner Losh wrote:
>
> On Wed, Dec 1, 2021, 3:36 PM Alan Somers wrote:
>>
>> On Wed, Dec 1, 2021 at 2:46 PM Warner Losh wrote:
>> >
>> > On Wed, Dec 1, 2021, 2:36 PM Alan Somers wrote:
>> >>
>> >> On Wed, Dec 1, 2021 at 1:56 PM Warner Losh wrote:
>> >> >
>> >> > On Wed, Dec 1, 2021 at 1:47 PM Alan Somers wrote:
>> >> >>
>> >> >> On Wed, Dec 1, 2021 at 1:37 PM Warner Losh wrote:
>> >> >> >
>> >> >> > On Wed, Dec 1, 2021 at 1:28 PM Alan Somers wrote:
>> >> >> >>
>> >> >> >> On Wed, Dec 1, 2021 at 11:25 AM Warner Losh wrote:
>> >> >> >> >
>> >> >> >> > On Wed, Dec 1, 2021, 11:16 AM Alan Somers wrote:
>> >> >> >> >>
>> >> >> >> >> On a stable/13 build from 16-Sep-2021 I see frequent ZFS deadlocks triggered by HDD timeouts. The timeouts are probably caused by genuine hardware faults, but they didn't lead to deadlocks in 12.2-RELEASE or 13.0-RELEASE. Unfortunately I don't have much additional information. ZFS's stack traces aren't very informative, and dmesg doesn't show anything besides the usual information about the disk timeout. I don't see anything obviously related in the commit history for that time range, either.
>> >> >> >> >>
>> >> >> >> >> Has anybody else observed this phenomenon? Or does anybody have a good way to deliberately inject timeouts? CAM makes it easy enough to inject an error, but not a timeout. If it did, then I could bisect the problem. As it is I can only reproduce it on production servers.
>> >> >> >> >
>> >> >> >> > What SIM? Timeouts are tricky because they have many sources, some of which are nonlocal...
>> >> >> >> >
>> >> >> >> > Warner
>> >> >> >>
>> >> >> >> mpr(4)
>> >> >> >
>> >> >> > Is this just a single drive that's acting up, or is the controller initialized as part of the error recovery?
>> >> >>
>> >> >> I'm not doing anything fancy with mprutil or sas3flash, if that's what you're asking.
>> >> >
>> >> > No. I'm asking if you've enabled debugging on the recovery messages and see that we enter any kind of controller reset when the timeouts occur.
>> >>
>> >> No. My CAM setup is the default except that I enabled CAM_IO_STATS and changed the following two sysctls:
>> >> kern.cam.da.retry_count=2
>> >> kern.cam.da.default_timeout=10
>> >>
>> >> >> > If a single drive, are there multiple timeouts that happen at the same time, such that we time out a request while we're waiting for the abort command we send to the firmware to be acknowledged?
>> >> >>
>> >> >> I don't know.
>> >> >
>> >> > OK.
>> >> >
>> >> >> > Would you be able to run a kgdb script to see if you're hitting a situation that I fixed in mpr that would cause I/O to never complete in this rather odd circumstance? If you can, and if it is, then there's a change I can MFC :).
>> >> >>
>> >> >> Possibly. When would I run this kgdb script? Before ZFS locks up, after, or while the problematic timeout happens?
>> >> >
>> >> > After the timeouts. I've been doing 'kgdb' followed by 'source mpr-hang.gdb' to run this.
>> >> >
>> >> > What you are looking for is anything with a qfrozen_cnt > 0. The script is imperfect and racy with normal operations (but not in a bad way), so you may need to run it a couple of times to get consistent data.
>> >> > On my systems, there'd be one or two devices with a frozen count > 1, no I/O happened on those drives, and processes hung. That might not be any different than a deadlock :)
>> >> >
>> >> > Warner
>> >> >
>> >> > P.S. here's the mpr-hang.gdb script. Not sure if I can make an attachment survive the mailing lists :)
>> >>
>> >> Thanks, I'll try that. If this is the problem, do you have any idea why it wouldn't happen on 12.2-RELEASE? (I haven't seen it on 13.0-RELEASE, but maybe I just don't have enough runtime on that version.)
>> >
>> > 9781c28c6d63 was merged to stable/13 as a996b55ab34c on Sept 2nd. I fixed a bug with that version in current as a8837c77efd0, but haven't merged it. I kinda expect that this might be the cause of the problem. But in Netflix's fleet we've seen this maybe a couple of times a week over many thousands of machines, so I've been a little cautious about merging it, to make sure that it's really fixed. So far, the jury is out.
>> >
>> > Warner
>>
>> Well, I'm experiencing this error much more frequently than you, then. I've seen it on about 10% of similarly configured servers, and they've only been running that release for 1 week.
>
> You can run my script soon then to see if it's the same thing.
>
> Warner
>
>> -Alan

That confirms it. I hit the deadlock again, and qfrozen_cnt was between 1 and 3 for four devices: two da devices (we use multipath) and their accompanying pass devices. So should I try merging a8837c77efd0 next?
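
For reference, since the mpr-hang.gdb attachment itself didn't survive the list: below is a minimal sketch of the kind of check described above. It walks CAM's bus/target/device lists from kgdb and prints any device with a nonzero frozen count. This is not Warner's original script; the xsoftc, cam_eb, cam_et, and cam_ed field names are assumptions based on FreeBSD 13's cam_xpt_internal.h and cam_queue.h and may differ between versions.

# mpr-hang.gdb (sketch, not the original attachment): report CAM
# devices with a frozen ccb queue. Assumes FreeBSD 13 internals:
# xsoftc.xpt_busses is the TAILQ of struct cam_eb; each bus holds
# targets (struct cam_et), each target holds devices (struct cam_ed);
# a device's freeze count is ccbq.queue.qfrozen_cnt.
set $bus = xsoftc.xpt_busses.tqh_first
while $bus != 0
  set $targ = $bus->et_entries.tqh_first
  while $targ != 0
    set $dev = $targ->ed_entries.tqh_first
    while $dev != 0
      # Nonzero qfrozen_cnt means no new I/O is dispatched to the device.
      if $dev->ccbq.queue.qfrozen_cnt > 0
        printf "bus %u target %u lun %lu: qfrozen_cnt=%u\n", $bus->path_id, $targ->target_id, (unsigned long)$dev->lun_id, $dev->ccbq.queue.qfrozen_cnt
      end
      set $dev = $dev->links.tqe_next
    end
    set $targ = $targ->links.tqe_next
  end
  set $bus = $bus->links.tqe_next
end

Run it against the live kernel or a crash dump with 'kgdb' followed by 'source mpr-hang.gdb'; as noted above the walk is racy against normal operation, so a device counts as suspect only if it stays frozen across several runs while I/O to it is hung.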