List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable
From: Alan Somers
Date: Wed, 1 Dec 2021 14:35:50 -0700
Subject: Re: ZFS deadlocks triggered by HDD timeouts
To: Warner Losh
Cc: FreeBSD

On Wed, Dec 1, 2021 at 1:56 PM Warner Losh wrote:
>
> On Wed, Dec 1, 2021 at 1:47 PM Alan Somers wrote:
>>
>> On Wed, Dec 1, 2021 at 1:37 PM Warner Losh wrote:
>> >
>> > On Wed, Dec 1, 2021 at 1:28 PM Alan Somers wrote:
>> >>
>> >> On Wed, Dec 1, 2021 at 11:25 AM Warner Losh wrote:
>> >> >
>> >> > On Wed, Dec 1, 2021, 11:16 AM Alan Somers wrote:
>> >> >>
>> >> >> On a stable/13 build from 16-Sep-2021 I see frequent ZFS deadlocks
>> >> >> triggered by HDD timeouts. The timeouts are probably caused by
>> >> >> genuine hardware faults, but they didn't lead to deadlocks in
>> >> >> 12.2-RELEASE or 13.0-RELEASE. Unfortunately I don't have much
>> >> >> additional information. ZFS's stack traces aren't very informative,
>> >> >> and dmesg doesn't show anything besides the usual information about
>> >> >> the disk timeout. I don't see anything obviously related in the
>> >> >> commit history for that time range, either.
>> >> >>
>> >> >> Has anybody else observed this phenomenon? Or does anybody have a
>> >> >> good way to deliberately inject timeouts? CAM makes it easy enough to
>> >> >> inject an error, but not a timeout. If it did, then I could bisect
>> >> >> the problem. As it is I can only reproduce it on production servers.
>> >> >
>> >> > What SIM? Timeouts are tricky because they have many sources, some of which are nonlocal...
>> >> >
>> >> > Warner
>> >>
>> >> mpr(4)
>> >
>> > Is this just a single drive that's acting up, or is the controller initialized as part of the error recovery?
>>
>> I'm not doing anything fancy with mprutil or sas3flash, if that's what
>> you're asking.
>
> No. I'm asking if you've enabled debugging on the recovery messages and see that we enter any kind of
> controller reset when the timeouts occur.

No. My CAM setup is the default except that I enabled CAM_IO_STATS and
changed the following two sysctls:

kern.cam.da.retry_count=2
kern.cam.da.default_timeout=10

>
>> > If a single drive,
>> > are there multiple timeouts that happen at the same time such that we time out a request while we're waiting for
>> > the abort command we send to the firmware to be acknowledged?
>>
>> I don't know.
>
> OK.
>
>> > Would you be able to run a kgdb script to see
>> > if you're hitting a situation that I fixed in mpr that would cause I/O to never complete in this rather odd circumstance?
>> > If you can, and if it is, then there's a change I can MFC :).
>>
>> Possibly. When would I run this kgdb script? Before ZFS locks up,
>> after, or while the problematic timeout happens?
>
> After the timeouts. I've been doing 'kgdb' followed by 'source mpr-hang.gdb' to run this.
>
> What you are looking for is anything with a qfrozen_cnt > 0. The script is imperfect and racy
> with normal operations (but not in a bad way), so you may need to run it a couple of times
> to get consistent data. On my systems, there'd be one or two devices with a frozen count > 1
> and no I/O happened on those drives and processes hung. That might not be any different than
> a deadlock :)
>
> Warner
>
> P.S. here's the mpr-hang.gdb script. Not sure if I can make an attachment survive the mailing lists :)

Thanks, I'll try that.
If this is the problem, do you have any idea why it wouldn't happen on
12.2-RELEASE? (I haven't seen it on 13.0-RELEASE, but maybe I just don't
have enough runtime on that version.)
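For anyone trying to reproduce the setup described above, the non-default settings could be written as the fragments below. This is a sketch under assumptions: the thread does not say where the settings were applied, so placing CAM_IO_STATS in a custom kernel configuration file and the two sysctls in /etc/sysctl.conf is assumed here. The defaults noted in the comments are the da(4) compile-time values (DA_DEFAULT_RETRY and DA_DEFAULT_TIMEOUT in sys/cam/scsi/scsi_da.c).

```
# Custom kernel configuration file (placement assumed, not stated in the thread):
# enable per-device CAM I/O statistics.
options 	CAM_IO_STATS

# /etc/sysctl.conf (placement assumed, not stated in the thread)
# da(4) default is 4 retries
kern.cam.da.retry_count=2
# timeout in seconds; da(4) default is 60
kern.cam.da.default_timeout=10
```

Both sysctls are also loader tunables and can be changed on a running system with sysctl(8).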