From nobody Fri Sep 1 16:04:22 2023
Message-ID: <80777717-1d67-104a-94f6-2ac8112e41b8@FreeBSD.org>
Date: Fri, 1 Sep 2023 12:04:22 -0400
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
Sender: owner-dev-commits-src-all@freebsd.org
To: Kyle Evans, Martin Matuska
Cc: src-committers@freebsd.org, dev-commits-src-all@freebsd.org,
 dev-commits-src-main@freebsd.org, Cy Schubert
References: <202308270509.37R596B5048298@gitrepo.freebsd.org>
 <65269e7a-4c3f-95ff-3e81-91b76e023fbd@FreeBSD.org>
 <7b12cc47-0e41-ee8c-2165-9e81874c3490@FreeBSD.org>
From: Alexander Motin
Subject: Re: git: 315ee00fa961 - main - zfs: merge openzfs/zfs@804414aad
In-Reply-To: <7b12cc47-0e41-ee8c-2165-9e81874c3490@FreeBSD.org>

On 01.09.2023 11:46, Kyle Evans wrote:
> On 9/1/23 08:41, Alexander Motin wrote:
>> On 31.08.2023 22:18, Kyle Evans wrote:
>>> It seems to have clearly been stomped on by uma trashing. Encountered
>>> while running a pkgbase build, I think while it was in the packaging
>>> phase. I note in particular in that frame:
>>>
>>> (kgdb) p/x lwb->lwb_issued_timestamp
>>> $4 = 0xdeadc0dedeadc0de
>>>
>>> So I guess it was freed sometime during one of the previous two
>>> zio_nowait() calls.
>>
>> Thank you, Kyle.  If the source lines are resolved correctly and it
>> really crashes on lwb_child_zio access, then I do see there a possible
>> race condition, even though I think it would involve at least 2 or may
>> be even 3 different threads.
>>
>
> Oh, sorry- yes, it was the access to lwb_child_zio there.
>
>
>> I've just created this new PR to address it:
>> https://github.com/openzfs/zfs/pull/15233
>>
>> If you'll be able to test it, include also the two previous:
>> https://github.com/openzfs/zfs/pull/15227
>> https://github.com/openzfs/zfs/pull/15228
>>
>> Thank you for something actionable, it really feels much better! :)
>>
>
> Perfect, thanks! I haven't been able to reproduce it since the first
> time, but your explanation sounds plausible to me.
>
> I'm not a ZFS developer, but it's not clear to me how I didn't end up
> tripping over other assertions, though; e.g., in zil_lwb_flush_vdevs_done:
>
>         ASSERT3S(lwb->lwb_state, ==, LWB_STATE_WRITE_DONE);
>         lwb->lwb_state = LWB_STATE_FLUSH_DONE;
>
> lwb_state seems to only be set to LWB_STATE_WRITE_DONE in
> zil_lwb_write_done (lwb_write_zio's completion routine). I would've
> thought all three of these were executed synchronously in
> __zio_execute(), which would presumably put us in LWB_STATE_ISSUED at
> the time of completing the lwb_root_zio?

That is where ZIO dependencies come in.  lwb_root_zio can never complete
before lwb_write_zio does.  So first zil_lwb_write_done(), on
lwb_write_zio completion, moves the lwb to LWB_STATE_WRITE_DONE; then
zil_lwb_flush_vdevs_done(), on lwb_root_zio completion, moves it to
LWB_STATE_FLUSH_DONE, the state in which zil_sync() can free it.  If we
get to check lwb->lwb_child_zio only at that point, we see
0xdeadc0dedeadc0de and try to call zio_nowait() on it, with the result
you saw.  Had lwb_child_zio actually been used by that specific lwb,
lwb_write_zio could not have proceeded before the child's completion,
so a late zio_nowait() call on it would have been legal, but not
otherwise.

-- 
Alexander Motin
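
For anyone following along, the ordering argument above can be sketched as a toy model in C. This is not OpenZFS code: every toy_* name is hypothetical, the state names only mirror the ones quoted in this thread, and the "free" is simulated by poisoning a field with the trash pattern from the kgdb output rather than by releasing memory. It only illustrates why the late read of the child pointer is safe when the child gates the write zio and unsafe when it does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Trash pattern taken from the kgdb output quoted in this thread. */
#define TOY_TRASH 0xdeadc0dedeadc0deULL

/* Hypothetical states; they only mirror the LWB_STATE_* names above. */
enum toy_state { TOY_ISSUED, TOY_WRITE_DONE, TOY_FLUSH_DONE, TOY_FREED };

/* Toy stand-in for an lwb, NOT the real structure. */
struct toy_lwb {
	enum toy_state state;
	uint64_t child_zio;	/* stands in for lwb->lwb_child_zio */
};

/* Models lwb_write_zio completion: ISSUED -> WRITE_DONE. */
static bool
toy_write_done(struct toy_lwb *lwb)
{
	if (lwb->state != TOY_ISSUED)
		return (false);
	lwb->state = TOY_WRITE_DONE;
	return (true);
}

/* Models lwb_root_zio completion: legal only after the write is done. */
static bool
toy_flush_done(struct toy_lwb *lwb)
{
	if (lwb->state != TOY_WRITE_DONE)
		return (false);
	lwb->state = TOY_FLUSH_DONE;
	return (true);
}

/* Models the free: allowed only in FLUSH_DONE, poisons instead of freeing. */
static bool
toy_reclaim(struct toy_lwb *lwb)
{
	if (lwb->state != TOY_FLUSH_DONE)
		return (false);
	lwb->child_zio = TOY_TRASH;
	lwb->state = TOY_FREED;
	return (true);
}

/* The issuer's late look at the child pointer. */
static bool
toy_late_read_is_safe(const struct toy_lwb *lwb)
{
	return (lwb->state != TOY_FREED && lwb->child_zio != TOY_TRASH);
}

/*
 * If the child is in use, the write cannot complete before the issuer
 * has issued the child, so no completion runs before the late read.
 * If it is not in use, nothing holds the completions back (assumption:
 * they all win the race in this model).
 */
static bool
toy_scenario(bool child_in_use)
{
	struct toy_lwb lwb = { TOY_ISSUED, child_in_use ? 1 : 0 };

	if (!child_in_use) {
		bool ok = toy_write_done(&lwb) && toy_flush_done(&lwb) &&
		    toy_reclaim(&lwb);
		assert(ok);
		(void)ok;
	}
	return (toy_late_read_is_safe(&lwb));
}
```

Calling toy_scenario(true) models the legal case and toy_scenario(false) the racy one. The transition helpers also refuse out-of-order calls, which is the toy analogue of the ASSERT3S quoted above never firing: the states are still visited in order, the lwb is just gone by the time the issuer looks at it.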