Date: Fri, 1 Sep 2023 10:46:00 -0500
From: Kyle Evans <kevans@FreeBSD.org>
To: Alexander Motin <mav@FreeBSD.org>, Martin Matuska <mm@freebsd.org>
Cc: src-committers@freebsd.org, dev-commits-src-all@freebsd.org,
    dev-commits-src-main@freebsd.org,
    Cy Schubert <Cy.Schubert@cschubert.com>
Subject: Re: git: 315ee00fa961 - main - zfs: merge openzfs/zfs@804414aad
Message-ID: <7b12cc47-0e41-ee8c-2165-9e81874c3490@FreeBSD.org>
In-Reply-To: <65269e7a-4c3f-95ff-3e81-91b76e023fbd@FreeBSD.org>
References: <202308270509.37R596B5048298@gitrepo.freebsd.org>
    <ZO_aOaf-eGiCMCKy@cell.glebi.us>
    <c09c92df-90f5-8c94-4125-9e33262bc686@FreeBSD.org>
    <a9a0b8b4-b47b-b629-37b6-1c18c8736859@FreeBSD.org>
    <65269e7a-4c3f-95ff-3e81-91b76e023fbd@FreeBSD.org>

On 9/1/23 08:41, Alexander Motin wrote:
> On 31.08.2023 22:18, Kyle Evans wrote:
>> It seems to have clearly been stomped on by uma trashing. Encountered
>> while running a pkgbase build, I think while it was in the packaging
>> phase. I note in particular in that frame:
>>
>> (kgdb) p/x lwb->lwb_issued_timestamp
>> $4 = 0xdeadc0dedeadc0de
>>
>> So I guess it was freed sometime during one of the previous two
>> zio_nowait() calls.
>
> Thank you, Kyle. If the source lines are resolved correctly and it
> really crashes on lwb_child_zio access, then I do see there a possible
> race condition, even though I think it would involve at least 2 or may
> be even 3 different threads.
>

Oh, sorry- yes, it was the access to lwb_child_zio there.

> I've just created this new PR to address it:
> https://github.com/openzfs/zfs/pull/15233
>
> If you'll be able to test it, include also the two previous:
> https://github.com/openzfs/zfs/pull/15227
> https://github.com/openzfs/zfs/pull/15228
>
> Thank you for something actionable, it really feels much better! :)
>

Perfect, thanks! I haven't been able to reproduce it since the first
time, but your explanation sounds plausible to me. I'm not a ZFS
developer, but it's not clear to me how I didn't end up tripping over
other assertions, though; e.g., in zil_lwb_flush_vdevs_done:

    ASSERT3S(lwb->lwb_state, ==, LWB_STATE_WRITE_DONE);
    lwb->lwb_state = LWB_STATE_FLUSH_DONE;

lwb_state seems to only be set to LWB_STATE_WRITE_DONE in
zil_lwb_write_done (lwb_write_zio's completion routine). I would've
thought all three of these were executed synchronously in
__zio_execute(), which would presumably put us in LWB_STATE_ISSUED at
the time of completing the lwb_root_zio?

Thanks,

Kyle Evans