Date:      Thu, 17 Aug 2023 15:37:09 -0400
From:      Alexander Motin <mav@FreeBSD.org>
To:        =?UTF-8?Q?Dag-Erling_Sm=c3=b8rgrav?= <des@FreeBSD.org>
Cc:        current@freebsd.org, Mateusz Guzik <mjguzik@gmail.com>, Martin Matuska <mm@FreeBSD.org>
Subject:   Re: ZFS deadlock in 14
Message-ID:  <8c88acdc-7009-9801-ef44-3e1359c59aff@FreeBSD.org>
In-Reply-To: <197ead1e-210a-6be6-7e24-5c56b14bb777@FreeBSD.org>
References:  <86leeltqcb.fsf@ltc.des.no> <86h6p4s64h.fsf@ltc.des.no> <86a5utrafp.fsf@ltc.des.no> <86350kqokl.fsf@ltc.des.no> <CAGudoHHGeC4qaZpmTc15-Rimo78qVUmg8-oYveMfo0_JO45TSw@mail.gmail.com> <86y1icp95t.fsf@ltc.des.no> <CAGudoHFgyb2%2B3jkfmuzG86FqCQfNfPOuoWpsXFByY=YBYCN%2BFQ@mail.gmail.com> <86ttt0p8wv.fsf@ltc.des.no> <197ead1e-210a-6be6-7e24-5c56b14bb777@FreeBSD.org>

On 17.08.2023 14:57, Alexander Motin wrote:
> On 15.08.2023 12:28, Dag-Erling Smørgrav wrote:
>> Mateusz Guzik <mjguzik@gmail.com> writes:
>>> Going through the list may or may not reveal other threads doing
>>> something in the area and it very well may be they are deadlocked,
>>> which then results in other processes hanging on them.
>>>
>>> Just like in your case the process reported as hung is a random victim
>>> and whatever the real culprit is deeper.
>>
>> We already know the real culprit, see upthread.
> 
> Dag, I looked through the thread once more.  Thank you for tracing it, 
> but you never went beyond txg_wait_synced() in the `zfs revert` 
> thread.  If you are saying that this thread is holding the lock, then 
> the question is why the transaction commit is stuck.  I need to see 
> stacks for the ZFS sync threads, or better, all kernel stacks, just in 
> case.  Without that information I can only speculate.
> 
> Trying to run your test (so far without reproducing the problem), I see 
> it producing a substantial amount of ZIL writes.  The range of commits 
> you have narrowed the scope to so far includes my ZIL locking 
> refactoring, where I know for sure there are some deadlocks.  I have 
> been waiting for 3 weeks now for reviews and tests of the PR that 
> should fix them: https://github.com/openzfs/zfs/pull/15122 .  It would 
> be good if you could test it, though it seems to depend on a few more 
> earlier patches not yet merged into FreeBSD.

Ah, it appears that the pool I tested first has sync=always set from 
earlier tests.  That explains the high amount of ZIL traffic I saw, so it 
may be irrelevant.  But I still wonder what the sync threads are doing in 
your case.

-- 
Alexander Motin


