Date:      Wed, 07 Nov 2018 18:57:30 +0000
From:      bugzilla-noreply@freebsd.org
To:        fs@FreeBSD.org
Subject:   [Bug 229614] ZFS lockup in zil_commit_impl
Message-ID:  <bug-229614-3630-48X2Y3x4pK@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-229614-3630@https.bugs.freebsd.org/bugzilla/>
References:  <bug-229614-3630@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229614

--- Comment #39 from Michael Gmelin <grembo@FreeBSD.org> ---
(In reply to Allan Jude from comment #38)
> The patch I committed only stops the deadlock in the case where the vnode
> being reclaimed belongs to a deleted file. It can still happen in other
> circumstances.

Obviously (as avg pointed out), but the main issue in this test case was
`find` iterating over files that had been deleted. Before the patch, the test
case would hang 9 out of 10 times; after patching the kernel I couldn't
reproduce it at all. So the patch is effective for this specific scenario.
Let's see if Andreas' CI runs into issues again (as he pointed out, his
observations over the last two weeks were made using an unpatched kernel by
accident).

> I am wondering if the correct approach is just to limit the number of times
> it can loop, and return an error. I am not sure what side effects returning
> an error will have.

My gut feeling is that this sounds dangerous. I'm thinking of scenarios where
one process could cause another process's important write to fail
unexpectedly. What kind of error would you return in such a case, maybe
EINTR?
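
To make the concern concrete, here is a rough userland sketch of what a
bounded retry might look like. Every name in it (`bounded_retry`,
`MAX_RETRIES`, `try_fn`, `succeed_after`) is made up for illustration and none
of it is taken from the ZFS code; the cap of 3 is arbitrary, and EINTR is only
the candidate error mentioned above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical retry cap; not a value from the ZFS source. */
#define MAX_RETRIES 3

/* Hypothetical stand-in for one pass of the loop; true means it succeeded. */
typedef bool (*try_fn)(void *ctx);

/*
 * Call attempt() at most MAX_RETRIES times.
 * Returns 0 on the first success, or EINTR once the cap is reached.
 */
static int
bounded_retry(try_fn attempt, void *ctx)
{
	for (int i = 0; i < MAX_RETRIES; i++) {
		if (attempt(ctx))
			return (0);
	}
	return (EINTR);
}

/* Example attempt: fails until the counter behind ctx has reached zero. */
static bool
succeed_after(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return (true);
	(*remaining)--;
	return (false);
}
```

With such a loop, a caller would see EINTR even though no signal was ever
delivered, purely because some other activity kept the attempt from
succeeding. That is the kind of unexpected failure I'm worried about.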

-- 
You are receiving this mail because:
You are on the CC list for the bug.