Date:      Tue, 22 Aug 2023 11:24:00 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Alexander Motin <mav@FreeBSD.org>, Current FreeBSD <freebsd-current@freebsd.org>
Subject:   Re: ZFS deadlock in 14
Message-ID:  <4FFAE432-21FE-4462-9162-9CC30A5D470A@yahoo.com>
References:  <4FFAE432-21FE-4462-9162-9CC30A5D470A.ref@yahoo.com>

Alexander Motin <mav@FreeBSD.org> wrote on
Date: Tue, 22 Aug 2023 16:18:12 UTC :

> I am waiting for final test results from George Wilson and then will
> request quick merge of both to zfs-2.2-release branch. Unfortunately
> there are still not many reviewers for the PR, since the code is not
> trivial, but at least with the test reports Brian Behlendorf and Mark
> Maybee seem to be OK to merge the two PRs into 2.2. If somebody else
> has tested and/or reviewed the PR, you may comment on it.

I had written to the list that, when I tried to test the system
with poudriere builds (initially with your patches) using
USE_TMPFS=no so that ZFS had to handle all the file I/O, only
one builder ever became active; the others never reached
"Builder started":

[00:01:34] [01] [00:00:00] Builder starting
[00:01:57] [01] [00:00:23] Builder started
[00:01:57] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.20.4
[00:03:09] [01] [00:01:12] Finished ports-mgmt/pkg | pkg-1.20.4: Success
[00:03:21] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1
[00:03:21] [02] [00:00:00] Builder starting
[00:03:21] [03] [00:00:00] Builder starting
[00:03:21] [04] [00:00:00] Builder starting
[00:03:21] [05] [00:00:00] Builder starting
[00:03:21] [06] [00:00:00] Builder starting
[00:03:21] [07] [00:00:00] Builder starting
[00:03:22] [08] [00:00:00] Builder starting
[00:03:22] [09] [00:00:00] Builder starting
[00:03:22] [10] [00:00:00] Builder starting
[00:03:22] [11] [00:00:00] Builder starting
[00:03:22] [12] [00:00:00] Builder starting
[00:03:22] [13] [00:00:00] Builder starting
[00:03:22] [14] [00:00:00] Builder starting
[00:03:22] [15] [00:00:00] Builder starting
[00:03:22] [16] [00:00:00] Builder starting
[00:03:22] [17] [00:00:00] Builder starting
[00:03:22] [18] [00:00:00] Builder starting
[00:03:22] [19] [00:00:00] Builder starting
[00:03:22] [20] [00:00:00] Builder starting
[00:03:22] [21] [00:00:00] Builder starting
[00:03:22] [22] [00:00:00] Builder starting
[00:03:22] [23] [00:00:00] Builder starting
[00:03:22] [24] [00:00:00] Builder starting
[00:03:22] [25] [00:00:00] Builder starting
[00:03:22] [26] [00:00:00] Builder starting
[00:03:22] [27] [00:00:00] Builder starting
[00:03:22] [28] [00:00:00] Builder starting
[00:03:22] [29] [00:00:00] Builder starting
[00:03:22] [30] [00:00:00] Builder starting
[00:03:22] [31] [00:00:00] Builder starting
[00:03:22] [32] [00:00:00] Builder starting
[00:03:30] [01] [00:00:09] Finished print/indexinfo | indexinfo-0.3.1: Success
[00:03:31] [01] [00:00:00] Building devel/gettext-runtime | gettext-runtime-0.22
. . .
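
For reference, a minimal sketch of the poudriere.conf overrides
involved in this kind of test (only the two options mentioned in
this message are shown; everything else is left at its usual
value):

# /usr/local/etc/poudriere.conf (excerpt)
# Force all builder file I/O onto ZFS instead of tmpfs:
USE_TMPFS=no
# Allow parallel make jobs inside each builder:
ALLOW_MAKE_JOBS=yes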

Top was showing lots of "vlruwk" for the cpdup processes. For example:

. . .
 362     0 root         40    0  27076Ki   13776Ki CPU19   19   4:23   0.00% cpdup -i0 -o ref 32
 349     0 root         53    0  27076Ki   13776Ki vlruwk  22   4:20   0.01% cpdup -i0 -o ref 31
 328     0 root         68    0  27076Ki   13804Ki vlruwk   8   4:30   0.01% cpdup -i0 -o ref 30
 304     0 root         37    0  27076Ki   13792Ki vlruwk   6   4:18   0.01% cpdup -i0 -o ref 29
 282     0 root         42    0  33220Ki   13956Ki vlruwk   8   4:33   0.01% cpdup -i0 -o ref 28
 242     0 root         56    0  27076Ki   13796Ki vlruwk   4   4:28   0.00% cpdup -i0 -o ref 27
. . .

But those processes did show CPU?? on occasion, and *vnode
less often. None of the cpdup processes appeared to be stuck
permanently in any one state.
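
For anyone wanting to watch for the same behavior: vlruwk is the
wait channel a thread sleeps on when vnode allocation is
throttled waiting for the vnode recycler. A minimal sketch of
commands for observing the vnode pressure and a waiting process
(standard FreeBSD tools; the PID below is just an example taken
from the top output above):

# Vnode allocation throttles as vfs.numvnodes approaches
# kern.maxvnodes:
sysctl kern.maxvnodes vfs.numvnodes vfs.freevnodes

# Kernel stacks / wait channels for one of the cpdup
# processes (substitute a PID from top):
procstat -kk 349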

Removing your patches did not change the behavior.

So far I've not seen any reports similar to the results
that I got on the ThreadRipper 1950X that I have access
to.

I normally use USE_TMPFS=all, but that hides the problem,
which is why I've no clue when the behavior would have
started had I been using USE_TMPFS=no instead.

I never got as far as testing for the kinds of reports
I've seen about the deadlock issue.

No one has commented on what I reported or said whether
they have done any USE_TMPFS=no style of testing.
(I also use ALLOW_MAKE_JOBS=yes.)

The ZFS context is a simple single-partition setup. I use
ZFS in order to use bectl boot environments (BEs), not for
other reasons.
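
For context, a minimal sketch of the bectl workflow that is the
reason for using ZFS here (standard bectl subcommands; the BE
name is only an example):

# List the existing boot environments:
bectl list

# Snapshot the current system as a new BE, e.g. before an
# update (the name is hypothetical):
bectl create before-update

# Select it as the environment booted next, e.g. to roll
# back if the update misbehaves:
bectl activate before-update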

===
Mark Millard
marklmi at yahoo.com



