Date:      Sat, 19 Aug 2023 13:41:56 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Current FreeBSD <freebsd-current@freebsd.org>
Subject:   Re: ZFS deadlock in 14
Message-ID:  <C5747BF8-724E-43B7-88D0-A9F70485E7E1@yahoo.com>
In-Reply-To: <3AA253E3-C4F0-4AA3-9C37-D77E7527A458@yahoo.com>
References:  <59FCB309-4A55-4924-98C4-7ACCA70FD299@yahoo.com> <0F2C42B4-36FF-443A-A174-5B0CC57C4FC7@yahoo.com> <3AA253E3-C4F0-4AA3-9C37-D77E7527A458@yahoo.com>

[I forgot to adjust USE_TMPFS for the purpose of the test.
So I'll later be starting over.]

On Aug 19, 2023, at 12:18, Mark Millard <marklmi@yahoo.com> wrote:

> On Aug 19, 2023, at 11:40, Mark Millard <marklmi@yahoo.com> wrote:
>
>> We will see how long the following high load average bulk -a
>> configuration survives a build attempt, using a non-debug kernel
>> for this test.
>>
>> I've applied:
>>
>> # fetch -o- https://github.com/openzfs/zfs/pull/15107.patch | git -C /usr/main-src/ am --dir=sys/contrib/openzfs
>> -                                                       13 kB  900 kBps    00s
>> Applying: Remove fastwrite mechanism.
>>
>> # fetch -o- https://github.com/openzfs/zfs/pull/15122.patch | git -C /usr/main-src/ am --dir=sys/contrib/openzfs
>> -                                                       45 kB 1488 kBps    00s
>> Applying: ZIL: Second attempt to reduce scope of zl_issuer_lock.
>>
>> on a ThreadRipper 1950X (32 hardware threads) that is at
>> main 6b405053c997:
>>
>> Thu, 10 Aug 2023
>> . . .
>>  • git: cd25b0f740f8 - main - zfs: cherry-pick fix from openzfs Martin Matuska
>>  • git: 28d2e3b5dedf - main - zfs: cherry-pick fix from openzfs Martin Matuska
>> . . .
>>  • git: 6b405053c997 - main - OpenSSL: clean up botched merges in OpenSSL 3.0.9 import Jung-uk Kim
>>
>> So the test also starts from a tree that has the 2
>> cherry-picks applied.
>>
>> The ThreadRipper 1950X boots from a bectl BE and
>> that zfs media is all that is in use here.
>>
>> I'm setting up to test starting a bulk -a using
>> ALLOW_MAKE_JOBS=yes along with allowing 32 builders.
>> That allows for 32*32 or so potentially for load average(s)
>> at times. There is 128 GiBytes of RAM and:
>>
>> # swapinfo
>> Device          1K-blocks     Used    Avail Capacity
>> /dev/gpt/OptBswp480 503316480        0 503316480     0%
>>
>> I'm not so sure that such a high load average bulk -a
>> is reasonable for a debug kernel build: I'm unsure of
>> the resource usage for such and whether everything could
>> be tracked as needed. So I'm testing a non-debug build
>> for now.
>>
>> I have built the kernels (nodbg and dbg), installed
>> the nodbg kernel, rebooted, and started:
>>
>> # poudriere bulk -jmain-amd64-bulk_a -a
>> . . .
>> [00:01:22] Building 34042 packages using up to 32 builders
>> . . .
>>
>> The ports tree is from back in mid-July.
>>
>> I have a patched up top that records and reports
>> various MaxObs???? figures (Maximum Observed). It
>> was recently reporting:
>>
>> . . .;  load averages: 119.56, 106.79,  71.54 MaxObs: 184.08, 112.10,  71.54
>> 1459 threads:  . . ., 273 MaxObsRunning
>> . . .
>> Mem: . . ., 61066Mi MaxObsActive, 10277Mi MaxObsWired, 71371Mi MaxObs(Act+Wir+Lndry)
>> . . .
>> Swap: . . ., 61094Mi MaxObs(Act+Lndry+SwapUsed), 71371Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>
> Status report at about 1 hr in:
>
> [main-amd64-bulk_a-default] [2023-08-19_11h04m26s] [parallel_build:] Queued: 34435 Built: 1929  Failed: 9     Skipped: 2569  Ignored: 358   Fetched: 0     Tobuild: 29570  Time: 00:59:59
>
> Not hung up yet.
>
> From about 10 minutes after that:
>
> . . . load averages: 205.56, 181.58, 153.68 MaxObs: 213.78, 182.26, 153.68
> 1704 threads:  . . ., 311 MaxObsRunning
> . . .
> Mem: . . ., 100250Mi MaxObsActive, 16857Mi MaxObsWired, 124879Mi MaxObs(Act+Wir+Lndry)
> . . .
> Swap: . . . 5994Mi MaxObsUsed, 116589Mi MaxObs(Act+Lndry+SwapUsed), 127354Mi MaxObs(Act+Wir+Lndry+SwapUsed)

I just realized that I'd forgotten to change
USE_TMPFS=all to USE_TMPFS=no, so what I've done
so far is not a great test.

I'll probably still let it reach 3 hr and get the
summary information before I stop it, adjust
USE_TMPFS, and start over from scratch.
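
For reference, the poudriere.conf settings mentioned in this message
can be sketched as below. This is only a minimal sketch, not a copy of
the file actually in use: USE_TMPFS and ALLOW_MAKE_JOBS are named in
the text, while PARALLEL_JOBS=32 is an assumption based on "allowing
32 builders", and all other settings are omitted.

```shell
# Sketch of the relevant poudriere.conf settings (assumed values,
# not copied from the actual file used for this test).

# Planned change for the restarted test: no tmpfs use by the builders.
# Presumably this keeps the build I/O on the ZFS pool being tested.
USE_TMPFS=no

# Let each port build use parallel make jobs.
ALLOW_MAKE_JOBS=yes

# Assumption: 32 builders, matching the 32 hardware threads.
PARALLEL_JOBS=32
```

With 32 builders that may each run up to 32 make jobs, roughly
32*32 runnable processes are possible at times, which is where the
high load average expectation comes from.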


===
Mark Millard
marklmi at yahoo.com



