Subject: Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned
From: Mark Millard <marklmi@yahoo.com>
Date: Mon, 4 Sep 2023 22:06:45 -0700
To: Alexander Motin
Cc: dev-commits-src-main@freebsd.org, Current FreeBSD

On Sep 4, 2023, at 18:39, Mark Millard wrote:

> On Sep 4, 2023, at 10:05, Alexander Motin wrote:
>
>> On 04.09.2023 11:45, Mark Millard wrote:
>>> On Sep 4, 2023, at 06:09, Alexander Motin wrote:
>
>>>> per_txg_dirty_frees_percent is directly related to the delete
>>>> delays we see here. You are forcing ZFS to commit transactions
>>>> every 5% of the dirty ARC limit, which is 5% of 10% of memory
>>>> size. I haven't looked at that code recently, but I guess setting
>>>> it too low can make ZFS commit transactions too often, increasing
>>>> write inflation for the underlying storage. I would propose you
>>>> restore the default and try again.
>>>
>>> While this machine is different, the original problem was worse
>>> than the issue here: the load average was less than 1 for most of
>>> the parallel bulk build when 30 was used. The fraction of time
>>> spent waiting was much longer than with 5. If I understand right,
>>> both too high and too low for a given type of context can lead to
>>> increased elapsed time, and getting it set near the optimum is a
>>> non-obvious exploration.
>>
>> IIRC this limit was modified several times since it was originally
>> implemented. Maybe it could benefit from another look, if the
>> default of 30% is not good. It would be good if generic ZFS issues
>> like this were reported to OpenZFS upstream, to be visible to a
>> wider public. Unfortunately I have several other projects I must
>> work on, so if it is not a regression I can't promise I'll take it
>> right now; anybody else is welcome.
>
> As I understand it, there are contexts where 5 is inappropriate
> and 30 works fairly well: no good single answer as to what
> value range will avoid problems.
>
>>> An overall point for the goal of my activity is: what makes a
>>> good test context for checking if ZFS is again safe to use?
>>> Maybe other tradeoffs make, say, 4 hardware threads more
>>> reasonable than 32.
>>
>> Thank you for your testing. The best test is one that nobody else
>> runs. It also correlates with the topic of "safe to use", which
>> also depends on what it is used for. :)
>
> Looks like use of a M.2 Samsung SSD 960 PRO 1TB with a
> non-debug FreeBSD build is suitable for the bulk -a -J128
> test (no ALLOW_MAKE_JOBS variants enabled, USE_TMPFS=no in
> use) on the 32 hardware thread system. (The swap partition
> in use is from the normal environment's PCIe Optane media.)
> The %idle, load averages, and %user stayed reasonable in a
> preliminary test. One thing it does introduce is trim
> management (both available and potentially useful). (Optane
> media does not support or need it.) No
> per_txg_dirty_frees_percent adjustment involved (still 5).
>
> I've learned not to use ^T for fear of /bin/sh aborting
> and messing up poudriere's context. So I now monitor with:
>
> # poudriere status -b
>
> in a separate ssh session.
>
> I'll note that I doubt I'd try for a complete bulk -a .
> I'd probably stop it if I noticed that the number of
> active builders dropped off for a notable time (normal
> waiting for prerequisites appearing to be why).

So much for that idea. It has reached a state of staying under
3500 w/s and up to 4.5 ms/w (normally above 2 ms/w), with %busy
wandering in the range 85% to 101%. Lots of top output showing
tx->tx. There is some read and other activity as well. Of course
the kBps figures are larger than in the earlier USB3 context
(larger kB figures). It reached about 1350 port->package builds
over the first hour after the 2nd "Builders started".

autotrim is off. Doing a "zpool trim -w zamd64" leads to . . .
larger w/s figures during the process. So more exploring to do
at some point. Possibly:

autotrim
per_txg_dirty_frees_percent

(A sketch of inspecting and adjusting these follows.)
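For reference, a minimal sh sketch of inspecting and adjusting
those two knobs (zamd64 is the pool from this report; the values
shown are only illustrative, not recommendations):

# Delete throttle, as a percent of the dirty-data limit:
sysctl vfs.zfs.per_txg_dirty_frees_percent
# For example, going back to the upstream default of 30:
sysctl vfs.zfs.per_txg_dirty_frees_percent=30

# Check, then enable, automatic trim on the pool:
zpool get autotrim zamd64
zpool set autotrim=on zamd64

# One-shot manual trim; -w waits for completion:
zpool trim -w zamd64
# Progress of a trim in flight:
zpool status -t zamd64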
For now, I'm just running "zpool trim -w zamd64" once in a
while so the process continues better.

Still no evidence of deadlocks. No evidence of builds failing
from corruption.

. . .

At around the end of the 2nd hour: 2920 or so built.
Still no evidence of deadlocks. No evidence of builds failing
from corruption.

. . .

I've turned on autotrim without stopping the bulk build.

. . .

At around the end of the 3rd hour: 4080 or so built.
Still no evidence of deadlocks. No evidence of builds failing
from corruption.

Looks like the %idle has been high for a significant time. I
think I'll stop this specific test and clean things out.

Looks like lang/guile* are examples of not respecting the lack
of ALLOW_MAKE_JOBS use.

Hmm. The ^C ended up with:

^C[03:41:07] Error: Signal SIGINT caught, cleaning up and exiting
[main-amd64-bulk_a-default] [2023-09-04_17h49m28s] [sigint:] Queued: 34588 Built: 4331 Failed: 12 Skipped: 303 Ignored: 335 Fetched: 0 Tobuild: 29607 Time: 03:41:05
[03:41:07] Logs: /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-09-04_17h49m28s
[03:41:19] [95] [00:06:59] Finished databases/php80-pdo_sqlite | php80-pdo_sqlite-8.0.30: Failed: build-depends
[03:41:20] [60] [00:10:53] Finished www/php82-session | php82-session-8.2.9: Success
[03:41:20] [19] [00:06:42] Finished databases/pecl-memcache@php80 | php80-pecl-memcache-8.2: Failed: build-depends
[03:41:23] [62] [00:06:40] Finished net-mgmt/icingaweb2-module-incubator@php81 | icingaweb2-module-incubator-php81-0.20.0: Success
[03:41:24] Cleaning up
exit: cannot open ./var/run/116_nohang.pid: No such file or directory
exit: cannot open ./var/run/93_nohang.pid: No such file or directory

and JID 253's processes are stuck in the STOP state, along with
the related JID 0 processes. These are for x11-toolkits/libXaw.
I do not know how long it might have been stuck.
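Kernel-stack listings like the ones below can be produced with
procstat's kernel-stack mode; a sketch, assuming pid 3896 (the
stuck bulk.sh sh shown next) is the process of interest:

# Kernel stack of every thread in one process:
procstat -kk 3896
# Or sweep all processes and pick out the jail's sh/make/gmake:
procstat -kk -a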
/usr/local/libexec/poudriere/sh -e /usr/local/share/poudriere/bulk.sh -J128 -jmain-amd64-bulk_a -a

is stuck in the killpg state:

 3896 101784 sh - mi_switch+0xbb _sx_xlock_hard+0x3e1 killpg1+0x193 kern_kill+0x23c amd64_syscall+0x109 fast_syscall_common+0xf8

Most recent process birth time first, for the STOPs:

86826 634820 sh - mi_switch+0xbb thread_suspend_switch+0xc3 cursig+0x7b9 ast_sig+0x1df ast_handler+0x88 ast+0x20 doreti_ast+0x1c
86825 321240 sh - mi_switch+0xbb thread_suspend_check+0x260 sig_intr+0x4c fork1+0x186 sys_fork+0x54 amd64_syscall+0x109 fast_syscall_common+0xf8
86550 308525 sh - mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 pipe_read+0x420 dofileread+0x86 sys_read+0xbe amd64_syscall+0x109 fast_syscall_common+0xf8
 3998 637224 gmake - mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 kern_wait6+0x51c sys_wait4+0x7d amd64_syscall+0x109 fast_syscall_common+0xf8
 3976 298566 sh - mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 kern_wait6+0x51c sys_wait4+0x7d amd64_syscall+0x109 fast_syscall_common+0xf8
 3974 309336 gmake - mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 kern_wait6+0x51c sys_wait4+0x7d amd64_syscall+0x109 fast_syscall_common+0xf8
 3962 577230 gmake - mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 kern_wait6+0x51c sys_wait4+0x7d amd64_syscall+0x109 fast_syscall_common+0xf8
 3958 635598 sh - mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 kern_wait6+0x51c sys_wait4+0x7d amd64_syscall+0x109 fast_syscall_common+0xf8
 3915 668148 make - mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 kern_wait6+0x51c sys_wait4+0x7d amd64_syscall+0x109 fast_syscall_common+0xf8
 3912 638108 sh - mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 kern_wait6+0x51c sys_wait4+0x7d amd64_syscall+0x109 fast_syscall_common+0xf8
46142 212631 sh - mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_timedwait_sig+0x12 _cv_timedwait_sig_sbt+0x10c seltdwait+0x75 kern_select+0x95e sys_select+0x57 amd64_syscall+0x109 fast_syscall_common+0xf8

===
Mark Millard
marklmi at yahoo.com