Date: Tue, 14 Apr 2026 22:48:44 +0200 From: Kristof Provost <kp@freebsd.org> To: Zhenlei Huang <zlei@freebsd.org> Cc: src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org Subject: Re: git: e9fc0c538264 - main - if_clone: Make ifnet_detach_sxlock opaque to consumers Message-ID: <582CBBF6-BBE9-49DB-8DF6-8EA56238E0F1@freebsd.org> In-Reply-To: <A847EE94-63B6-47B4-B14A-78BF095EAB1E@FreeBSD.org>
index | next in thread | previous in thread | raw e-mail
<html class="apple-mail-supports-explicit-dark-mode"><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div dir="ltr"></div><div dir="ltr"><br></div><div dir="ltr"><br><blockquote type="cite">On 14 Apr 2026, at 19:46, Zhenlei Huang <zlei@freebsd.org> wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr"><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Apr 14, 2026, at 11:57 PM, Kristof Provost <<a href="mailto:kp@FreeBSD.org" class="">kp@FreeBSD.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""> <meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8" class=""> <div class=""><div style="font-family: sans-serif;" class=""><div class="markdown" style="white-space: normal;"><p dir="auto" class="">On 13 Apr 2026, at 6:39, Zhenlei Huang wrote:</p> <blockquote style="margin: 0 0 5px; padding-left: 5px; border-left: 2px solid #136BCE; color: #136BCE;" class=""><p dir="auto" class="">The branch main has been updated by zlei:</p><p dir="auto" class="">URL: <a href="https://cgit.freebsd.org/src/commit/?id=e9fc0c538264355bd3fd9120c650078281c2a290" class="">https://cgit.FreeBSD.org/src/commit/?id=e9fc0c538264355bd3fd9120c650078281c2a290</a></p><p dir="auto" class="">commit e9fc0c538264355bd3fd9120c650078281c2a290<br class=""> Author: Zhenlei Huang <a href="mailto:zlei@FreeBSD.org" class="">zlei@FreeBSD.org</a><br class=""> AuthorDate: 2026-04-13 04:38:44 +0000<br class=""> Commit: Zhenlei Huang <a href="mailto:zlei@FreeBSD.org" class="">zlei@FreeBSD.org</a><br class=""> CommitDate: 2026-04-13 04:38:44 +0000</p> <pre style="margin-left: 15px; margin-right: 15px; padding: 5px; border: thin solid gray; overflow-x: auto; max-width: 90vw; background-color: #E4E4E4;" class=""><code style="padding: 0 0.25em; background-color: #E4E4E4;" class="">if_clone: Make ifnet_detach_sxlock opaque to consumers The change e133271fc1b5e introduced ifnet_detach_sxlock, and change 6d2a10d96fb5 widened its coverage, but there are still consumers, net80211 and tuntap e.g., want it. Instead of sprinkling it everywhere, make it opaque to consumers. Out of tree drivers shall also benefit from this change. Reviewed by: kp MFC after: 2 weeks Differential Revision: <a href="https://reviews.freebsd.org/D56298" class="">https://reviews.freebsd.org/D56298</a> </code></pre> </blockquote><p dir="auto" class="">I don’t really understand why, but this commit causes bricoler test runs to fail for me.<br class=""> Or more precisely, it looks like parallel test runs see a number of failures in the pf tests.</p><p dir="auto" class="">Reverting only part of this change, specifically this:</p> <pre style="margin-left: 15px; margin-right: 15px; padding: 5px; border: thin solid gray; overflow-x: auto; max-width: 90vw; background-color: #E4E4E4;" class=""><code style="padding: 0 0.25em; background-color: #E4E4E4;" class="">diff --git a/sys/net/if_clone.c b/sys/net/if_clone.c index db3db78c1b76..d0fe54b6146b 100644 --- a/sys/net/if_clone.c +++ b/sys/net/if_clone.c @@ -683,11 +683,12 @@ if_clone_detach(struct if_clone *ifc) V_if_cloners_count--; IF_CLONERS_UNLOCK(); - sx_xlock(&ifnet_detach_sxlock); /* destroy all interfaces for this cloner */ - while (!LIST_EMPTY(&ifc->ifc_iflist)) + while (!LIST_EMPTY(&ifc->ifc_iflist)) { + sx_xlock(&ifnet_detach_sxlock); if_clone_destroyif_flags(ifc, LIST_FIRST(&ifc->ifc_iflist), IFC_F_FORCE); - sx_xunlock(&ifnet_detach_sxlock); + sx_xunlock(&ifnet_detach_sxlock); + } IF_CLONE_REMREF(ifc); } </code></pre><p dir="auto" class="">Seems to be enough to avoid the problem. So it looks like somehow deleting all interfaces for a given cloner under that lock (rather than releasing it and re-acquiring it) causes the test failures.</p><div class=""><br class=""></div></div></div></div></div></blockquote><div><br class=""></div>That is interesting. I do not expect a re-acquiring will make any differences, given ifc_detach_cloner() and / or if_clone_detach() are basically be invoked by VNET_SYSUNINIT() and vnet_destroy() has acquired ifnet_detach_sxlock ( I planed to removed the acquiring of it in <span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">vnet_destroy</span>() but not yet done ). So I'd expect an assert `sx_assert(&ifnet_detach_sxlock, SA_XLOCKED)` in if_clone_detach() will mostly succeed.</div><div><br class=""></div><div>Some drivers, for example wg(4), iterate all VNETs and invoke ifc_detach_cloner() on module unload. That should be the only case that `<span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">ifnet_detach_sxlock` is not recursively acquired, and not acquired prior to</span></div><div><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">the commit</span><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">.</span></div><div><blockquote type="cite" class=""><div class=""><div class=""><div style="font-family: sans-serif;" class=""><div class="markdown" style="white-space: normal;"><p dir="auto" class="">The failing tests I see are:</p> <pre style="margin-left: 15px; margin-right: 15px; padding: 5px; border: thin solid gray; overflow-x: auto; max-width: 90vw; background-color: #E4E4E4;" class=""><code style="padding: 0 0.25em; background-color: #E4E4E4;" class="">sys/netpfil/pf/divert-to:in_div -> failed: Test case body returned a non-ok exit code, but this is not allowed [36.591s] sys/netpfil/pf/divert-to:in_div_in -> failed: atf-check failed; see the output of the test for details [30.867s] sys/netpfil/pf/divert-to:in_div_in_fwd_out_div_out -> failed: atf-check failed; see the output of the test for details [39.705s] sys/netpfil/pf/divert-to:in_dn_in_div_in_out_div_out_dn_out -> failed: atf-check failed; see the output of the test for details [45.059s] sys/netpfil/pf/divert-to:out_div -> failed: Test case body returned a non-ok exit code, but this is not allowed [23.629s] sys/netpfil/pf/killstate:gateway -> failed: Killing with a different gateway removed the state. [32.803s] sys/netpfil/pf/killstate:id -> failed: Killing a different ID removed the state. [32.789s] sys/netpfil/pf/killstate:label -> failed: Killing a non-existing label removed the state. [47.749s] sys/netpfil/pf/killstate:multilabel -> failed: Setting new rules removed the state. [43.570s] sys/netpfil/pf/killstate:src_dst -> failed: Killing with the wrong destination IP removed our state. [57.819s] sys/netpfil/pf/killstate:v6 -> failed: Killing with the wrong IP removed our state. [54.957s] sys/netpfil/pf/nat64:pool -> failed: atf-check failed; see the output of the test for details [51.118s] sys/netpfil/pf/nat64:sctp_in -> failed: Failed to connect to SCTP server [55.495s] sys/netpfil/pf/nat64.py:TestNAT64::test_udp_checksum -> failed: /usr/tests/sys/netpfil/pf/nat64.py:205: AttributeError [80.001s] sys/netpfil/pf/pflog:matches -> failed: atf-check failed; see the output of the test for details [61.807s] sys/netpfil/pf/pflog:matches_logif -> failed: atf-check failed; see the output of the test for details [51.261s] sys/netpfil/pf/pflog:unspecified_v6 -> failed: atf-check failed; see the output of the test for details [36.281s] sys/netpfil/pf/rdr:srcport_pass -> failed: atf-check failed; see the output of the test for details [76.815s] sys/netpfil/pf/sctp:pfsync -> failed: Initial SCTP connection failed [134.866s] sys/netpfil/pf/sctp:related_icmp -> failed: SCTP connection failed [59.620s] sys/netpfil/pf/table:zero_one -> failed: atf-check failed; see the output of the test for details [64.125s] sys/netpfil/pf/tcp:rst -> failed: atf-check failed; see the output of the test for details [80.261s] </code></pre><p dir="auto" class="">Looking at it now, those runtimes seems suspiciously high as well.</p><div class=""><br class=""></div></div></div></div></div></blockquote><div><br class=""></div>I noticed that. My local test shows most of them finish in seconds. Some require more time to finish but still within about 20 seconds.</div><div>I tested as the following,</div><div>```</div><div># kyua test -k Kyuafile pf/divert-to</div><div>```</div><div><br class=""></div><div>Is the failing tests always fail ? </div></div></blockquote><div><br></div>It’s relatively consistently the same tests, I believe.<div><br><blockquote type="cite"><div dir="ltr"><div>Does the kyua report show anything useful ?<br class=""></div></div></blockquote><br>Not obviously so, no, but I’ve also not done a lot of debugging. It’s an annoying problem, because re-running an individual test case just succeeds. <div><br></div><div>In any event, I’ll try to test the patch you sent me tomorrow. (Probably ~12 hours from now.)</div><div><br></div><div>Best regards?</div><div>Kristof Provost </div></div></body></html>home | help
Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?582CBBF6-BBE9-49DB-8DF6-8EA56238E0F1>
