Date: Wed, 15 Apr 2026 01:45:53 +0800 From: Zhenlei Huang <zlei@FreeBSD.org> To: Kristof Provost <kp@FreeBSD.org> Cc: "src-committers@freebsd.org" <src-committers@FreeBSD.org>, "dev-commits-src-all@freebsd.org" <dev-commits-src-all@FreeBSD.org>, "dev-commits-src-main@freebsd.org" <dev-commits-src-main@FreeBSD.org> Subject: Re: git: e9fc0c538264 - main - if_clone: Make ifnet_detach_sxlock opaque to consumers Message-ID: <A847EE94-63B6-47B4-B14A-78BF095EAB1E@FreeBSD.org> In-Reply-To: <AAA103BC-755C-4522-B57F-4DC455555BAA@FreeBSD.org> References: <69dc7394.203c3.3a41c692@gitrepo.freebsd.org> <AAA103BC-755C-4522-B57F-4DC455555BAA@FreeBSD.org>
index | next in thread | previous in thread | raw e-mail
[-- Attachment #1 --] > On Apr 14, 2026, at 11:57 PM, Kristof Provost <kp@FreeBSD.org> wrote: > > On 13 Apr 2026, at 6:39, Zhenlei Huang wrote: > > The branch main has been updated by zlei: > > URL: https://cgit.FreeBSD.org/src/commit/?id=e9fc0c538264355bd3fd9120c650078281c2a290 <https://cgit.freebsd.org/src/commit/?id=e9fc0c538264355bd3fd9120c650078281c2a290> > commit e9fc0c538264355bd3fd9120c650078281c2a290 > Author: Zhenlei Huang zlei@FreeBSD.org <mailto:zlei@FreeBSD.org> > AuthorDate: 2026-04-13 04:38:44 +0000 > Commit: Zhenlei Huang zlei@FreeBSD.org <mailto:zlei@FreeBSD.org> > CommitDate: 2026-04-13 04:38:44 +0000 > > if_clone: Make ifnet_detach_sxlock opaque to consumers > > The change e133271fc1b5e introduced ifnet_detach_sxlock, and change > 6d2a10d96fb5 widened its coverage, but there are still consumers, > net80211 and tuntap e.g., want it. Instead of sprinkling it everywhere, > make it opaque to consumers. > > Out of tree drivers shall also benefit from this change. > > Reviewed by: kp > MFC after: 2 weeks > Differential Revision: https://reviews.freebsd.org/D56298 > I don’t really understand why, but this commit causes bricoler test runs to fail for me. > Or more precisely, it looks like parallel test runs see a number of failures in the pf tests. > > Reverting only part of this change, specifically this: > > diff --git a/sys/net/if_clone.c b/sys/net/if_clone.c > index db3db78c1b76..d0fe54b6146b 100644 > --- a/sys/net/if_clone.c > +++ b/sys/net/if_clone.c > @@ -683,11 +683,12 @@ if_clone_detach(struct if_clone *ifc) > V_if_cloners_count--; > IF_CLONERS_UNLOCK(); > > - sx_xlock(&ifnet_detach_sxlock); > /* destroy all interfaces for this cloner */ > - while (!LIST_EMPTY(&ifc->ifc_iflist)) > + while (!LIST_EMPTY(&ifc->ifc_iflist)) { > + sx_xlock(&ifnet_detach_sxlock); > if_clone_destroyif_flags(ifc, LIST_FIRST(&ifc->ifc_iflist), IFC_F_FORCE); > - sx_xunlock(&ifnet_detach_sxlock); > + sx_xunlock(&ifnet_detach_sxlock); > + } > > IF_CLONE_REMREF(ifc); > } > Seems to be enough to avoid the problem. So it looks like somehow deleting all interfaces for a given cloner under that lock (rather than releasing it and re-acquiring it) causes the test failures. > > That is interesting. I do not expect a re-acquiring will make any differences, given ifc_detach_cloner() and / or if_clone_detach() are basically be invoked by VNET_SYSUNINIT() and vnet_destroy() has acquired ifnet_detach_sxlock ( I planed to removed the acquiring of it in vnet_destroy() but not yet done ). So I'd expect an assert `sx_assert(&ifnet_detach_sxlock, SA_XLOCKED)` in if_clone_detach() will mostly succeed. Some drivers, for example wg(4), iterate all VNETs and invoke ifc_detach_cloner() on module unload. That should be the only case that `ifnet_detach_sxlock` is not recursively acquired, and not acquired prior to the commit. > The failing tests I see are: > > sys/netpfil/pf/divert-to:in_div -> failed: Test case body returned a non-ok exit code, but this is not allowed [36.591s] > sys/netpfil/pf/divert-to:in_div_in -> failed: atf-check failed; see the output of the test for details [30.867s] > sys/netpfil/pf/divert-to:in_div_in_fwd_out_div_out -> failed: atf-check failed; see the output of the test for details [39.705s] > sys/netpfil/pf/divert-to:in_dn_in_div_in_out_div_out_dn_out -> failed: atf-check failed; see the output of the test for details [45.059s] > sys/netpfil/pf/divert-to:out_div -> failed: Test case body returned a non-ok exit code, but this is not allowed [23.629s] > sys/netpfil/pf/killstate:gateway -> failed: Killing with a different gateway removed the state. [32.803s] > sys/netpfil/pf/killstate:id -> failed: Killing a different ID removed the state. [32.789s] > sys/netpfil/pf/killstate:label -> failed: Killing a non-existing label removed the state. [47.749s] > sys/netpfil/pf/killstate:multilabel -> failed: Setting new rules removed the state. [43.570s] > sys/netpfil/pf/killstate:src_dst -> failed: Killing with the wrong destination IP removed our state. [57.819s] > sys/netpfil/pf/killstate:v6 -> failed: Killing with the wrong IP removed our state. [54.957s] > sys/netpfil/pf/nat64:pool -> failed: atf-check failed; see the output of the test for details [51.118s] > sys/netpfil/pf/nat64:sctp_in -> failed: Failed to connect to SCTP server [55.495s] > sys/netpfil/pf/nat64.py:TestNAT64::test_udp_checksum -> failed: /usr/tests/sys/netpfil/pf/nat64.py:205: AttributeError [80.001s] > sys/netpfil/pf/pflog:matches -> failed: atf-check failed; see the output of the test for details [61.807s] > sys/netpfil/pf/pflog:matches_logif -> failed: atf-check failed; see the output of the test for details [51.261s] > sys/netpfil/pf/pflog:unspecified_v6 -> failed: atf-check failed; see the output of the test for details [36.281s] > sys/netpfil/pf/rdr:srcport_pass -> failed: atf-check failed; see the output of the test for details [76.815s] > sys/netpfil/pf/sctp:pfsync -> failed: Initial SCTP connection failed [134.866s] > sys/netpfil/pf/sctp:related_icmp -> failed: SCTP connection failed [59.620s] > sys/netpfil/pf/table:zero_one -> failed: atf-check failed; see the output of the test for details [64.125s] > sys/netpfil/pf/tcp:rst -> failed: atf-check failed; see the output of the test for details [80.261s] > Looking at it now, those runtimes seems suspiciously high as well. > > I noticed that. My local test shows most of them finish in seconds. Some require more time to finish but still within about 20 seconds. I tested as the following, ``` # kyua test -k Kyuafile pf/divert-to ``` Is the failing tests always fail ? Does the kyua report show anything useful ? > Best regards, > Kristof > Best regards, Zhenlei [-- Attachment #2 --] <html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Apr 14, 2026, at 11:57 PM, Kristof Provost <<a href="mailto:kp@FreeBSD.org" class="">kp@FreeBSD.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""> <meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8" class=""> <div class=""><div style="font-family: sans-serif;" class=""><div class="markdown" style="white-space: normal;"><p dir="auto" class="">On 13 Apr 2026, at 6:39, Zhenlei Huang wrote:</p> <blockquote style="margin: 0 0 5px; padding-left: 5px; border-left: 2px solid #136BCE; color: #136BCE;" class=""><p dir="auto" class="">The branch main has been updated by zlei:</p><p dir="auto" class="">URL: <a href="https://cgit.freebsd.org/src/commit/?id=e9fc0c538264355bd3fd9120c650078281c2a290" class="">https://cgit.FreeBSD.org/src/commit/?id=e9fc0c538264355bd3fd9120c650078281c2a290</a></p><p dir="auto" class="">commit e9fc0c538264355bd3fd9120c650078281c2a290<br class=""> Author: Zhenlei Huang <a href="mailto:zlei@FreeBSD.org" class="">zlei@FreeBSD.org</a><br class=""> AuthorDate: 2026-04-13 04:38:44 +0000<br class=""> Commit: Zhenlei Huang <a href="mailto:zlei@FreeBSD.org" class="">zlei@FreeBSD.org</a><br class=""> CommitDate: 2026-04-13 04:38:44 +0000</p> <pre style="margin-left: 15px; margin-right: 15px; padding: 5px; border: thin solid gray; overflow-x: auto; max-width: 90vw; background-color: #E4E4E4;" class=""><code style="padding: 0 0.25em; background-color: #E4E4E4;" class="">if_clone: Make ifnet_detach_sxlock opaque to consumers The change e133271fc1b5e introduced ifnet_detach_sxlock, and change 6d2a10d96fb5 widened its coverage, but there are still consumers, net80211 and tuntap e.g., want it. Instead of sprinkling it everywhere, make it opaque to consumers. Out of tree drivers shall also benefit from this change. Reviewed by: kp MFC after: 2 weeks Differential Revision: <a href="https://reviews.freebsd.org/D56298" class="">https://reviews.freebsd.org/D56298</a> </code></pre> </blockquote><p dir="auto" class="">I don’t really understand why, but this commit causes bricoler test runs to fail for me.<br class=""> Or more precisely, it looks like parallel test runs see a number of failures in the pf tests.</p><p dir="auto" class="">Reverting only part of this change, specifically this:</p> <pre style="margin-left: 15px; margin-right: 15px; padding: 5px; border: thin solid gray; overflow-x: auto; max-width: 90vw; background-color: #E4E4E4;" class=""><code style="padding: 0 0.25em; background-color: #E4E4E4;" class="">diff --git a/sys/net/if_clone.c b/sys/net/if_clone.c index db3db78c1b76..d0fe54b6146b 100644 --- a/sys/net/if_clone.c +++ b/sys/net/if_clone.c @@ -683,11 +683,12 @@ if_clone_detach(struct if_clone *ifc) V_if_cloners_count--; IF_CLONERS_UNLOCK(); - sx_xlock(&ifnet_detach_sxlock); /* destroy all interfaces for this cloner */ - while (!LIST_EMPTY(&ifc->ifc_iflist)) + while (!LIST_EMPTY(&ifc->ifc_iflist)) { + sx_xlock(&ifnet_detach_sxlock); if_clone_destroyif_flags(ifc, LIST_FIRST(&ifc->ifc_iflist), IFC_F_FORCE); - sx_xunlock(&ifnet_detach_sxlock); + sx_xunlock(&ifnet_detach_sxlock); + } IF_CLONE_REMREF(ifc); } </code></pre><p dir="auto" class="">Seems to be enough to avoid the problem. So it looks like somehow deleting all interfaces for a given cloner under that lock (rather than releasing it and re-acquiring it) causes the test failures.</p><div class=""><br class=""></div></div></div></div></div></blockquote><div><br class=""></div>That is interesting. I do not expect a re-acquiring will make any differences, given ifc_detach_cloner() and / or if_clone_detach() are basically be invoked by VNET_SYSUNINIT() and vnet_destroy() has acquired ifnet_detach_sxlock ( I planed to removed the acquiring of it in <span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">vnet_destroy</span>() but not yet done ). So I'd expect an assert `sx_assert(&ifnet_detach_sxlock, SA_XLOCKED)` in if_clone_detach() will mostly succeed.</div><div><br class=""></div><div>Some drivers, for example wg(4), iterate all VNETs and invoke ifc_detach_cloner() on module unload. That should be the only case that `<span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">ifnet_detach_sxlock` is not recursively acquired, and not acquired prior to</span></div><div><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">the commit</span><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">.</span></div><div><blockquote type="cite" class=""><div class=""><div class=""><div style="font-family: sans-serif;" class=""><div class="markdown" style="white-space: normal;"><p dir="auto" class="">The failing tests I see are:</p> <pre style="margin-left: 15px; margin-right: 15px; padding: 5px; border: thin solid gray; overflow-x: auto; max-width: 90vw; background-color: #E4E4E4;" class=""><code style="padding: 0 0.25em; background-color: #E4E4E4;" class="">sys/netpfil/pf/divert-to:in_div -> failed: Test case body returned a non-ok exit code, but this is not allowed [36.591s] sys/netpfil/pf/divert-to:in_div_in -> failed: atf-check failed; see the output of the test for details [30.867s] sys/netpfil/pf/divert-to:in_div_in_fwd_out_div_out -> failed: atf-check failed; see the output of the test for details [39.705s] sys/netpfil/pf/divert-to:in_dn_in_div_in_out_div_out_dn_out -> failed: atf-check failed; see the output of the test for details [45.059s] sys/netpfil/pf/divert-to:out_div -> failed: Test case body returned a non-ok exit code, but this is not allowed [23.629s] sys/netpfil/pf/killstate:gateway -> failed: Killing with a different gateway removed the state. [32.803s] sys/netpfil/pf/killstate:id -> failed: Killing a different ID removed the state. [32.789s] sys/netpfil/pf/killstate:label -> failed: Killing a non-existing label removed the state. [47.749s] sys/netpfil/pf/killstate:multilabel -> failed: Setting new rules removed the state. [43.570s] sys/netpfil/pf/killstate:src_dst -> failed: Killing with the wrong destination IP removed our state. [57.819s] sys/netpfil/pf/killstate:v6 -> failed: Killing with the wrong IP removed our state. [54.957s] sys/netpfil/pf/nat64:pool -> failed: atf-check failed; see the output of the test for details [51.118s] sys/netpfil/pf/nat64:sctp_in -> failed: Failed to connect to SCTP server [55.495s] sys/netpfil/pf/nat64.py:TestNAT64::test_udp_checksum -> failed: /usr/tests/sys/netpfil/pf/nat64.py:205: AttributeError [80.001s] sys/netpfil/pf/pflog:matches -> failed: atf-check failed; see the output of the test for details [61.807s] sys/netpfil/pf/pflog:matches_logif -> failed: atf-check failed; see the output of the test for details [51.261s] sys/netpfil/pf/pflog:unspecified_v6 -> failed: atf-check failed; see the output of the test for details [36.281s] sys/netpfil/pf/rdr:srcport_pass -> failed: atf-check failed; see the output of the test for details [76.815s] sys/netpfil/pf/sctp:pfsync -> failed: Initial SCTP connection failed [134.866s] sys/netpfil/pf/sctp:related_icmp -> failed: SCTP connection failed [59.620s] sys/netpfil/pf/table:zero_one -> failed: atf-check failed; see the output of the test for details [64.125s] sys/netpfil/pf/tcp:rst -> failed: atf-check failed; see the output of the test for details [80.261s] </code></pre><p dir="auto" class="">Looking at it now, those runtimes seems suspiciously high as well.</p><div class=""><br class=""></div></div></div></div></div></blockquote><div><br class=""></div>I noticed that. My local test shows most of them finish in seconds. Some require more time to finish but still within about 20 seconds.</div><div>I tested as the following,</div><div>```</div><div># kyua test -k Kyuafile pf/divert-to</div><div>```</div><div><br class=""></div><div>Is the failing tests always fail ? Does the kyua report show anything useful ?<br class=""><blockquote type="cite" class=""><div class=""><div class=""><div style="font-family: sans-serif;" class=""><div class="markdown" style="white-space: normal;"><p dir="auto" class="">Best regards,<br class=""> Kristof</p> </div> </div> </div> </div></blockquote></div><br class=""><div class=""> <div>Best regards,</div><div>Zhenlei</div> </div> <br class=""></body></html>home | help
Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?A847EE94-63B6-47B4-B14A-78BF095EAB1E>
