Date: Sat, 28 Jun 2025 23:35:17 +0800
From: Zhenlei Huang <zlei@FreeBSD.org>
To: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
Cc: FreeBSD Current <current@freebsd.org>, Olivier Certner <olce@freebsd.org>, John Baldwin <jhb@FreeBSD.org>
Subject: Re: regression: memory issues on main/arm64 over sched/runq changes
Message-ID: <907D042E-AE8A-4818-A807-AD45F36354FD@FreeBSD.org>
In-Reply-To: <23n1773o-10o2-5p5o-25s4-r623rnn44649@yvfgf.mnoonqbm.arg>
References: <43005447-2rq0-6nn2-pnr5-4939s112npr4@yvfgf.mnoonqbm.arg> <0A01B9F5-C49C-41D8-BAB7-4378DEDBF647@FreeBSD.org> <28o26o81-so5r-qq79-6q6n-0q6746o7oo79@yvfgf.mnoonqbm.arg> <6A003013-415A-4594-AB04-AF5A9B2D660D@FreeBSD.org> <23n1773o-10o2-5p5o-25s4-r623rnn44649@yvfgf.mnoonqbm.arg>
[-- Attachment #1 --]
> On Jun 28, 2025, at 4:12 AM, Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net> wrote:
>
> On Sat, 28 Jun 2025, Zhenlei Huang wrote:
>
>>
>>
>>> On Jun 27, 2025, at 11:02 PM, Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net> wrote:
>>>
>>> On Wed, 25 Jun 2025, Zhenlei Huang wrote:
>>>
>>> Hi,
>>>
>>> I applied olce's change from the review but it didn't make a difference
>>> on my arm64 and now on a tree with local changes (wifi bits, user space
>>> bits, etc).
>>>
>>> Now I netbooted that tree on X86 hardware (an old Lenovo Laptop) and ran
>>> into something else (the same tree boots in a bhyve instance on a
>>> different machine from a local disk image).
>>>
>>> At the end of if_addgroup() I had added the following for local
>>> debugging (really crude sorry):
>>>
>>> ...
>>>
>>> + atomic_thread_fence_seq_cst();
>>> IF_ADDR_WLOCK(ifp);
>>> CK_STAILQ_INSERT_TAIL(&ifg->ifg_members, ifgm, ifgm_next);
>>> CK_STAILQ_INSERT_TAIL(&ifp->if_groups, ifgl, ifgl_next);
>>> IF_ADDR_WUNLOCK(ifp);
>>>
>>> IFNET_WUNLOCK(); // excl unlock
>>>
>>> if (new)
>>> EVENTHANDLER_INVOKE(group_attach_event, ifg);
>>> EVENTHANDLER_INVOKE(group_change_event, groupname);
>>>
>>> + IFNET_RLOCK(); // shared, panic
>>> + CK_STAILQ_FOREACH(ifgl, &ifp->if_groups, ifgl_next) {
>>> + if (bz_debug_groups) if_printf(ifp, "XXXXXXXXXXXXXXXXXXXXXXXXXXX-BZ %s:%d: ifgl %p, ifgl_group %p, ifg_group %p\n", __func__, __LINE__, ifgl, (ifgl != NULL) ? ifgl->ifgl_group : NULL, (ifgl != NULL && ifgl->ifgl_group != NULL) ? ifgl->ifgl_group->ifg_group : NULL);
>>> + }
>>> + IFNET_RUNLOCK();
>>> +
>>> return (0);
>>> }
>>>
>>>
>>>
>>> You see the annotation //shared ?
>>>
>>> I got a panic: excl->share with that.
>>
>> Well, I applied the same patch as you and I can reproduce that panic, but my screen freezes and the topmost part of the stack is
>
> I took a video of the boot at 60fps so I could "scroll" a bit backwards.
Good idea!
>
>> ```
>> _sx_slock_int() at _sx_slock_int+0x64/frame 0xff....
>> if_addgroup() at .....
>> ....
>> device_attach() at ...
>> ...
>> root_bus_configure() at ...
>> configure() at ...
>> mi_startup() at ..
>> ```
>>
>> I've no idea what's wrong. From the disassembly it appears the panic happens just after witness_checkorder().
>
> That is interesting. So it's not just me.
>
> Did you do a netboot or from disk?
I boot from disk.

An update on this locking issue: I think I have finally figured out the cause. Here is more of the stack trace, recovered from my video:
```
shared lock of (sx) ifnet_sx @/usr/home/zlei/freebsd-src/sys/net/if.c:1467
while exclusively locked from /usr/home/zlei/freebsd-src/sys/net/if.c:1416
panic: excl->share
...
witness_checkorder() at ...
_sx_slock_int() at _sx_slock_int+0x64/frame ....
if_addgroup() at ...
if_attach_internal() at ...
ether_ifattach() at ...
iflib_device_register() at ...
iflib_device_attach() at ...
device_attach() at ...
...
root_bus_configure() at ...
configure() at ...
mi_startup() at ...
```

ifnet_sx has the SX_RECURSE flag set, so it can be locked recursively.

iflib_device_register() acquires ifnet_sx exclusively and then calls ether_ifattach(), which in turn ends up in if_addgroup() (via if_attach_internal(), per the trace above). When if_addgroup() drops its own exclusive acquisition, the lock is therefore still held exclusively by the outer caller. Re-acquiring the same lock shared at that point is prohibited, and that is what witness complains about.

I think witness should report the location of the first exclusive acquisition, i.e. in sys/net/iflib.c, rather than sys/net/if.c:1416. That would make it more straightforward to figure out how this happens. CCing John to see whether that can be improved.
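
To make the sequence concrete, here is a rough single-threaded toy model in C. It is only a sketch of the behaviour described above, not the real sx(9) or witness(4) code: every name in it (toy_sx, toy_xlock, toy_xunlock, toy_slock, TOY_EXCL_TO_SHARE) is made up for illustration, and it assumes one thread and no contention. It records both the outermost and the most recent exclusive acquisition site, which is the difference between what I would like to see reported and what the panic message shows.

```c
#include <stddef.h>

/* Toy model only: NOT the real sx(9) or witness(4) code. All names are hypothetical. */
struct toy_sx {
	int		 excl_depth;	/* exclusive recursion depth of the one modelled thread */
	int		 shared_count;	/* number of shared holds */
	const char	*first_excl;	/* site of the outermost exclusive acquire */
	const char	*last_excl;	/* site of the most recent exclusive acquire */
};

enum toy_result { TOY_OK, TOY_EXCL_TO_SHARE };

/* Exclusive acquire; recursion is allowed, as with SX_RECURSE. */
static void
toy_xlock(struct toy_sx *sx, const char *where)
{
	if (sx->excl_depth == 0)
		sx->first_excl = where;
	sx->last_excl = where;
	sx->excl_depth++;
}

/* Drop one level of exclusive ownership. */
static void
toy_xunlock(struct toy_sx *sx)
{
	if (sx->excl_depth == 0)
		return;
	if (--sx->excl_depth == 0)
		sx->first_excl = sx->last_excl = NULL;
}

/* Shared acquire; refused while an exclusive hold is still in place. */
static enum toy_result
toy_slock(struct toy_sx *sx)
{
	if (sx->excl_depth > 0)
		return (TOY_EXCL_TO_SHARE);
	sx->shared_count++;
	return (TOY_OK);
}
```

In this model, calling toy_xlock() for the outer caller, then toy_xlock() and toy_xunlock() for if_addgroup(), leaves excl_depth at 1, so a following toy_slock() is refused. At that point last_excl still names the inner site while first_excl names the outer one, which would be the more useful of the two to report.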

OK, let's move on and focus on the original synchronization issue :).

Best regards,
Zhenlei
>
> --
> Bjoern A. Zeeb r15:7
