Date: Fri, 27 Jun 2025 15:02:35 +0000 (UTC)
From: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
To: Zhenlei Huang <zlei@FreeBSD.org>
Cc: FreeBSD Current <current@freebsd.org>, Olivier Certner <olce@freebsd.org>
Subject: Re: regression: memory issues on main/arm64 over sched/runq changes
Message-ID: <28o26o81-so5r-qq79-6q6n-0q6746o7oo79@yvfgf.mnoonqbm.arg>
In-Reply-To: <0A01B9F5-C49C-41D8-BAB7-4378DEDBF647@FreeBSD.org>
References: <43005447-2rq0-6nn2-pnr5-4939s112npr4@yvfgf.mnoonqbm.arg> <0A01B9F5-C49C-41D8-BAB7-4378DEDBF647@FreeBSD.org>
On Wed, 25 Jun 2025, Zhenlei Huang wrote:
Hi,
I applied olce's change from the review but it didn't make a difference
on my arm64 and now on a tree with local changes (wifi bits, user space
bits, etc).
Now I netbooted that tree on x86 hardware (an old Lenovo laptop) and ran
into something else (the same tree boots in a bhyve instance on a
different machine from a local disk image).
At the end of if_addgroup() I had added the following for local
debugging (really crude, sorry):
...
+	atomic_thread_fence_seq_cst();
 	IF_ADDR_WLOCK(ifp);
 	CK_STAILQ_INSERT_TAIL(&ifg->ifg_members, ifgm, ifgm_next);
 	CK_STAILQ_INSERT_TAIL(&ifp->if_groups, ifgl, ifgl_next);
 	IF_ADDR_WUNLOCK(ifp);
 	IFNET_WUNLOCK();	// excl unlock
 	if (new)
 		EVENTHANDLER_INVOKE(group_attach_event, ifg);
 	EVENTHANDLER_INVOKE(group_change_event, groupname);
+	IFNET_RLOCK();		// shared, panic
+	CK_STAILQ_FOREACH(ifgl, &ifp->if_groups, ifgl_next) {
+		if (bz_debug_groups)
+			if_printf(ifp, "XXXXXXXXXXXXXXXXXXXXXXXXXXX-BZ %s:%d: ifgl %p, ifgl_group %p, ifg_group %p\n",
+			    __func__, __LINE__, ifgl,
+			    (ifgl != NULL) ? ifgl->ifgl_group : NULL,
+			    (ifgl != NULL && ifgl->ifgl_group != NULL) ? ifgl->ifgl_group->ifg_group : NULL);
+	}
+	IFNET_RUNLOCK();
+
 	return (0);
 }
You see the annotation "// shared"?
I got a panic: excl->share with that.
The excl. is the
IFNET_WLOCK(); // excl
at the top of the function after the groupname check.
But that gets unlocked before the event handler above
so how can this happen?
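To make the question concrete: read literally, "excl->share" says that a
shared acquire was attempted while the lock was still recorded as held
exclusively. Below is a minimal userspace sketch of that kind of
bookkeeping. Every name in it (toy_lock, toy_acquire, toy_release) is
hypothetical and made up for illustration; it is not the kernel's lock
or WITNESS code, and it only models a single thread.

```c
#include <assert.h>

/* Hypothetical lock modes for this toy model; not FreeBSD types. */
enum toy_mode { TOY_UNLOCKED, TOY_SHARED, TOY_EXCL };

/* Result of an acquire attempt; the real kernel would panic instead. */
enum toy_result { TOY_OK, TOY_EXCL_TO_SHARE, TOY_SHARE_TO_EXCL, TOY_RECURSED };

/* What the checker believes the (single) current thread holds. */
struct toy_lock {
	enum toy_mode held;
};

/* Record an acquisition, or report which rule it would violate. */
static enum toy_result
toy_acquire(struct toy_lock *l, enum toy_mode want)
{
	if (l->held == TOY_EXCL && want == TOY_SHARED)
		return (TOY_EXCL_TO_SHARE);	/* the "excl->share" case */
	if (l->held == TOY_SHARED && want == TOY_EXCL)
		return (TOY_SHARE_TO_EXCL);
	if (l->held != TOY_UNLOCKED)
		return (TOY_RECURSED);
	l->held = want;
	return (TOY_OK);
}

/* Forget the acquisition; releasing an unheld lock is a bug. */
static void
toy_release(struct toy_lock *l)
{
	assert(l->held != TOY_UNLOCKED);
	l->held = TOY_UNLOCKED;
}
```

In this model the sequence from the diff (exclusive acquire, release,
shared acquire) passes cleanly; the TOY_EXCL_TO_SHARE result only shows
up if the release was skipped or the recorded state is stale, which is
why the panic is so surprising here.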
Sadly I cannot even dump or anything as the keyboard is as dead
as the rest of the laptop. Have to power cycle it hard.
Apart from the debugging I added I have no local changes in sys/net
in that tree. sys/kern seems to have no relevant changes either
(added a bus func, toggle link_elf_leak_locals default, and a printf
got an extra argument to print %d error when modules fail to load).
I'll try a plain main (hopefully tonight) on that machine too but I am
really at a loss here now that it's also happening on x86 and only for me
and always around the same code there...
I'll also try to boot this tree from a USB pen drive or something; not
that my problem comes in from netbooting...
I'll keep you posted...
/bz
--
Bjoern A. Zeeb r15:7
