Date: Fri, 28 Mar 2025 18:06:03 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 283747] kernel panic after telegraf service restart
Message-ID: <bug-283747-227-VkQrPl8wzA@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-283747-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-283747-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283747

--- Comment #47 from Gleb Smirnoff <glebius@FreeBSD.org> ---
Mike, my current hypothesis is that we have a 32-bit overflow in credential reference counting. The overflow happens when we reap a group of processes and their reference counts, summed together, overflow. AFAIU, telegraf will fork+exec arbitrary programs, which in turn can also fork+exec more programs. While telegraf itself seems to wait(2) for its zombies properly, some external program may leak zombies without exiting itself. Then, when telegraf is restarted, this pack of zombies is reaped, and this is where the overflow could be hit.

This is fixed by attachment 258804. I am not sure of my hypothesis, which is why it is not even committed to CURRENT. However, everyone affected by the bug is advised to use this patch, and let's see what happens. We still have some time before 14.3. I will probably start the review process to get it into CURRENT anyway.

With this info, you may have some idea of how to reproduce it. I know you are good at chasing bugs, Mike :) Sorry that it hits you, but I'm glad that you joined the team chasing this bug.

-- 
You are receiving this mail because:
You are the assignee for the bug.
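To make the suspected failure mode concrete, here is a minimal C sketch of a 32-bit sum of reference counts wrapping around. It is a hypothetical model only. It is not FreeBSD's ucred code, and it is not the contents of attachment 258804. The function names sum_refs_u32 and sum_refs_checked are invented for illustration. The sketch assumes that releasing a group of reaped processes adds their per-process counts into a single 32-bit unsigned accumulator. Unsigned arithmetic in C wraps modulo 2^32, so once the combined count passes UINT32_MAX, the total silently becomes a small number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration, not FreeBSD kernel code: sum the
 * per-process reference counts of a group of reaped processes
 * into a 32-bit accumulator. Unsigned addition wraps modulo
 * 2^32 (well defined in C), so an oversized total is silently
 * truncated instead of being reported.
 */
uint32_t
sum_refs_u32(const uint32_t *refs, size_t n)
{
	uint32_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += refs[i];	/* may wrap past UINT32_MAX */
	return (total);
}

/*
 * Checked variant (also hypothetical): accumulate in 64 bits,
 * which cannot overflow for fewer than 2^32 entries, and report
 * failure if the result does not fit back into 32 bits.
 */
bool
sum_refs_checked(const uint32_t *refs, size_t n, uint32_t *out)
{
	uint64_t total = 0;

	assert(out != NULL);
	for (size_t i = 0; i < n; i++)
		total += refs[i];
	if (total > UINT32_MAX)
		return (false);		/* would have overflowed */
	*out = (uint32_t)total;
	return (true);
}
```

For example, counts of UINT32_MAX and 2 sum to 1 in the 32-bit version, while the checked version returns false. Widening the accumulator is just one possible mitigation. The comment does not say which approach the patch takes.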