Date:      Fri, 28 Mar 2025 18:06:03 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 283747] kernel panic after telegraf service restart
Message-ID:  <bug-283747-227-VkQrPl8wzA@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-283747-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-283747-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283747

--- Comment #47 from Gleb Smirnoff <glebius@FreeBSD.org> ---
Mike, my current hypothesis is that we have a 32-bit overflow in credential
reference counting.  The overflow happens when we reap a group of processes
and the reference counts of the group, summed together, overflow.  AFAIU,
telegraf will fork+exec arbitrary programs, which in turn can also fork+exec
more programs.  While telegraf itself seems to do proper wait(2)-ing on its
zombies, some external program may leak zombies and never exit itself.  Then,
when telegraf is restarted, this pack of zombies is reaped, and this is where
the overflow could be hit.

This is fixed by attachment 258804.  I am not sure of my hypothesis; that's why
it is not even committed to CURRENT yet.  However, everyone affected by the bug
is advised to use this patch, and let's see what happens.  We still have some
time before 14.3.  I will probably start the review process to get it into
CURRENT anyway.

With this info, you may have some idea of how to reproduce it.  I know you are
good at chasing bugs, Mike :) Sorry that it hits you, but I'm glad that you
joined the team chasing this bug.

-- 
You are receiving this mail because:
You are the assignee for the bug.