Date:      Sun, 24 Nov 2019 00:01:04 -0600
From:      Kyle Evans <kevans@freebsd.org>
To:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   ptrace(2) debugging
Message-ID:  <CACNAnaHtsAaULLp0icE_=vY4eq2CuJ6Oq4Zx868axaYXArSOeQ@mail.gmail.com>

Hi,

I'm working on implementing `reptyr -T` on FreeBSD because I'm pretty
bad about starting long-running jobs outside of tmux and often desire
to reparent these jobs into tmux. I've gotten to a point where it's
getting stuck in waitpid(2) when attempting to work over the session
leader to ignore SIGHUP. The chain of operations looks roughly like
this:

PT_ATTACH -> waitpid -> kill(SIGCONT) -> PT_TO_SCE -> waitpid ->
PT_TO_SCE -> waitpid

Each waitpid is paired with a PT_LWPINFO. The first waitpid
observes SIGSTOP. The second waitpid observes SIGCONT. I would expect
the third to observe PL_FLAG_SCE in ptrace_lwpinfo->pl_flags, but
instead it hangs: the target process is now sleep-inhibited
and stuck in the "pause" wchan.
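
For reference, the chain above corresponds roughly to the sketch below.
This is not reptyr's actual code: the helper names (stop_signal,
wait_and_inspect, trace_to_syscall_entry) are made up for illustration,
and passing addr=1 and data=0 to PT_TO_SCE (resume where stopped, deliver
no signal) is an assumption about the arguments. The ptrace part is
guarded so that it only builds on FreeBSD.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: the signal that stopped the child, or -1. */
static int
stop_signal(int status)
{
	return (WIFSTOPPED(status) ? WSTOPSIG(status) : -1);
}

#ifdef __FreeBSD__
#include <sys/ptrace.h>

/* Hypothetical helper: one waitpid(2) paired with one PT_LWPINFO. */
static int
wait_and_inspect(pid_t pid, struct ptrace_lwpinfo *pl)
{
	int sig, status;

	if (waitpid(pid, &status, 0) != pid)
		return (-1);
	if ((sig = stop_signal(status)) < 0)
		return (-1);
	if (ptrace(PT_LWPINFO, pid, (caddr_t)pl, sizeof(*pl)) != 0)
		return (-1);
	fprintf(stderr, "stop sig=%d pl_flags=%#x\n", sig,
	    (unsigned)pl->pl_flags);
	return (sig);
}

/*
 * Hypothetical driver for the sequence described above.  Returns 0 if
 * the third stop is a syscall entry.  A real tool would PT_DETACH on
 * the error paths; that is omitted here for brevity.
 */
static int
trace_to_syscall_entry(pid_t pid)
{
	struct ptrace_lwpinfo pl;

	if (ptrace(PT_ATTACH, pid, (caddr_t)0, 0) != 0)
		return (-1);
	if (wait_and_inspect(pid, &pl) != SIGSTOP)	/* first waitpid */
		return (-1);
	if (kill(pid, SIGCONT) != 0)
		return (-1);
	/* Assumption: addr=1 resumes in place, data=0 delivers no signal. */
	if (ptrace(PT_TO_SCE, pid, (caddr_t)1, 0) != 0)
		return (-1);
	if (wait_and_inspect(pid, &pl) != SIGCONT)	/* second waitpid */
		return (-1);
	if (ptrace(PT_TO_SCE, pid, (caddr_t)1, 0) != 0)
		return (-1);
	if (wait_and_inspect(pid, &pl) < 0)	/* third waitpid: hangs here */
		return (-1);
	return ((pl.pl_flags & PL_FLAG_SCE) != 0 ? 0 : -1);
}
#endif /* __FreeBSD__ */
```

In this sketch the hang would show up as the third wait_and_inspect()
call never returning, so the PL_FLAG_SCE check is never reached.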

I've uploaded a truss excerpt at [0] in case it's helpful -- pid=10204
is the process I'm reparenting, initially just attached/detached to
make sure reptyr *can* do this. pid=10187 is the sshd that it's
running under, and pid=10188 is the shell running under that.

Does anyone have good advice on debugging this? It seems like it might
be some kind of kernel bug, as reptyr has already done this same dance
once before when grabbing sshd, and my attempts to distill it down to a
simple test case failed. The FreeBSD part of reptyr needed some love,
though, so that can't be discounted either.

Thanks,

Kyle Evans

[0] https://people.freebsd.org/~kevans/truss.log