Date:      Sun, 24 Nov 2019 09:17:05 -0600
From:      Kyle Evans <kevans@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: ptrace(2) debugging
Message-ID:  <CACNAnaHn5RrQAqGFuWxDDhN0bbFbe5chh2PTN-oh9tURWMYPbA@mail.gmail.com>
In-Reply-To: <20191124113956.GY2707@kib.kiev.ua>
References:  <CACNAnaHtsAaULLp0icE_=vY4eq2CuJ6Oq4Zx868axaYXArSOeQ@mail.gmail.com> <20191124113956.GY2707@kib.kiev.ua>

On Sun, Nov 24, 2019 at 5:40 AM Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Sun, Nov 24, 2019 at 12:01:04AM -0600, Kyle Evans wrote:
> > Hi,
> >
> > I'm working on implementing `reptyr -T` on FreeBSD because I'm pretty
> > bad about starting long-running jobs outside of tmux and often desire
> > to reparent these jobs into tmux. I've gotten to a point where it's
> > getting stuck in waitpid(2) when attempting to work over the session
> > leader to ignore SIGHUP. The chain of operations looks roughly like
> > this:
> >
> > PT_ATTACH -> waitpid -> kill(SIGCONT) -> PT_TO_SCE -> waitpid ->
> > PT_TO_SCE -> waitpid
> >
> > Each of the waitpids is paired with a PT_LWPINFO. The first waitpid
> > observes SIGSTOP. The second waitpid observes SIGCONT. I would expect
> > the third to observe PL_FLAG_SCE on ptrace_lwpinfo->pl_flags, but
> > instead it actually hangs as the target process is now sleep-inhibited
> > and stuck in "pause" wchan.
> >
> > I've uploaded a truss excerpt at [0] in case it's helpful -- pid=10204
> > is the process I'm reparenting, initially just attached/detached to
> > make sure reptyr *can* do this. pid=10187 is the sshd that it's
> > running under, and pid=10188 is the shell running under that.
> >
> > Anyone have good advice on debugging this? It seems like it might be
> > some kind of kernel bug, as it's already done this same dance once
> > before when grabbing sshd and my attempts to distill it down to a
> > simple test case failed. The FreeBSD part of reptyr needed some love,
> > though, so that can't be discounted either.
> >
> > Thanks,
> >
> > Kyle Evans
> >
> > [0] https://people.freebsd.org/~kevans/truss.log
> How much work would it be to provide a self-contained standalone test?

I'm still struggling to write a self-contained example.
Unfortunately, a basic test that attaches and traces each process to
syscall entry isn't sufficient to reproduce it. I'm slowly removing
surface area from reptyr to try to narrow it down. Its operations
between attaching to sshd and the misbehavior are quite extensive: it
mmaps a page into the target, opens a socket established by reptyr,
and passes an fd back over it.
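
For reference, the basic sequence from the quoted message (PT_ATTACH ->
waitpid -> kill(SIGCONT) -> PT_TO_SCE -> waitpid -> PT_TO_SCE -> waitpid)
can be sketched in C roughly as below. This is only an illustration. It
is not the reptyr code and not a reproducer, since a test this simple
does not trigger the hang. The names sce_sketch, wait_stop, and PTREQ
are hypothetical. The target here is a forked child looping in
nanosleep(2) rather than a session leader. The non-FreeBSD branch, which
substitutes PTRACE_SYSCALL for PT_TO_SCE, is an assumption added so the
sketch builds elsewhere.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#ifdef __FreeBSD__
#define PTREQ(req, pid, addr) \
	ptrace((req), (pid), (caddr_t)(intptr_t)(addr), 0)
#else
/* Assumption: off FreeBSD, PTRACE_SYSCALL stands in for PT_TO_SCE. */
#define PT_TO_SCE PTRACE_SYSCALL
#define PTREQ(req, pid, addr) \
	ptrace((req), (pid), (void *)(intptr_t)(addr), (void *)0)
#endif

/* Wait for the next stop of pid; return the stop signal, or -1. */
static int
wait_stop(pid_t pid)
{
	int status;

	if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
		return (-1);
	return (WSTOPSIG(status));
}

/* Hypothetical helper: 0 if all three stops look as described. */
int
sce_sketch(void)
{
	struct timespec ts = { 0, 10 * 1000 * 1000 };	/* 10 ms */
	int s1 = -1, s2 = -1, s3 = -1, sce = 1;
	pid_t pid = fork();

	if (pid == -1)
		return (-1);
	if (pid == 0) {
		/* Target: keep entering syscalls, give up after ~30 s. */
		for (int i = 0; i < 3000; i++)
			nanosleep(&ts, NULL);
		_exit(0);
	}
	if (PTREQ(PT_ATTACH, pid, 0) != -1) {
		s1 = wait_stop(pid);		/* expect SIGSTOP */
		kill(pid, SIGCONT);
		PTREQ(PT_TO_SCE, pid, 1);	/* resume, no signal */
		s2 = wait_stop(pid);		/* expect SIGCONT */
		PTREQ(PT_TO_SCE, pid, 1);
		s3 = wait_stop(pid);		/* expect SIGTRAP at entry */
#ifdef __FreeBSD__
		struct ptrace_lwpinfo pl;

		/* The third stop should carry PL_FLAG_SCE in pl_flags. */
		sce = ptrace(PT_LWPINFO, pid, (caddr_t)&pl, sizeof(pl)) == 0 &&
		    (pl.pl_flags & PL_FLAG_SCE) != 0;
#endif
		PTREQ(PT_DETACH, pid, 1);
	}
	/* Clean up the target whether or not the trace worked. */
	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	printf("stop signals: %d %d %d\n", s1, s2, s3);
	return (s1 == SIGSTOP && s2 == SIGCONT && s3 == SIGTRAP && sce ?
	    0 : -1);
}
```

Calling sce_sketch() from a small main() prints the three stop signals
and returns 0 when they are SIGSTOP, SIGCONT, and SIGTRAP in that order
(with PL_FLAG_SCE set on FreeBSD). In the failing case described above,
the third waitpid would block instead of returning.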
