Date: Sun, 24 Nov 2019 21:26:40 -0600
From: Kyle Evans <kevans@freebsd.org>
Cc: Konstantin Belousov <kostikbel@gmail.com>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: ptrace(2) debugging
Message-ID: <CACNAnaGTpY2_npCvr2TC=Fif11%2BwcUVS8488jv%2Bon3AF=qZ=dQ@mail.gmail.com>
In-Reply-To: <CACNAnaFAN=ZTU9ZQ5aKXW7bAeckJ%2BMvFgUMgLToFKhEn4LsG8A@mail.gmail.com>
References: <CACNAnaHtsAaULLp0icE_=vY4eq2CuJ6Oq4Zx868axaYXArSOeQ@mail.gmail.com> <20191124113956.GY2707@kib.kiev.ua> <CACNAnaHn5RrQAqGFuWxDDhN0bbFbe5chh2PTN-oh9tURWMYPbA@mail.gmail.com> <CACNAnaFAN=ZTU9ZQ5aKXW7bAeckJ%2BMvFgUMgLToFKhEn4LsG8A@mail.gmail.com>
On Sun, Nov 24, 2019 at 8:25 PM Kyle Evans <kevans@freebsd.org> wrote:
>
> On Sun, Nov 24, 2019 at 9:17 AM Kyle Evans <kevans@freebsd.org> wrote:
> >
> > On Sun, Nov 24, 2019 at 5:40 AM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > >
> > > On Sun, Nov 24, 2019 at 12:01:04AM -0600, Kyle Evans wrote:
> > > > Hi,
> > > >
> > > > I'm working on implementing `reptyr -T` on FreeBSD because I'm pretty
> > > > bad about starting long-running jobs outside of tmux and often desire
> > > > to reparent these jobs into tmux. I've gotten to a point where it's
> > > > getting stuck in waitpid(2) when attempting to work over the session
> > > > leader to ignore SIGHUP. The chain of operations looks roughly like
> > > > this:
> > > >
> > > > PT_ATTACH -> waitpid -> kill(SIGCONT) -> PT_TO_SCE -> waitpid ->
> > > > PT_TO_SCE -> waitpid
> > > >
> > > > Each of the waitpids is paired with a PT_LWPINFO. The first waitpid
> > > > observes SIGSTOP. The second waitpid observes SIGCONT. I would expect
> > > > the third to observe PL_FLAG_SCE on ptrace_lwpinfo->pl_flags, but
> > > > instead it actually hangs as the target process is now sleep-inhibited
> > > > and stuck in "pause" wchan.
> > > >
> > > > I've uploaded a truss excerpt at [0] in case it's helpful -- pid=10204
> > > > is the process I'm reparenting, initially just attached/detached to
> > > > make sure reptyr *can* do this. pid=10187 is the sshd that it's
> > > > running under, and pid=10188 is the shell running under that.
> > > >
> > > > Does anyone have good advice on debugging this? It seems like it might
> > > > be some kind of kernel bug, as it has already done this same dance once
> > > > before when grabbing sshd, and my attempts to distill it down to a
> > > > simple test case failed. The FreeBSD part of reptyr needed some love,
> > > > though, so that can't be discounted either.
> > > >
> > > > Thanks,
> > > >
> > > > Kyle Evans
> > > >
> > > > [0] https://people.freebsd.org/~kevans/truss.log
> > > How much work would it be to provide a self-contained standalone test?
> >
> > I'm still struggling to write a self-contained example...
> > unfortunately a basic attach and trace them all to syscall entry isn't
> > sufficient. I'm slowly removing surface area from reptyr to try and
> > narrow it down; its operations between attaching to sshd and the
> > misbehavior are quite extensive, as it mmaps a page into the target,
> > opens a socket established by reptyr and passes an fd back over it.
>
> I managed to narrow it down, kind of. The problem is specifically with
> trying to trace zsh as a session leader. The easiest reproducer is to
> change your shell to zsh and run this:
> https://people.freebsd.org/~kevans/ptrace_test.c -> you'll hang and
> have to ^C that sucker. My experiments showed that running this on a
> zsh spawned any other way is fine, and changing shell to /bin/sh is also
> fine.
>
Follow-up, part three: zsh is in sigsuspend() while a child is executing,
and this is the cause. A more effective reproducer:
https://people.freebsd.org/~kevans/ptrace_test2.c -> the behavior
makes a little more sense to me now, but it still seems less than ideal.
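The sigsuspend() finding can be made concrete with a small sketch. An interactive shell typically blocks SIGCHLD and then sleeps in sigsuspend(2) while a foreground job runs; on FreeBSD that sleep appears to be the "pause" wchan mentioned above. POSIX specifies that sigsuspend() returns only when a signal runs a handler or terminates the process, so a SIGCONT taking its default action would not wake it, and PT_TO_SCE would then have no new syscall entry to report until the job exits. That is one plausible reading of the hang, not a confirmed diagnosis. The portable C below models only the shell side and assumes nothing about zsh's actual code; run_job_in_sigsuspend() and on_sigchld() are hypothetical names, not zsh or FreeBSD APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>     /* used by the usage example at the bottom */
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Written only from the handler; sig_atomic_t keeps that async-signal-safe. */
static volatile sig_atomic_t child_exited = 0;

static void on_sigchld(int signo)
{
	(void)signo;
	child_exited = 1;
}

/*
 * Hypothetical helper: run a "foreground job" that exits with exit_code
 * after ~100 ms, and wait for it the way an interactive shell might.
 * Returns the job's exit status, or -1 on error.
 */
int run_job_in_sigsuspend(int exit_code)
{
	struct sigaction sa, old_sa;
	sigset_t block, old_mask, wait_mask;
	pid_t pid;
	int status;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigchld;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
		return -1;

	/* Block SIGCHLD before fork() so the child's exit cannot be missed. */
	sigemptyset(&block);
	sigaddset(&block, SIGCHLD);
	if (sigprocmask(SIG_BLOCK, &block, &old_mask) == -1) {
		sigaction(SIGCHLD, &old_sa, NULL);
		return -1;
	}

	child_exited = 0;
	pid = fork();
	if (pid == 0) {
		struct timespec ts = { 0, 100000000L };	/* the "job": ~100 ms */
		nanosleep(&ts, NULL);
		_exit(exit_code);
	}

	if (pid > 0) {
		/*
		 * Sleep with SIGCHLD unblocked. sigsuspend() returns only after
		 * a caught signal's handler has run; a signal that merely takes
		 * its default action (SIGCONT, say) leaves the caller asleep.
		 */
		wait_mask = old_mask;
		sigdelset(&wait_mask, SIGCHLD);
		while (!child_exited)
			sigsuspend(&wait_mask);
	}

	/* Restore the caller's signal mask and SIGCHLD disposition. */
	sigprocmask(SIG_SETMASK, &old_mask, NULL);
	sigaction(SIGCHLD, &old_sa, NULL);
	if (pid == -1 || waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Usage, e.g. from main(): assert(run_job_in_sigsuspend(7) == 7); */
```

Calling run_job_in_sigsuspend(7) should return 7 after roughly 100 ms. Blocking SIGCHLD before fork() and only unblocking it inside sigsuspend() is the usual way to avoid losing a SIGCHLD that arrives before the wait begins.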