From: Kyle Evans
Date: Sun, 24 Nov 2019 20:25:56 -0600
Subject: Re: ptrace(2) debugging
To: Konstantin Belousov
Cc: FreeBSD Hackers
References: <20191124113956.GY2707@kib.kiev.ua>

On Sun, Nov 24, 2019 at 9:17 AM Kyle Evans wrote:
>
> On Sun, Nov 24, 2019 at 5:40 AM Konstantin Belousov wrote:
> >
> > On Sun, Nov 24, 2019 at 12:01:04AM -0600, Kyle Evans wrote:
> > > Hi,
> > >
> > > I'm working on implementing `reptyr -T` on FreeBSD because I'm pretty
> > > bad about starting long-running jobs outside of tmux and often desire
> > > to reparent these jobs into tmux. I've gotten to a point where it's
> > > getting stuck in waitpid(2) when attempting to work over the session
> > > leader to ignore SIGHUP. The chain of operations looks roughly like
> > > this:
> > >
> > > PT_ATTACH -> waitpid -> kill(SIGCONT) -> PT_TO_SCE -> waitpid ->
> > > PT_TO_SCE -> waitpid
> > >
> > > Each of the waitpids are paired with a PT_LWPINFO. The first waitpid
> > > observes SIGSTOP. The second waitpid observes SIGCONT. I would expect
> > > the third to observe PL_FLAG_SCE on ptrace_lwpinfo->pl_flags, but
> > > instead it actually hangs as the target process is now sleep-inhibited
> > > and stuck in "pause" wchan.
> > >
> > > I've uploaded a truss excerpt at [0] in case it's helpful -- pid=10204
> > > is the process I'm reparenting, initially just attached/detached to
> > > make sure reptyr *can* do this. pid=10187 is the sshd that it's
> > > running under, and pid=10188 is the shell running under that.
> > >
> > > Anyone have good advice on debugging this? It seems like it might be
> > > some kind of kernel bug, as it's already done this same dance once
> > > before when grabbing sshd and my attempts to distill it down to a
> > > simple test case failed. The FreeBSD part of reptyr needed some love,
> > > though, so that can't be discounted either.
> > >
> > > Thanks,
> > >
> > > Kyle Evans
> > >
> > > [0] https://people.freebsd.org/~kevans/truss.log
> > How much work would be to provide a self-contained standalone test ?
>
> I'm still struggling to write a self-contained example...
> unfortunately a basic attach and trace them all to syscall entry isn't
> sufficient. I'm slowly removing surface area from reptyr to try and
> narrow it down- its operations between attaching to sshd and the
> misbehavior are quite extensive, as it mmaps a page into the target,
> opens a socket established by reptyr and passes an fd back over it.

I managed to narrow it down, kind of. The problem is specifically with
trying to trace zsh as a session leader. Easiest reproducer is to
change shell to zsh and run this:
https://people.freebsd.org/~kevans/ptrace_test.c -> you'll hang and
have to ^C that sucker. My experiments showed that running this on zsh
spawned any other way is fine, and changing shell to /bin/sh is also
fine.

Thanks,

Kyle Evans
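
For anyone trying to see which step of a wait sequence like the one
quoted above never arrives, here is a rough sketch of a debugging aid:
decode each waitpid(2) status into text, and poll with WNOHANG so a
missing event shows up as a timeout rather than a hang. It is not the
linked reproducer and it does not use ptrace at all; a plain
job-control SIGSTOP stands in for the traced stop so it runs on any
POSIX system. The names describe_wait_status, wait_with_timeout and
demo are made up for this illustration.

```python
import os
import signal
import time


def _signame(signo):
    """Return a symbolic signal name, falling back to the raw number."""
    try:
        return signal.Signals(signo).name
    except ValueError:
        return "signal %d" % signo


def describe_wait_status(status):
    """Render a waitpid(2) status word as a short readable string."""
    if os.WIFSTOPPED(status):
        return "stopped by " + _signame(os.WSTOPSIG(status))
    if os.WIFCONTINUED(status):
        return "continued"
    if os.WIFSIGNALED(status):
        return "killed by " + _signame(os.WTERMSIG(status))
    if os.WIFEXITED(status):
        return "exited with %d" % os.WEXITSTATUS(status)
    return "unknown status 0x%x" % status


def wait_with_timeout(pid, timeout=5.0, options=os.WUNTRACED):
    """Poll waitpid(2) with WNOHANG; return the status, or None on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        wpid, status = os.waitpid(pid, options | os.WNOHANG)
        if wpid == pid:
            return status
        time.sleep(0.01)
    return None


def demo():
    """Hypothetical stand-in: the child stops itself, the parent logs events."""
    pid = os.fork()
    if pid == 0:
        try:
            # Child: stop ourselves, then idle until the parent kills us.
            os.kill(os.getpid(), signal.SIGSTOP)
            while True:
                signal.pause()
        finally:
            os._exit(1)

    events = []

    def record(status):
        events.append("timeout" if status is None
                      else describe_wait_status(status))

    record(wait_with_timeout(pid))               # the stop is reported once
    record(wait_with_timeout(pid, timeout=0.2))  # nothing new: times out
    os.kill(pid, signal.SIGKILL)
    record(wait_with_timeout(pid))               # reaps the killed child
    return events


if __name__ == "__main__":
    for event in demo():
        print(event)
```

In a real tracer the same two helpers would wrap each waitpid in the
chain, so the log shows the last event that did arrive before the
timeout. The timeout values here are arbitrary.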