From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct 29 06:50:51 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5301F10656A3
	for <freebsd-hackers@freebsd.org>; Thu, 29 Oct 2009 06:50:50 +0000 (UTC)
	(envelope-from rea-fbsd@codelabs.ru)
Received: from 0.mx.codelabs.ru (0.mx.codelabs.ru [144.206.177.45])
	by mx1.freebsd.org (Postfix) with ESMTP id D57168FC2A
	for <freebsd-hackers@freebsd.org>; Thu, 29 Oct 2009 06:50:49 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=codelabs.ru; s=two; h=Date:From:To:Cc:Subject:Message-ID:
	Reply-To:References:MIME-Version:Content-Type:In-Reply-To:
	Sender; bh=rzdjl4zznLIKQA4G0GhnUq/jbsmY+2CcODK7u7uLVe8=; b=S22U5
	oAu8m3eaqo7Ns5V9/a/VmwPstJJnFyVGlv7HfYJUdq05kGpCAvXp9NxAd7BG2A8P
	Ho6TnKX+PUgqauukJMAr9iFVArN2ZpqUArQ0XCzTqaH8RGJeg2biBimXJ8uBkMTM
	l4MhzQTCskLN9Har4E89kbxhr2vegCXN06vE3FUEuPjnjDSFxhTHq1DBAsroEVqQ
	3OQnSDwoDByGQKoO2R6ziVYsPWqHPGaNq7YZfR905Rbfml6saQhiq6fdhXNsrpJW
	tkQGHLSN3IidqJcw1EqTyRFy9CToT61h35Wbt1paOoNwKnYHiS+8V1+UW4BOrsVj
	aiOwmxhXsJMe9M/Cw==
Received: from void.codelabs.ru (void.codelabs.ru [144.206.177.25])
	by 0.mx.codelabs.ru with esmtpsa (TLSv1:AES256-SHA:256)
	id 1N3OqL-0003PZ-Rt; Thu, 29 Oct 2009 09:50:46 +0300
Date: Thu, 29 Oct 2009 09:50:43 +0300
From: Eygene Ryabinkin <rea-fbsd@codelabs.ru>
To: "Dorr H. Clark" <dclark@engr.scu.edu>
Message-ID: <jlHTT4ItuuL5pOoOT8LOA8wKXP0@m7CpruDFVVXtHTpJF1FPSUwA+UQ>
References: <Pine.GSO.4.21.0810072312220.4889-100000@nova41.dc.engr.scu.edu>
	<Pine.GSO.4.21.0910271711580.17024-100000@nova46.dc.engr.scu.edu>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="Dxnq1zWXvFF0Q93v"
Content-Disposition: inline
In-Reply-To: <Pine.GSO.4.21.0910271711580.17024-100000@nova46.dc.engr.scu.edu>
Sender: rea-fbsd@codelabs.ru
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-hackers@freebsd.org, freebsd-bugs@freebsd.org,
	freebsd-stable@freebsd.org
Subject: Re: ptrace problem 6.x/7.x - can someone explain this?
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: rea-fbsd@codelabs.ru
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Oct 2009 06:50:56 -0000


--Dxnq1zWXvFF0Q93v
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Dorr, good day.

Tue, Oct 27, 2009 at 05:32:34PM -0700, Dorr H. Clark wrote:
> We believe ptrace has a problem in 6.3; we have not tried other
> releases.  The same code, however, exists in 7.1.

And in HEAD too.

> The bug was first encountered in gdb...
> 
> (gdb) det
> Detaching from program: /usr/local/bin/emacs, process 66217
> (gdb) att 66224
> Attaching to program: /usr/local/bin/emacs, process 66224
> Error accessing memory address 0x281ba5a4: Device busy.
> (gdb) det
> Detaching from program: /usr/local/bin/emacs, process 66224
> ptrace: Device busy.
> (gdb) quit	<--- target process 66224 dies here
> 
> To isolate this problem, a simple-minded test program was
> written that just attached and detached. This test program found that
> even the very first detach fails with EBUSY (see test source below):
> 
> $ ./test1 -p 66217 -c 1 -d 10
> pid 66217 count 1 delay 10
> Start of pass 0
> Calling PT_ATTACH pid 66217 addr 0x0 sig 0
> Calling PT_DETACH pid 66217 addr 0xffffffff sig 0
> Call 0 to PT_DETACH returned -1, errno 16
> 
> Once again, the target process died when the ptracing test program
> exited, as would be expected if a detach had failed.
> 
> The failure return was coming from the following test in kern_ptrace()
> in sys_process.c
> 
>                 /* not currently stopped */ 
>                 if ((p->p_flag & (P_STOPPED_SIG | P_STOPPED_TRACE)) == 0 || 
>                     p->p_suspcount != p->p_numthreads  || 
>                     (p->p_flag & P_WAITED) == 0) { 
>                         error = EBUSY; 
>                         goto fail; 
>                 }

Yes, the ptraced process should have been waited for, even after
the PT_ATTACH call.  This is somewhat documented in ptrace(2),
-----
-----
but I agree that the wording is a bit sloppy.  I'll try to produce a
slightly modified explanation for the manual page and will post the patch
here and as a PR.

I modified your example to display the result of each wait() call
that is made after a ptrace() invocation.  Here we go:
-----
$ ./test -p 45901
pid 45901 count 2 delay 5
Start of pass 0
Calling PT_ATTACH pid 45901 addr 0x0 sig 0
Attached
wait() yield 0x117f: stopped by signal 17; <-- after PT_ATTACH
wait() yield 0x57f: stopped by signal 5; <-- after PT_STEP
Calling PT_DETACH pid 45901 addr 0xffffffffffffffff sig 0
Detached.
-----

As you can see, the process is stopped right after PT_ATTACH by
signal 17, SIGSTOP.  PT_STEP is then followed by the delivery of SIGTRAP.
Both of these stops should be collected by wait() in the tracing process.
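
The status words printed above (0x117f and 0x57f) can be decoded with
the standard <sys/wait.h> macros.  A minimal sketch, assuming the
traditional status encoding; the helper name stop_signal() is made up
for illustration and is not part of test.c:

```c
#include <sys/wait.h>

/*
 * Hypothetical helper (not from test.c): return the number of the
 * signal that stopped the process, or -1 if the status word does
 * not describe a stopped process.
 */
int
stop_signal(int status)
{
	if (WIFSTOPPED(status))
		return (WSTOPSIG(status));
	return (-1);
}
```

With the values from the run above, stop_signal(0x117f) gives 17 and
stop_signal(0x57f) gives 5, which are SIGSTOP and SIGTRAP on FreeBSD.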

And PT_DETACH works, apart from one thing: on my 8.0 system PT_DETACH
leads to a segfault of the traced program.  I haven't yet tried it on
other versions, so maybe there is a bug in the code of test.c, or a bug
in the ptrace() implementation -- I can't say for sure.  If anyone knows
why the program segfaults, please speak up.  The modified source of
test.c is attached.

> This is applied to all operations except PT_TRACE_ME, PT_ATTACH, and
> some instances of PT_CLEAR_STEP.
> 
> P_WAITED is generally not true. In particular, it's not set
> automatically when a process is PT_ATTACHed.   It is cleared by
> PT_DETACH and again when ptrace sends a signal (PT_CONTINUE,
> PT_DETACH.)  _But_ it's set in only two places, and they aren't in
> ptrace code.
> 
> 2 sys/kern/kern_exit.c      kern_wait         773 p->p_flag |= P_WAITED;
> 3 compat/svr4/svr4_misc.c   svr4_sys_waitsys 1351 q->p_flag |= P_WAITED;
> 
> The relevant one is the first one, primarily. Here's the code:
> 
>                 mtx_lock_spin(&sched_lock); 
>                 if ((p->p_flag & P_STOPPED_SIG) && 
>                     (p->p_suspcount == p->p_numthreads) && 
>                     (p->p_flag & P_WAITED) == 0 && 
>                     (p->p_flag & P_TRACED || options & WUNTRACED)) { 
>                         mtx_unlock_spin(&sched_lock); 
>                         p->p_flag |= P_WAITED; 
>                         sx_xunlock(&proctree_lock); 
>                         td->td_retval[0] = p->p_pid; 
>                         if (status) 
>                                 *status = W_STOPCODE(p->p_xstat); 
>                         PROC_UNLOCK(p); 
>                         return (0); 
>                 } 
>                 mtx_unlock_spin(&sched_lock); 
> 
> So it's only set on processes which are already traced. But it's not
> set until someone calls wait4() on them - or the equivalent sysV
> compatibility routine.
> 
> Gdb doesn't always wait4() for processes immediately upon tracing
> them, and the ptrace man page does not imply this is needed. 

Hmm, there is at least one thread on a similar matter,
  http://sourceware.org/ml/gdb/2008-12/msg00041.html
and people there say that the wait() should still be present.

> Moreover, it's not clear why it should matter. The process
> needs to be stopped in order for it to make sense to do most
> of the things ptrace does. But - why should it need to be waited for?

To see if it was really stopped, I presume.

> And what kind of sense does this make to someone writing a debugging
> tool, where the natural logic seems to be:
> - attach to process

- wait for the attach to complete by calling wait().

> - look at some stuff
> - stick in some kind of breakpoint or similar and start it going again
>   (or 'step' it)
> - wait for it to stop
> - look at and modify stuff
> - detach, or set it moving again
> 
> By way of experiment, the test for P_WAITED was removed. Gdb no longer had
> problems, and no new issues with gdb were encountered (although this
> was just interactive, no "gdb coverage test" was attempted).

By the way, I can't reproduce the gdb failures with the 8.0 sources.  I
will try 7.x, but I don't think I have 6.x handy.
-- 
Eygene
 _                ___       _.--.   #
 \`.|\..----...-'`   `-._.-'_.-'`   #  Remember that it is hard
 /  ' `         ,       __.--'      #  to read the on-line manual
 )/' _/     \   `-_,   /            #  while single-stepping the kernel.
 `-'" `"\_  ,_.-;_.-\_ ',  fsc/as   #
     _.-'_./   {_.'   ; /           #    -- FreeBSD Developers handbook
    {_.-``-'         {_/            #

--Dxnq1zWXvFF0Q93v--