Date: Thu, 29 Oct 2009 09:50:43 +0300
From: Eygene Ryabinkin
To: "Dorr H. Clark"
Cc: freebsd-hackers@freebsd.org, freebsd-bugs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: ptrace problem 6.x/7.x - can someone explain this?

Dorr, good day.

Tue, Oct 27, 2009 at 05:32:34PM -0700, Dorr H. Clark wrote:
> We believe ptrace has a problem in 6.3; we have not tried other
> releases.  The same code, however, exists in 7.1.

And in HEAD too.

> The bug was first encountered in gdb...
>
> (gdb) det
> Detaching from program: /usr/local/bin/emacs, process 66217
> (gdb) att 66224
> Attaching to program: /usr/local/bin/emacs, process 66224
> Error accessing memory address 0x281ba5a4: Device busy.
> (gdb) det
> Detaching from program: /usr/local/bin/emacs, process 66224
> ptrace: Device busy.
> (gdb) quit                  <--- target process 66224 dies here
>
> To isolate this problem, a simple-minded test program was written
> that just attached and detached.  This test program found that
> even the very first detach fails with EBUSY (see test source below):
>
> $ ./test1 -p 66217 -c 1 -d 10
> pid 66217 count 1 delay 10
> Start of pass 0
> Calling PT_ATTACH pid 66217 addr 0x0 sig 0
> Calling PT_DETACH pid 66217 addr 0xffffffff sig 0
> Call 0 to PT_DETACH returned -1, errno 16
>
> Once again, the target process died when the ptracing test program
> exited, as would be expected if a detach had failed.
>
> The failure return was coming from the following test in kern_ptrace()
> in sys_process.c
>
>         /* not currently stopped */
>         if ((p->p_flag & (P_STOPPED_SIG | P_STOPPED_TRACE)) == 0 ||
>             p->p_suspcount != p->p_numthreads ||
>             (p->p_flag & P_WAITED) == 0) {
>                 error = EBUSY;
>                 goto fail;
>         }

Yes, the ptraced process should have been waited for, even after
the PT_ATTACH call.
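To make the condition concrete, here is a minimal user-space sketch of
that check (not kernel code).  The names struct proc_model and
ptrace_precheck() are made up for illustration, and the flag bit values
are placeholders rather than the real ones from sys/proc.h; only the
shape of the test is taken from the snippet quoted above.

```c
#include <errno.h>

/* Placeholder flag bits for illustration; the real values are in
 * sys/proc.h and may differ. */
#define P_STOPPED_SIG   0x1
#define P_STOPPED_TRACE 0x2
#define P_WAITED        0x4

/* Hypothetical stand-in holding only the fields the check reads. */
struct proc_model {
    int p_flag;
    int p_suspcount;
    int p_numthreads;
};

/* Mirrors the quoted test: returns 0 when the request may proceed,
 * EBUSY when the process is not stopped, not fully suspended, or has
 * not been waited for yet. */
int ptrace_precheck(const struct proc_model *p)
{
    if ((p->p_flag & (P_STOPPED_SIG | P_STOPPED_TRACE)) == 0 ||
        p->p_suspcount != p->p_numthreads ||
        (p->p_flag & P_WAITED) == 0)
        return EBUSY;
    return 0;
}
```

With P_STOPPED_SIG set and p_suspcount equal to p_numthreads, but
P_WAITED still clear (roughly the state after PT_ATTACH once the process
has stopped but before any wait()), the function returns EBUSY.  Setting
P_WAITED makes it return 0.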
This is somewhat documented in ptrace(2),
-----
-----
but I agree that the wording is a bit sloppy.  I'll try to produce a
slightly modified explanation for the manual page and will post the
patch here and as a PR.

I modified your example to display the result of each wait() call
that is made after a ptrace() invocation.  Here we go:
-----
$ ./test -p 45901
pid 45901 count 2 delay 5
Start of pass 0
Calling PT_ATTACH pid 45901 addr 0x0 sig 0
Attached
wait() yield 0x117f: stopped by signal 17;    <-- after PT_ATTACH
wait() yield 0x57f: stopped by signal 5;      <-- after PT_STEP
Calling PT_DETACH pid 45901 addr 0xffffffffffffffff sig 0
Detached.
-----
As you can see, the process is stopped just after PT_ATTACH with
signal 17, SIGSTOP.  PT_STEP follows with the delivery of SIGTRAP.
Both of these signals should be processed by the parent's wait().
And PT_DETACH works, apart from one thing: on my 8.0, PT_DETACH leads
to a segfault of the traced program.  I haven't yet tried it on other
versions, so maybe there is some bug in the code of test.c, or some
bug in the ptrace() implementation -- I can't say for sure.  If anyone
knows why the program segfaults, please speak up.

The modified source of test.c is attached.

> This is applied to all operations except PT_TRACE_ME, PT_ATTACH, and
> some instances of PT_CLEAR_STEP.
>
> P_WAITED is generally not true.  In particular, it's not set
> automatically when a process is PT_ATTACHed.  It is cleared by
> PT_DETACH and again when ptrace sends a signal (PT_CONTINUE,
> PT_DETACH.)  _But_ it's set in only two places, and they aren't in
> ptrace code.
>
> 2 sys/kern/kern_exit.c     kern_wait         773 p->p_flag |= P_WAITED;
> 3 compat/svr4/svr4_misc.c  svr4_sys_waitsys 1351 q->p_flag |= P_WAITED;
>
> The relevant one is the first one, primarily.
> Here's the code:
>
>         mtx_lock_spin(&sched_lock);
>         if ((p->p_flag & P_STOPPED_SIG) &&
>             (p->p_suspcount == p->p_numthreads) &&
>             (p->p_flag & P_WAITED) == 0 &&
>             (p->p_flag & P_TRACED || options & WUNTRACED)) {
>                 mtx_unlock_spin(&sched_lock);
>                 p->p_flag |= P_WAITED;
>                 sx_xunlock(&proctree_lock);
>                 td->td_retval[0] = p->p_pid;
>                 if (status)
>                         *status = W_STOPCODE(p->p_xstat);
>                 PROC_UNLOCK(p);
>                 return (0);
>         }
>         mtx_unlock_spin(&sched_lock);
>
> So it's only set on processes which are already traced.  But it's not
> set until someone calls wait4() on them - or the equivalent sysV
> compatibility routine.
>
> Gdb doesn't always wait4() for processes immediately upon tracing
> them, and the ptrace man page does not imply this is needed.

Hmm, there is at least one thread on a similar matter,
  http://sourceware.org/ml/gdb/2008-12/msg00041.html
and people there are saying that wait() should still be present.

> Moreover, it's not clear why it should matter.  The process
> needs to be stopped in order for it to make sense to do most
> of the things ptrace does.  But - why should it need to be waited for?

To see if it was really stopped, I presume.

> And what kind of sense does this make to someone writing a debugging
> tool, where the natural logic seems to be:
>       - attach to process

- wait for the process' attachment by doing wait().

>       - look at some stuff
>       - stick in some kind of breakpoint or similar and start it going again
>               (or 'step' it)
>       - wait for it to stop
>       - look at and modify stuff
>       - detach, or set it moving again
>
> By way of experiment, the test for P_WAITED was removed.  Gdb no longer had
> problems, and no new issues with gdb were encountered (although this
> was just interactive, no "gdb coverage test" was attempted).

By the way, I can't reproduce the gdb faults with the 8.0 sources.
I'll try 7.x, but I don't think I have a 6.x system handy.
-- 
Eygene
# Remember that it is hard to read the on-line manual
# while single-stepping the kernel.
#   -- FreeBSD Developers handbook
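
P.S.  For reference, a small sketch of how the status words in the test
output above (0x117f and 0x57f) can be decoded with the standard macros
from <sys/wait.h>.  The helper names stop_signal() and describe() are
mine for illustration and are not taken from test.c.

```c
#include <stdio.h>
#include <sys/wait.h>

/* Return the stop signal encoded in a wait() status, or -1 if the
 * status does not describe a stopped process. */
int stop_signal(int status)
{
    if (WIFSTOPPED(status))
        return WSTOPSIG(status);
    return -1;
}

/* Print one line per status, in the spirit of the test output. */
void describe(int status)
{
    int sig = stop_signal(status);

    if (sig >= 0)
        printf("wait() yield 0x%x: stopped by signal %d\n",
            (unsigned)status, sig);
    else
        printf("wait() yield 0x%x: not a stop status\n",
            (unsigned)status);
}
```

stop_signal(0x117f) gives 17 and stop_signal(0x57f) gives 5, which
matches the two wait() lines in the output: the low byte 0x7f marks a
stopped process and the next byte carries the signal number.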