From owner-freebsd-current@FreeBSD.ORG Fri Jan 27 18:14:24 2012 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 10C81106566C for ; Fri, 27 Jan 2012 18:14:24 +0000 (UTC) (envelope-from dmitrym@juniper.net) Received: from exprod7og123.obsmtp.com (exprod7og123.obsmtp.com [64.18.2.24]) by mx1.freebsd.org (Postfix) with ESMTP id 79BD18FC17 for ; Fri, 27 Jan 2012 18:14:23 +0000 (UTC) Received: from P-EMHUB01-HQ.jnpr.net ([66.129.224.36]) (using TLSv1) by exprod7ob123.postini.com ([64.18.6.12]) with SMTP ID DSNKTyLpfW/I/xB/Jp5++msSFEykxWpLRH4l@postini.com; Fri, 27 Jan 2012 10:14:23 PST Received: from magenta.juniper.net (172.17.27.123) by P-EMHUB01-HQ.jnpr.net (172.24.192.33) with Microsoft SMTP Server (TLS) id 8.3.213.0; Fri, 27 Jan 2012 10:12:35 -0800 Received: from [172.24.26.191] (dmitrym-lnx.jnpr.net [172.24.26.191]) by magenta.juniper.net (8.11.3/8.11.3) with ESMTP id q0RICJ141509; Fri, 27 Jan 2012 10:12:33 -0800 (PST) (envelope-from dmitrym@juniper.net) Message-ID: <4F22E8FD.6010201@juniper.net> Date: Fri, 27 Jan 2012 10:12:13 -0800 From: Dmitry Mikulin User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20111124 Thunderbird/8.0 MIME-Version: 1.0 To: Kostik Belousov References: <749E238A-A85F-4264-9DEB-BCE1BBD21C9D@juniper.net> <20120125074824.GD2726@deviant.kiev.zoral.com.ua> <4F2094B4.70707@juniper.net> <20120126122326.GT2726@deviant.kiev.zoral.com.ua> In-Reply-To: <20120126122326.GT2726@deviant.kiev.zoral.com.ua> Content-Type: multipart/mixed; boundary="------------050804080208020403030502" X-Mailman-Approved-At: Fri, 27 Jan 2012 18:19:37 +0000 Cc: freebsd-current Current , Marcel Moolenaar Subject: Re: [ptrace] please review follow fork/exec changes X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Jan 2012 18:14:24 -0000 --------------050804080208020403030502 Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Attached are 4 separate patches for each somewhat independent portion of the kernel work related to the follow-fork implementation. On 01/26/2012 04:23 AM, Kostik Belousov wrote: > On Wed, Jan 25, 2012 at 03:48:04PM -0800, Dmitry Mikulin wrote: >> Thanks for taking a look at it. I'll try to explain the changes the best I >> can: the work was done nearly 6 months ago... >> >>> I would certainly appreciate some more words describing the changes. >>> >>> What is the goal of introducing the PT_FOLLOW_EXEC ? To not force >>> the debugger to filter all syscall exits to see exec events ? >> PT_FOLLOW_EXEC was added to trace the exec() family of calls. When it's >> enabled, the kernel will generate a trap to give debugger a chance to clean >> up old process info and initialize its state with the new binary. >> > It was more a question, why PT_FLAG_EXEC is not enough. > >> >>> Why did you moved the stopevent/ptracestop for exec events from >>> syscallret() to kern_execve() ? If moving, why the handling of TDB_EXEC >>> is not removed from syscallret() ? I do not think that TDB_EXEC can be >>> seen there after the patch. The same question about TDB_FORK. >> The reason I moved the event notifications from syscallret() is because the >> debugger is interested successful completion of either fork or exec, not >> just the fact that they have been called. If, say, a call to fork() failed, >> as far as debugger is concerned, fork() might as well had never been >> called. Having a ptracestop in syscallret triggered a trap on every return >> from fork without telling the debugger whether a new process had been >> created or not. Same goes for exec(). > PT_FLAG_EXEC is only set if an exec-kind of syscall succeeded. The same > is true for PT_FLAG_FORKED, the flag is set only if a new child was > successfully created. > >> Looking at the code now I think I should have removed TDB_EXEC/TDB_FORK >> processing from syscallret(). I'll do that and verify that it works as >> expected. >> >>> If possible, I would greatly prefer to have fork changes separated. >> You mean separate fork changes into a separate patch from exec changes? > Yes. Even more, it seems that fork changes should be separated into > bug fixes and new functionality. > >>> I doubt that disallowing RFMEM while tracing is the right change. It may >>> be to fix some (undescribed) bug, but RFMEM is documented behaviour used >>> not only for vfork(2), and the change just breaks rfork(2) for debugged >>> processes. >>> >>> Even vfork() should not be broken, since I believe there are programs >>> that rely on the vfork() model, in particular, C shell. It will be >>> broken if vfork() is substituted by fork(). The fact that it breaks only >>> under debugger will make it esp. confusing. >> I need to dig up the details since I can't recall the exact reason for >> forcing fork() in cases of user calls to vfork() under gdb. I believe it >> had to do with when child process memory was available for writing in case >> of vfork() and it wasn't early enough to complete the switch over from >> parent to child in gdb. After consulting with our internal kernel experts >> we decided that doing fork() instead of vfork() under gdb is a low impact >> change. >> >>> Why do we need to have TDB_FORK set for td2 ? >> The debugger needs to intercept fork() in both parent and child so it can >> detach from the old process and attach to the new one. Maybe it'll make >> more sense in the context of gdb changes. Should I send them too? Don't >> think Marcel included that patch... > No, I think the kernel changes are enough for now. > >>> Does the orphan list change intended to not lost the child after fork ? >>> But the child shall be traced, so debugger would get the SIGTRAP on >>> the attach on fork returning to usermode. I remember that I explicitely >>> tested this when adding followfork changes. >> Yes, the debugger gets SIGTRAPs. The problem arises when the real parent of >> the forked process has the code to collect termination status. Since >> attaching to a process changes the parent/child relationships, we need to >> keep track of the children lost due to re-parenting so we can properly >> attribute their exit status to the "real" parent. >> > I do not quite understand it. From the description it sounds as the problem > that is not related to follow fork changes at all, and shall be presented > regardless of the follow fork. If yes, I think this shall be separated > into a standalone patch. --------------050804080208020403030502 Content-Type: text/x-patch; name="follow-exec.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="follow-exec.patch" Index: sys/kern/kern_exec.c =================================================================== --- sys/kern/kern_exec.c (revision 230617) +++ sys/kern/kern_exec.c (working copy) @@ -55,6 +55,7 @@ #include #include #include +#include #include #include #include @@ -888,16 +889,22 @@ free(imgp->freepath, M_TEMP); if (error == 0) { + if ((p->p_flag & (P_TRACED | P_FOLLOWEXEC)) == + (P_TRACED | P_FOLLOWEXEC)) { + PROC_LOCK(p); + td->td_dbgflags |= TDB_EXEC; + PROC_UNLOCK(p); + } + /* + * Stop the process here if its stop event mask has + * the S_EXEC bit set. + */ + STOPEVENT(p, S_EXEC, 0); + PTRACESTOP_SC(p, td, S_PT_EXEC); PROC_LOCK(p); - td->td_dbgflags |= TDB_EXEC; + td->td_dbgflags &= ~TDB_EXEC; PROC_UNLOCK(p); - - /* - * Stop the process here if its stop event mask has - * the S_EXEC bit set. - */ - STOPEVENT(p, S_EXEC, 0); - goto done2; + goto done2; } exec_fail: Index: sys/kern/sys_process.c =================================================================== --- sys/kern/sys_process.c (revision 230617) +++ sys/kern/sys_process.c (working copy) @@ -660,6 +660,7 @@ case PT_TO_SCX: case PT_SYSCALL: case PT_FOLLOW_FORK: + case PT_FOLLOW_EXEC: case PT_DETACH: sx_xlock(&proctree_lock); proctree_locked = 1; @@ -874,6 +875,17 @@ p->p_flag &= ~P_FOLLOWFORK; break; + case PT_FOLLOW_EXEC: + if (data) { + p->p_flag |= P_FOLLOWEXEC; + p->p_stops |= S_PT_EXEC; + } + else { + p->p_flag &= ~P_FOLLOWEXEC; + p->p_stops &= ~S_PT_EXEC; + } + break; + case PT_STEP: case PT_CONTINUE: case PT_TO_SCE: @@ -936,7 +948,8 @@ p->p_sigparent = SIGCHLD; } p->p_oppid = 0; - p->p_flag &= ~(P_TRACED | P_WAITED | P_FOLLOWFORK); + p->p_flag &= ~(P_TRACED | P_WAITED | P_FOLLOWFORK | + P_FOLLOWEXEC); /* should we send SIGCHLD? */ /* childproc_continued(p); */ Index: sys/sys/proc.h =================================================================== --- sys/sys/proc.h (revision 230617) +++ sys/sys/proc.h (working copy) @@ -617,6 +617,7 @@ #define P_INMEM 0x10000000 /* Loaded into memory. */ #define P_SWAPPINGOUT 0x20000000 /* Process is being swapped out. */ #define P_SWAPPINGIN 0x40000000 /* Process is being swapped in. */ +#define P_FOLLOWEXEC 0x80000000 /* Notify debugger of exec events. */ #define P_STOPPED (P_STOPPED_SIG|P_STOPPED_SINGLE|P_STOPPED_TRACE) #define P_SHOULDSTOP(p) ((p)->p_flag & P_STOPPED) Index: sys/sys/ptrace.h =================================================================== --- sys/sys/ptrace.h (revision 230617) +++ sys/sys/ptrace.h (working copy) @@ -64,6 +64,7 @@ #define PT_SYSCALL 22 #define PT_FOLLOW_FORK 23 +#define PT_FOLLOW_EXEC 24 #define PT_GETREGS 33 /* get general-purpose registers */ #define PT_SETREGS 34 /* set general-purpose registers */ @@ -142,6 +143,7 @@ */ #define S_PT_SCE 0x000010000 #define S_PT_SCX 0x000020000 +#define S_PT_EXEC 0x000080000 int ptrace_set_pc(struct thread *_td, unsigned long _addr); int ptrace_single_step(struct thread *_td); --------------050804080208020403030502 Content-Type: text/x-patch; name="follow-fork.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="follow-fork.patch" Index: sys/kern/kern_fork.c =================================================================== --- sys/kern/kern_fork.c (revision 230617) +++ sys/kern/kern_fork.c (working copy) @@ -59,6 +59,7 @@ #include #include #include +#include #include #include #include @@ -702,7 +703,8 @@ * for runaway child. */ td->td_dbgflags |= TDB_FORK; - td->td_dbg_forked = p2->p_pid; + td2->td_dbgflags |= TDB_FORK; + td->td_dbg_forked = td2->td_dbg_forked = p2->p_pid; td2->td_dbgflags |= TDB_STOPATFORK; _PHOLD(p2); p2_held = 1; @@ -731,6 +733,13 @@ knote_fork(&p1->p_klist, p2->p_pid); SDT_PROBE(proc, kernel, , create, p2, p1, flags, 0, 0); + if (td->td_dbgflags & TDB_FORK) { + PTRACESTOP_SC(p1, td, S_PT_FORK); + PROC_LOCK(p1); + td->td_dbgflags &= ~TDB_FORK; + PROC_UNLOCK(p1); + } + /* * Wait until debugger is attached to child. */ @@ -1036,6 +1045,7 @@ proc_reparent(p, dbg); sx_xunlock(&proctree_lock); ptracestop(td, SIGSTOP); + td->td_dbgflags &= ~TDB_FORK; } else { /* * ... otherwise clear the request. Index: sys/kern/sys_process.c =================================================================== --- sys/kern/sys_process.c (revision 230617) +++ sys/kern/sys_process.c (working copy) @@ -868,10 +868,14 @@ break; case PT_FOLLOW_FORK: - if (data) + if (data) { p->p_flag |= P_FOLLOWFORK; - else + p->p_stops |= S_PT_FORK; + } + else { p->p_flag &= ~P_FOLLOWFORK; + p->p_stops &= ~S_PT_FORK; + } break; case PT_STEP: Index: sys/sys/ptrace.h =================================================================== --- sys/sys/ptrace.h (revision 230617) +++ sys/sys/ptrace.h (working copy) @@ -142,6 +142,7 @@ */ #define S_PT_SCE 0x000010000 #define S_PT_SCX 0x000020000 +#define S_PT_FORK 0x000040000 int ptrace_set_pc(struct thread *_td, unsigned long _addr); int ptrace_single_step(struct thread *_td); --------------050804080208020403030502 Content-Type: text/x-patch; name="orphan.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="orphan.patch" Index: sys/kern/kern_exit.c =================================================================== --- sys/kern/kern_exit.c (revision 230617) +++ sys/kern/kern_exit.c (working copy) @@ -680,7 +680,7 @@ */ void proc_reap(struct thread *td, struct proc *p, int *status, int options, - struct rusage *rusage) + struct rusage *rusage, int child) { struct proc *q, *t; @@ -720,7 +720,6 @@ if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) { PROC_LOCK(p); proc_reparent(p, t); - p->p_pptr->p_dbg_child--; p->p_oppid = 0; PROC_UNLOCK(p); pksignal(t, SIGCHLD, p->p_ksi); @@ -738,7 +737,10 @@ sx_xlock(&allproc_lock); LIST_REMOVE(p, p_list); /* off zombproc */ sx_xunlock(&allproc_lock); - LIST_REMOVE(p, p_sibling); + if (child) + LIST_REMOVE(p, p_sibling); + else + LIST_REMOVE(p, p_orphan); leavepgrp(p); #ifdef PROCDESC if (p->p_procdesc != NULL) @@ -859,7 +861,7 @@ nfound++; PROC_SLOCK(p); if (p->p_state == PRS_ZOMBIE) { - proc_reap(td, p, status, options, rusage); + proc_reap(td, p, status, options, rusage, 1); return (0); } if ((p->p_flag & P_STOPPED_SIG) && @@ -893,16 +895,47 @@ if (status) *status = SIGCONT; + } + PROC_UNLOCK(p); + } + LIST_FOREACH(p, &q->p_orphans, p_orphan) { + PROC_LOCK(p); + if (pid != WAIT_ANY && + p->p_pid != pid && p->p_pgid != -pid) { + PROC_UNLOCK(p); + continue; + } + if (p_canwait(td, p)) { + PROC_UNLOCK(p); + continue; + } + + /* + * This special case handles a kthread spawned by linux_clone + * (see linux_misc.c). The linux_wait4 and linux_waitpid + * functions need to be able to distinguish between waiting + * on a process and waiting on a thread. It is a thread if + * p_sigparent is not SIGCHLD, and the WLINUXCLONE option + * signifies we want to wait for threads and not processes. + */ + if ((p->p_sigparent != SIGCHLD) ^ + ((options & WLINUXCLONE) != 0)) { + PROC_UNLOCK(p); + continue; + } + + nfound++; + PROC_SLOCK(p); + if (p->p_state == PRS_ZOMBIE) { + proc_reap(td, p, status, options, rusage, 0); return (0); } + PROC_SUNLOCK(p); PROC_UNLOCK(p); } if (nfound == 0) { sx_xunlock(&proctree_lock); - if (td->td_proc->p_dbg_child) - return (0); - else - return (ECHILD); + return (ECHILD); } if (options & WNOHANG) { sx_xunlock(&proctree_lock); @@ -929,6 +962,7 @@ void proc_reparent(struct proc *child, struct proc *parent) { + struct proc *p; sx_assert(&proctree_lock, SX_XLOCKED); PROC_LOCK_ASSERT(child, MA_OWNED); @@ -940,5 +974,17 @@ PROC_UNLOCK(child->p_pptr); LIST_REMOVE(child, p_sibling); LIST_INSERT_HEAD(&parent->p_children, child, p_sibling); + + LIST_FOREACH(p, &parent->p_orphans, p_orphan) { + if (p == child) { + LIST_REMOVE(child, p_orphan); + break; + } + } + if (child->p_flag & P_TRACED) { + LIST_INSERT_HEAD(&child->p_pptr->p_orphans, child, p_orphan); + PROC_UNLOCK(child); + PROC_LOCK(child); + } child->p_pptr = parent; } Index: sys/kern/kern_fork.c =================================================================== --- sys/kern/kern_fork.c (revision 230617) +++ sys/kern/kern_fork.c (working copy) @@ -590,6 +590,7 @@ LIST_INSERT_AFTER(p1, p2, p_pglist); PGRP_UNLOCK(p1->p_pgrp); LIST_INIT(&p2->p_children); + LIST_INIT(&p2->p_orphans); callout_init(&p2->p_itcallout, CALLOUT_MPSAFE); Index: sys/kern/sys_process.c =================================================================== --- sys/kern/sys_process.c (revision 230617) +++ sys/kern/sys_process.c (working copy) @@ -841,8 +841,6 @@ p->p_flag |= P_TRACED; p->p_oppid = p->p_pptr->p_pid; if (p->p_pptr != td->td_proc) { - /* Remember that a child is being debugged(traced). */ - p->p_pptr->p_dbg_child++; proc_reparent(p, td->td_proc); } data = SIGSTOP; @@ -931,7 +929,6 @@ PROC_UNLOCK(pp); PROC_LOCK(p); proc_reparent(p, pp); - p->p_pptr->p_dbg_child--; if (pp == initproc) p->p_sigparent = SIGCHLD; } Index: sys/kern/sys_procdesc.c =================================================================== --- sys/kern/sys_procdesc.c (revision 230617) +++ sys/kern/sys_procdesc.c (working copy) @@ -367,7 +367,7 @@ * procdesc_reap(). */ PROC_SLOCK(p); - proc_reap(curthread, p, NULL, 0, NULL); + proc_reap(curthread, p, NULL, 0, NULL, 0); } else { /* * If the process is not yet dead, we need to kill it, but we Index: sys/sys/proc.h =================================================================== --- sys/sys/proc.h (revision 230617) +++ sys/sys/proc.h (working copy) @@ -505,8 +505,6 @@ /* The following fields are all zeroed upon creation in fork. */ #define p_startzero p_oppid pid_t p_oppid; /* (c + e) Save ppid in ptrace. XXX */ - int p_dbg_child; /* (c + e) # of debugged children in - ptrace. */ struct vmspace *p_vmspace; /* (b) Address space. */ u_int p_swtick; /* (c) Tick when swapped in or out. */ struct itimerval p_realtimer; /* (c) Alarm timer. */ @@ -574,6 +572,8 @@ after fork. */ uint64_t p_prev_runtime; /* (c) Resource usage accounting. */ struct racct *p_racct; /* (b) Resource accounting. */ + LIST_ENTRY(proc) p_orphan; /* (e) List of orphan processes. */ + LIST_HEAD(, proc) p_orphans; /* (e) Pointer to list of orphans. */ }; #define p_session p_pgrp->pg_session @@ -865,7 +865,7 @@ void proc_linkup0(struct proc *p, struct thread *td); void proc_linkup(struct proc *p, struct thread *td); void proc_reap(struct thread *td, struct proc *p, int *status, int options, - struct rusage *rusage); + struct rusage *rusage, int child); void proc_reparent(struct proc *child, struct proc *newparent); struct pstats *pstats_alloc(void); void pstats_fork(struct pstats *src, struct pstats *dst); --------------050804080208020403030502 Content-Type: text/x-patch; name="vfork-fork.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="vfork-fork.patch" Index: sys/kern/kern_fork.c =================================================================== --- sys/kern/kern_fork.c (revision 230617) +++ sys/kern/kern_fork.c (working copy) @@ -797,6 +797,10 @@ p1 = td->td_proc; + /* Don't do vfork while being traced. */ + if (p1->p_flag & P_TRACED) + flags &= ~(RFPPWAIT | RFMEM); + /* * Here we don't create a new process, but we divorce * certain parts of a process from itself. --------------050804080208020403030502--