Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 27 Jan 2012 10:12:13 -0800
From:      Dmitry Mikulin <dmitrym@juniper.net>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        freebsd-current Current <freebsd-current@freebsd.org>, Marcel Moolenaar <marcelm@juniper.net>
Subject:   Re: [ptrace] please review follow fork/exec changes
Message-ID:  <4F22E8FD.6010201@juniper.net>
In-Reply-To: <20120126122326.GT2726@deviant.kiev.zoral.com.ua>
References:  <749E238A-A85F-4264-9DEB-BCE1BBD21C9D@juniper.net> <20120125074824.GD2726@deviant.kiev.zoral.com.ua> <4F2094B4.70707@juniper.net> <20120126122326.GT2726@deviant.kiev.zoral.com.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
--------------050804080208020403030502
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit

Attached are 4 separate patches for each somewhat independent  portion of the kernel work related to the follow-fork implementation.


On 01/26/2012 04:23 AM, Kostik Belousov wrote:
> On Wed, Jan 25, 2012 at 03:48:04PM -0800, Dmitry Mikulin wrote:
>> Thanks for taking a look at it. I'll try to explain the changes the best I
>> can: the work was done nearly 6 months ago...
>>
>>> I would certainly appreciate some more words describing the changes.
>>>
>>> What is the goal of introducing the PT_FOLLOW_EXEC ? To not force
>>> the debugger to filter all syscall exits to see exec events ?
>> PT_FOLLOW_EXEC was added to trace the exec() family of calls. When it's
>> enabled, the kernel will generate a trap to give debugger a chance to clean
>> up old process info and initialize its state with the new binary.
>>
> It was more a question, why PT_FLAG_EXEC is not enough.
>
>>
>>> Why did you moved the stopevent/ptracestop for exec events from
>>> syscallret() to kern_execve() ? If moving, why the handling of TDB_EXEC
>>> is not removed from syscallret() ? I do not think that TDB_EXEC can be
>>> seen there after the patch. The same question about TDB_FORK.
>> The reason I moved the event notifications from syscallret() is because the
>> debugger is interested successful completion of either fork or exec, not
>> just the fact that they have been called. If, say, a call to fork() failed,
>> as far as debugger is concerned, fork() might as well had never been
>> called. Having a ptracestop in syscallret triggered a trap on every return
>> from fork without telling the debugger whether a new process had been
>> created or not. Same goes for exec().
> PT_FLAG_EXEC is only set if an exec-kind of syscall succeeded. The same
> is true for PT_FLAG_FORKED, the flag is set only if a new child was
> successfully created.
>
>> Looking at the code now I think I should have removed TDB_EXEC/TDB_FORK
>> processing from syscallret(). I'll do that and verify that it works as
>> expected.
>>
>>> If possible, I would greatly prefer to have fork changes separated.
>> You mean separate fork changes into a separate patch from exec changes?
> Yes. Even more, it seems that fork changes should be separated into
> bug fixes and new functionality.
>
>>> I doubt that disallowing RFMEM while tracing is the right change. It may
>>> be to fix some (undescribed) bug, but RFMEM is documented behaviour used
>>> not only for vfork(2), and the change just breaks rfork(2) for debugged
>>> processes.
>>>
>>> Even vfork() should not be broken, since I believe there are programs
>>> that rely on the vfork() model, in particular, C shell. It will be
>>> broken if vfork() is substituted by fork(). The fact that it breaks only
>>> under debugger will make it esp. confusing.
>> I need to dig up the details since I can't recall the exact reason for
>> forcing fork() in cases of user calls to vfork() under gdb. I believe it
>> had to do with when child process memory was available for writing in case
>> of vfork() and it wasn't early enough to complete the switch over from
>> parent to child in gdb. After consulting with our internal kernel experts
>> we decided that doing fork() instead of vfork() under gdb is a low impact
>> change.
>>
>>> Why do we need to have TDB_FORK set for td2 ?
>> The debugger needs to intercept fork() in both parent and child so it can
>> detach from the old process and attach to the new one. Maybe it'll make
>> more sense in the context of gdb changes. Should I send them too? Don't
>> think Marcel included that patch...
> No, I think the kernel changes are enough for now.
>
>>> Does the orphan list change intended to not lost the child after fork ?
>>> But the child shall be traced, so debugger would get the SIGTRAP on
>>> the attach on fork returning to usermode. I remember that I explicitely
>>> tested this when adding followfork changes.
>> Yes, the debugger gets SIGTRAPs. The problem arises when the real parent of
>> the forked process has the code to collect termination status. Since
>> attaching to a process changes the parent/child relationships, we need to
>> keep track of the children lost due to re-parenting so we can properly
>> attribute their exit status to the "real" parent.
>>
> I do not quite understand it. From the description it sounds as the problem
> that is not related to follow fork changes at all, and shall be presented
> regardless of the follow fork. If yes, I think this shall be separated
> into a standalone patch.



--------------050804080208020403030502
Content-Type: text/x-patch; name="follow-exec.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="follow-exec.patch"

Index: sys/kern/kern_exec.c
===================================================================
--- sys/kern/kern_exec.c	(revision 230617)
+++ sys/kern/kern_exec.c	(working copy)
@@ -55,6 +55,7 @@
 #include <sys/priv.h>
 #include <sys/proc.h>
 #include <sys/pioctl.h>
+#include <sys/ptrace.h>
 #include <sys/namei.h>
 #include <sys/resourcevar.h>
 #include <sys/sdt.h>
@@ -888,16 +889,22 @@
 	free(imgp->freepath, M_TEMP);
 
 	if (error == 0) {
+		if ((p->p_flag & (P_TRACED | P_FOLLOWEXEC)) == 
+		    (P_TRACED |	P_FOLLOWEXEC)) {
+			PROC_LOCK(p);
+			td->td_dbgflags |= TDB_EXEC;
+			PROC_UNLOCK(p);
+		}
+ 		/*
+ 		 * Stop the process here if its stop event mask has
+ 		 * the S_EXEC bit set.
+ 		 */
+ 		STOPEVENT(p, S_EXEC, 0);
+		PTRACESTOP_SC(p, td, S_PT_EXEC);
 		PROC_LOCK(p);
-		td->td_dbgflags |= TDB_EXEC;
+		td->td_dbgflags &= ~TDB_EXEC;
 		PROC_UNLOCK(p);
-
-		/*
-		 * Stop the process here if its stop event mask has
-		 * the S_EXEC bit set.
-		 */
-		STOPEVENT(p, S_EXEC, 0);
-		goto done2;
+ 		goto done2;
 	}
 
 exec_fail:
Index: sys/kern/sys_process.c
===================================================================
--- sys/kern/sys_process.c	(revision 230617)
+++ sys/kern/sys_process.c	(working copy)
@@ -660,6 +660,7 @@
 	case PT_TO_SCX:
 	case PT_SYSCALL:
 	case PT_FOLLOW_FORK:
+	case PT_FOLLOW_EXEC:
 	case PT_DETACH:
 		sx_xlock(&proctree_lock);
 		proctree_locked = 1;
@@ -874,6 +875,17 @@
 			p->p_flag &= ~P_FOLLOWFORK;
 		break;
 
+	case PT_FOLLOW_EXEC:
+		if (data) {
+			p->p_flag |= P_FOLLOWEXEC;
+			p->p_stops |= S_PT_EXEC;
+		}
+		else {
+			p->p_flag &= ~P_FOLLOWEXEC;
+			p->p_stops &= ~S_PT_EXEC;
+		}
+		break;
+
 	case PT_STEP:
 	case PT_CONTINUE:
 	case PT_TO_SCE:
@@ -936,7 +948,8 @@
 					p->p_sigparent = SIGCHLD;
 			}
 			p->p_oppid = 0;
-			p->p_flag &= ~(P_TRACED | P_WAITED | P_FOLLOWFORK);
+			p->p_flag &= ~(P_TRACED | P_WAITED | P_FOLLOWFORK | 
+				       P_FOLLOWEXEC);
 
 			/* should we send SIGCHLD? */
 			/* childproc_continued(p); */
Index: sys/sys/proc.h
===================================================================
--- sys/sys/proc.h	(revision 230617)
+++ sys/sys/proc.h	(working copy)
@@ -617,6 +617,7 @@
 #define	P_INMEM		0x10000000 /* Loaded into memory. */
 #define	P_SWAPPINGOUT	0x20000000 /* Process is being swapped out. */
 #define	P_SWAPPINGIN	0x40000000 /* Process is being swapped in. */
+#define	P_FOLLOWEXEC	0x80000000 /* Notify debugger of exec events. */
 
 #define	P_STOPPED	(P_STOPPED_SIG|P_STOPPED_SINGLE|P_STOPPED_TRACE)
 #define	P_SHOULDSTOP(p)	((p)->p_flag & P_STOPPED)
Index: sys/sys/ptrace.h
===================================================================
--- sys/sys/ptrace.h	(revision 230617)
+++ sys/sys/ptrace.h	(working copy)
@@ -64,6 +64,7 @@
 #define	PT_SYSCALL	22
 
 #define	PT_FOLLOW_FORK	23
+#define	PT_FOLLOW_EXEC  24
 
 #define PT_GETREGS      33	/* get general-purpose registers */
 #define PT_SETREGS      34	/* set general-purpose registers */
@@ -142,6 +143,7 @@
  */
 #define	S_PT_SCE	0x000010000
 #define	S_PT_SCX	0x000020000
+#define	S_PT_EXEC	0x000080000
 
 int	ptrace_set_pc(struct thread *_td, unsigned long _addr);
 int	ptrace_single_step(struct thread *_td);

--------------050804080208020403030502
Content-Type: text/x-patch; name="follow-fork.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="follow-fork.patch"

Index: sys/kern/kern_fork.c
===================================================================
--- sys/kern/kern_fork.c	(revision 230617)
+++ sys/kern/kern_fork.c	(working copy)
@@ -59,6 +59,7 @@
 #include <sys/proc.h>
 #include <sys/procdesc.h>
 #include <sys/pioctl.h>
+#include <sys/ptrace.h>
 #include <sys/racct.h>
 #include <sys/resourcevar.h>
 #include <sys/sched.h>
@@ -702,7 +703,8 @@
 		 * for runaway child.
 		 */
 		td->td_dbgflags |= TDB_FORK;
-		td->td_dbg_forked = p2->p_pid;
+		td2->td_dbgflags |= TDB_FORK;
+		td->td_dbg_forked = td2->td_dbg_forked = p2->p_pid;
 		td2->td_dbgflags |= TDB_STOPATFORK;
 		_PHOLD(p2);
 		p2_held = 1;
@@ -731,6 +733,13 @@
 	knote_fork(&p1->p_klist, p2->p_pid);
 	SDT_PROBE(proc, kernel, , create, p2, p1, flags, 0, 0);
 
+	if (td->td_dbgflags & TDB_FORK) {
+		PTRACESTOP_SC(p1, td, S_PT_FORK);
+		PROC_LOCK(p1);
+		td->td_dbgflags &= ~TDB_FORK;
+		PROC_UNLOCK(p1);
+	}
+
 	/*
 	 * Wait until debugger is attached to child.
 	 */
@@ -1036,6 +1045,7 @@
 			proc_reparent(p, dbg);
 			sx_xunlock(&proctree_lock);
 			ptracestop(td, SIGSTOP);
+			td->td_dbgflags &= ~TDB_FORK;
 		} else {
 			/*
 			 * ... otherwise clear the request.
Index: sys/kern/sys_process.c
===================================================================
--- sys/kern/sys_process.c	(revision 230617)
+++ sys/kern/sys_process.c	(working copy)
@@ -868,10 +868,14 @@
 		break;
 
 	case PT_FOLLOW_FORK:
-		if (data)
+		if (data) {
 			p->p_flag |= P_FOLLOWFORK;
-		else
+			p->p_stops |= S_PT_FORK;
+		}
+		else {
 			p->p_flag &= ~P_FOLLOWFORK;
+			p->p_stops &= ~S_PT_FORK;
+		}
 		break;
 
 	case PT_STEP:
Index: sys/sys/ptrace.h
===================================================================
--- sys/sys/ptrace.h	(revision 230617)
+++ sys/sys/ptrace.h	(working copy)
@@ -142,6 +142,7 @@
  */
 #define	S_PT_SCE	0x000010000
 #define	S_PT_SCX	0x000020000
+#define	S_PT_FORK	0x000040000
 
 int	ptrace_set_pc(struct thread *_td, unsigned long _addr);
 int	ptrace_single_step(struct thread *_td);

--------------050804080208020403030502
Content-Type: text/x-patch; name="orphan.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="orphan.patch"

Index: sys/kern/kern_exit.c
===================================================================
--- sys/kern/kern_exit.c	(revision 230617)
+++ sys/kern/kern_exit.c	(working copy)
@@ -680,7 +680,7 @@
  */
 void
 proc_reap(struct thread *td, struct proc *p, int *status, int options,
-    struct rusage *rusage)
+    struct rusage *rusage, int child)
 {
 	struct proc *q, *t;
 
@@ -720,7 +720,6 @@
 	if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) {
 		PROC_LOCK(p);
 		proc_reparent(p, t);
-		p->p_pptr->p_dbg_child--;
 		p->p_oppid = 0;
 		PROC_UNLOCK(p);
 		pksignal(t, SIGCHLD, p->p_ksi);
@@ -738,7 +737,10 @@
 	sx_xlock(&allproc_lock);
 	LIST_REMOVE(p, p_list);	/* off zombproc */
 	sx_xunlock(&allproc_lock);
-	LIST_REMOVE(p, p_sibling);
+	if (child)
+		LIST_REMOVE(p, p_sibling);
+	else
+		LIST_REMOVE(p, p_orphan);
 	leavepgrp(p);
 #ifdef PROCDESC
 	if (p->p_procdesc != NULL)
@@ -859,7 +861,7 @@
 		nfound++;
 		PROC_SLOCK(p);
 		if (p->p_state == PRS_ZOMBIE) {
-			proc_reap(td, p, status, options, rusage);
+			proc_reap(td, p, status, options, rusage, 1);
 			return (0);
 		}
 		if ((p->p_flag & P_STOPPED_SIG) &&
@@ -893,16 +895,47 @@
 
 			if (status)
 				*status = SIGCONT;
+		}
+		PROC_UNLOCK(p);
+	}
+	LIST_FOREACH(p, &q->p_orphans, p_orphan) {
+		PROC_LOCK(p);
+		if (pid != WAIT_ANY &&
+		    p->p_pid != pid && p->p_pgid != -pid) {
+			PROC_UNLOCK(p);
+			continue;
+		}
+		if (p_canwait(td, p)) {
+			PROC_UNLOCK(p);
+			continue;
+		}
+
+		/*
+		 * This special case handles a kthread spawned by linux_clone
+		 * (see linux_misc.c).  The linux_wait4 and linux_waitpid
+		 * functions need to be able to distinguish between waiting
+		 * on a process and waiting on a thread.  It is a thread if
+		 * p_sigparent is not SIGCHLD, and the WLINUXCLONE option
+		 * signifies we want to wait for threads and not processes.
+		 */
+		if ((p->p_sigparent != SIGCHLD) ^
+		    ((options & WLINUXCLONE) != 0)) {
+			PROC_UNLOCK(p);
+			continue;
+		}
+
+		nfound++;
+		PROC_SLOCK(p);
+		if (p->p_state == PRS_ZOMBIE) {
+		  proc_reap(td, p, status, options, rusage, 0);
 			return (0);
 		}
+		PROC_SUNLOCK(p);
 		PROC_UNLOCK(p);
 	}
 	if (nfound == 0) {
 		sx_xunlock(&proctree_lock);
-		if (td->td_proc->p_dbg_child)
-			return (0);
-		else
-			return (ECHILD);
+		return (ECHILD);
 	}
 	if (options & WNOHANG) {
 		sx_xunlock(&proctree_lock);
@@ -929,6 +962,7 @@
 void
 proc_reparent(struct proc *child, struct proc *parent)
 {
+	struct proc *p;
 
 	sx_assert(&proctree_lock, SX_XLOCKED);
 	PROC_LOCK_ASSERT(child, MA_OWNED);
@@ -940,5 +974,17 @@
 	PROC_UNLOCK(child->p_pptr);
 	LIST_REMOVE(child, p_sibling);
 	LIST_INSERT_HEAD(&parent->p_children, child, p_sibling);
+
+	LIST_FOREACH(p, &parent->p_orphans, p_orphan) {
+		if (p == child) {
+			LIST_REMOVE(child, p_orphan);
+			break;
+		}
+	}
+	if (child->p_flag & P_TRACED) {
+		LIST_INSERT_HEAD(&child->p_pptr->p_orphans, child, p_orphan);
+		PROC_UNLOCK(child);
+		PROC_LOCK(child);
+	}
 	child->p_pptr = parent;
 }
Index: sys/kern/kern_fork.c
===================================================================
--- sys/kern/kern_fork.c	(revision 230617)
+++ sys/kern/kern_fork.c	(working copy)
@@ -590,6 +590,7 @@
 	LIST_INSERT_AFTER(p1, p2, p_pglist);
 	PGRP_UNLOCK(p1->p_pgrp);
 	LIST_INIT(&p2->p_children);
+	LIST_INIT(&p2->p_orphans);
 
 	callout_init(&p2->p_itcallout, CALLOUT_MPSAFE);
 
Index: sys/kern/sys_process.c
===================================================================
--- sys/kern/sys_process.c	(revision 230617)
+++ sys/kern/sys_process.c	(working copy)
@@ -841,8 +841,6 @@
 		p->p_flag |= P_TRACED;
 		p->p_oppid = p->p_pptr->p_pid;
 		if (p->p_pptr != td->td_proc) {
-			/* Remember that a child is being debugged(traced). */
-			p->p_pptr->p_dbg_child++;
 			proc_reparent(p, td->td_proc);
 		}
 		data = SIGSTOP;
@@ -931,7 +929,6 @@
 					PROC_UNLOCK(pp);
 				PROC_LOCK(p);
 				proc_reparent(p, pp);
-				p->p_pptr->p_dbg_child--;
 				if (pp == initproc)
 					p->p_sigparent = SIGCHLD;
 			}
Index: sys/kern/sys_procdesc.c
===================================================================
--- sys/kern/sys_procdesc.c	(revision 230617)
+++ sys/kern/sys_procdesc.c	(working copy)
@@ -367,7 +367,7 @@
 		 * procdesc_reap().
 		 */
 		PROC_SLOCK(p);
-		proc_reap(curthread, p, NULL, 0, NULL);
+		proc_reap(curthread, p, NULL, 0, NULL, 0);
 	} else {
 		/*
 		 * If the process is not yet dead, we need to kill it, but we
Index: sys/sys/proc.h
===================================================================
--- sys/sys/proc.h	(revision 230617)
+++ sys/sys/proc.h	(working copy)
@@ -505,8 +505,6 @@
 /* The following fields are all zeroed upon creation in fork. */
 #define	p_startzero	p_oppid
 	pid_t		p_oppid;	/* (c + e) Save ppid in ptrace. XXX */
-	int		p_dbg_child;	/* (c + e) # of debugged children in
-							ptrace. */
 	struct vmspace	*p_vmspace;	/* (b) Address space. */
 	u_int		p_swtick;	/* (c) Tick when swapped in or out. */
 	struct itimerval p_realtimer;	/* (c) Alarm timer. */
@@ -574,6 +572,8 @@
 					   after fork. */
 	uint64_t	p_prev_runtime;	/* (c) Resource usage accounting. */
 	struct racct	*p_racct;	/* (b) Resource accounting. */
+	LIST_ENTRY(proc) p_orphan;	/* (e) List of orphan processes. */
+	LIST_HEAD(, proc) p_orphans;	/* (e) Pointer to list of orphans. */
 };
 
 #define	p_session	p_pgrp->pg_session
@@ -865,7 +865,7 @@
 void	proc_linkup0(struct proc *p, struct thread *td);
 void	proc_linkup(struct proc *p, struct thread *td);
 void	proc_reap(struct thread *td, struct proc *p, int *status, int options,
-	    struct rusage *rusage);
+	    struct rusage *rusage, int child);
 void	proc_reparent(struct proc *child, struct proc *newparent);
 struct	pstats *pstats_alloc(void);
 void	pstats_fork(struct pstats *src, struct pstats *dst);

--------------050804080208020403030502
Content-Type: text/x-patch; name="vfork-fork.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="vfork-fork.patch"

Index: sys/kern/kern_fork.c
===================================================================
--- sys/kern/kern_fork.c	(revision 230617)
+++ sys/kern/kern_fork.c	(working copy)
@@ -797,6 +797,10 @@
 
 	p1 = td->td_proc;
 
+	/* Don't do vfork while being traced. */
+	if (p1->p_flag & P_TRACED)
+		flags &= ~(RFPPWAIT | RFMEM);
+
 	/*
 	 * Here we don't create a new process, but we divorce
 	 * certain parts of a process from itself.

--------------050804080208020403030502--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4F22E8FD.6010201>