Date: Fri, 27 Jan 2012 10:12:13 -0800 From: Dmitry Mikulin <dmitrym@juniper.net> To: Kostik Belousov <kostikbel@gmail.com> Cc: freebsd-current Current <freebsd-current@freebsd.org>, Marcel Moolenaar <marcelm@juniper.net> Subject: Re: [ptrace] please review follow fork/exec changes Message-ID: <4F22E8FD.6010201@juniper.net> In-Reply-To: <20120126122326.GT2726@deviant.kiev.zoral.com.ua> References: <749E238A-A85F-4264-9DEB-BCE1BBD21C9D@juniper.net> <20120125074824.GD2726@deviant.kiev.zoral.com.ua> <4F2094B4.70707@juniper.net> <20120126122326.GT2726@deviant.kiev.zoral.com.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
--------------050804080208020403030502
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Attached are 4 separate patches for each somewhat independent portion of the kernel work related to the follow-fork implementation.
On 01/26/2012 04:23 AM, Kostik Belousov wrote:
> On Wed, Jan 25, 2012 at 03:48:04PM -0800, Dmitry Mikulin wrote:
>> Thanks for taking a look at it. I'll try to explain the changes the best I
>> can: the work was done nearly 6 months ago...
>>
>>> I would certainly appreciate some more words describing the changes.
>>>
>>> What is the goal of introducing the PT_FOLLOW_EXEC ? To not force
>>> the debugger to filter all syscall exits to see exec events ?
>> PT_FOLLOW_EXEC was added to trace the exec() family of calls. When it's
>> enabled, the kernel will generate a trap to give debugger a chance to clean
>> up old process info and initialize its state with the new binary.
>>
> It was more a question, why PT_FLAG_EXEC is not enough.
>
>>
>>> Why did you moved the stopevent/ptracestop for exec events from
>>> syscallret() to kern_execve() ? If moving, why the handling of TDB_EXEC
>>> is not removed from syscallret() ? I do not think that TDB_EXEC can be
>>> seen there after the patch. The same question about TDB_FORK.
>> The reason I moved the event notifications from syscallret() is because the
>> debugger is interested successful completion of either fork or exec, not
>> just the fact that they have been called. If, say, a call to fork() failed,
>> as far as debugger is concerned, fork() might as well had never been
>> called. Having a ptracestop in syscallret triggered a trap on every return
>> from fork without telling the debugger whether a new process had been
>> created or not. Same goes for exec().
> PT_FLAG_EXEC is only set if an exec-kind of syscall succeeded. The same
> is true for PT_FLAG_FORKED, the flag is set only if a new child was
> successfully created.
>
>> Looking at the code now I think I should have removed TDB_EXEC/TDB_FORK
>> processing from syscallret(). I'll do that and verify that it works as
>> expected.
>>
>>> If possible, I would greatly prefer to have fork changes separated.
>> You mean separate fork changes into a separate patch from exec changes?
> Yes. Even more, it seems that fork changes should be separated into
> bug fixes and new functionality.
>
>>> I doubt that disallowing RFMEM while tracing is the right change. It may
>>> be to fix some (undescribed) bug, but RFMEM is documented behaviour used
>>> not only for vfork(2), and the change just breaks rfork(2) for debugged
>>> processes.
>>>
>>> Even vfork() should not be broken, since I believe there are programs
>>> that rely on the vfork() model, in particular, C shell. It will be
>>> broken if vfork() is substituted by fork(). The fact that it breaks only
>>> under debugger will make it esp. confusing.
>> I need to dig up the details since I can't recall the exact reason for
>> forcing fork() in cases of user calls to vfork() under gdb. I believe it
>> had to do with when child process memory was available for writing in case
>> of vfork() and it wasn't early enough to complete the switch over from
>> parent to child in gdb. After consulting with our internal kernel experts
>> we decided that doing fork() instead of vfork() under gdb is a low impact
>> change.
>>
>>> Why do we need to have TDB_FORK set for td2 ?
>> The debugger needs to intercept fork() in both parent and child so it can
>> detach from the old process and attach to the new one. Maybe it'll make
>> more sense in the context of gdb changes. Should I send them too? Don't
>> think Marcel included that patch...
> No, I think the kernel changes are enough for now.
>
>>> Does the orphan list change intended to not lost the child after fork ?
>>> But the child shall be traced, so debugger would get the SIGTRAP on
>>> the attach on fork returning to usermode. I remember that I explicitely
>>> tested this when adding followfork changes.
>> Yes, the debugger gets SIGTRAPs. The problem arises when the real parent of
>> the forked process has the code to collect termination status. Since
>> attaching to a process changes the parent/child relationships, we need to
>> keep track of the children lost due to re-parenting so we can properly
>> attribute their exit status to the "real" parent.
>>
> I do not quite understand it. From the description it sounds as the problem
> that is not related to follow fork changes at all, and shall be presented
> regardless of the follow fork. If yes, I think this shall be separated
> into a standalone patch.
--------------050804080208020403030502
Content-Type: text/x-patch; name="follow-exec.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="follow-exec.patch"
Index: sys/kern/kern_exec.c
===================================================================
--- sys/kern/kern_exec.c (revision 230617)
+++ sys/kern/kern_exec.c (working copy)
@@ -55,6 +55,7 @@
#include <sys/priv.h>
#include <sys/proc.h>
#include <sys/pioctl.h>
+#include <sys/ptrace.h>
#include <sys/namei.h>
#include <sys/resourcevar.h>
#include <sys/sdt.h>
@@ -888,16 +889,22 @@
free(imgp->freepath, M_TEMP);
if (error == 0) {
+ if ((p->p_flag & (P_TRACED | P_FOLLOWEXEC)) ==
+ (P_TRACED | P_FOLLOWEXEC)) {
+ PROC_LOCK(p);
+ td->td_dbgflags |= TDB_EXEC;
+ PROC_UNLOCK(p);
+ }
+ /*
+ * Stop the process here if its stop event mask has
+ * the S_EXEC bit set.
+ */
+ STOPEVENT(p, S_EXEC, 0);
+ PTRACESTOP_SC(p, td, S_PT_EXEC);
PROC_LOCK(p);
- td->td_dbgflags |= TDB_EXEC;
+ td->td_dbgflags &= ~TDB_EXEC;
PROC_UNLOCK(p);
-
- /*
- * Stop the process here if its stop event mask has
- * the S_EXEC bit set.
- */
- STOPEVENT(p, S_EXEC, 0);
- goto done2;
+ goto done2;
}
exec_fail:
Index: sys/kern/sys_process.c
===================================================================
--- sys/kern/sys_process.c (revision 230617)
+++ sys/kern/sys_process.c (working copy)
@@ -660,6 +660,7 @@
case PT_TO_SCX:
case PT_SYSCALL:
case PT_FOLLOW_FORK:
+ case PT_FOLLOW_EXEC:
case PT_DETACH:
sx_xlock(&proctree_lock);
proctree_locked = 1;
@@ -874,6 +875,17 @@
p->p_flag &= ~P_FOLLOWFORK;
break;
+ case PT_FOLLOW_EXEC:
+ if (data) {
+ p->p_flag |= P_FOLLOWEXEC;
+ p->p_stops |= S_PT_EXEC;
+ }
+ else {
+ p->p_flag &= ~P_FOLLOWEXEC;
+ p->p_stops &= ~S_PT_EXEC;
+ }
+ break;
+
case PT_STEP:
case PT_CONTINUE:
case PT_TO_SCE:
@@ -936,7 +948,8 @@
p->p_sigparent = SIGCHLD;
}
p->p_oppid = 0;
- p->p_flag &= ~(P_TRACED | P_WAITED | P_FOLLOWFORK);
+ p->p_flag &= ~(P_TRACED | P_WAITED | P_FOLLOWFORK |
+ P_FOLLOWEXEC);
/* should we send SIGCHLD? */
/* childproc_continued(p); */
Index: sys/sys/proc.h
===================================================================
--- sys/sys/proc.h (revision 230617)
+++ sys/sys/proc.h (working copy)
@@ -617,6 +617,7 @@
#define P_INMEM 0x10000000 /* Loaded into memory. */
#define P_SWAPPINGOUT 0x20000000 /* Process is being swapped out. */
#define P_SWAPPINGIN 0x40000000 /* Process is being swapped in. */
+#define P_FOLLOWEXEC 0x80000000 /* Notify debugger of exec events. */
#define P_STOPPED (P_STOPPED_SIG|P_STOPPED_SINGLE|P_STOPPED_TRACE)
#define P_SHOULDSTOP(p) ((p)->p_flag & P_STOPPED)
Index: sys/sys/ptrace.h
===================================================================
--- sys/sys/ptrace.h (revision 230617)
+++ sys/sys/ptrace.h (working copy)
@@ -64,6 +64,7 @@
#define PT_SYSCALL 22
#define PT_FOLLOW_FORK 23
+#define PT_FOLLOW_EXEC 24
#define PT_GETREGS 33 /* get general-purpose registers */
#define PT_SETREGS 34 /* set general-purpose registers */
@@ -142,6 +143,7 @@
*/
#define S_PT_SCE 0x000010000
#define S_PT_SCX 0x000020000
+#define S_PT_EXEC 0x000080000
int ptrace_set_pc(struct thread *_td, unsigned long _addr);
int ptrace_single_step(struct thread *_td);
--------------050804080208020403030502
Content-Type: text/x-patch; name="follow-fork.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="follow-fork.patch"
Index: sys/kern/kern_fork.c
===================================================================
--- sys/kern/kern_fork.c (revision 230617)
+++ sys/kern/kern_fork.c (working copy)
@@ -59,6 +59,7 @@
#include <sys/proc.h>
#include <sys/procdesc.h>
#include <sys/pioctl.h>
+#include <sys/ptrace.h>
#include <sys/racct.h>
#include <sys/resourcevar.h>
#include <sys/sched.h>
@@ -702,7 +703,8 @@
* for runaway child.
*/
td->td_dbgflags |= TDB_FORK;
- td->td_dbg_forked = p2->p_pid;
+ td2->td_dbgflags |= TDB_FORK;
+ td->td_dbg_forked = td2->td_dbg_forked = p2->p_pid;
td2->td_dbgflags |= TDB_STOPATFORK;
_PHOLD(p2);
p2_held = 1;
@@ -731,6 +733,13 @@
knote_fork(&p1->p_klist, p2->p_pid);
SDT_PROBE(proc, kernel, , create, p2, p1, flags, 0, 0);
+ if (td->td_dbgflags & TDB_FORK) {
+ PTRACESTOP_SC(p1, td, S_PT_FORK);
+ PROC_LOCK(p1);
+ td->td_dbgflags &= ~TDB_FORK;
+ PROC_UNLOCK(p1);
+ }
+
/*
* Wait until debugger is attached to child.
*/
@@ -1036,6 +1045,7 @@
proc_reparent(p, dbg);
sx_xunlock(&proctree_lock);
ptracestop(td, SIGSTOP);
+ td->td_dbgflags &= ~TDB_FORK;
} else {
/*
* ... otherwise clear the request.
Index: sys/kern/sys_process.c
===================================================================
--- sys/kern/sys_process.c (revision 230617)
+++ sys/kern/sys_process.c (working copy)
@@ -868,10 +868,14 @@
break;
case PT_FOLLOW_FORK:
- if (data)
+ if (data) {
p->p_flag |= P_FOLLOWFORK;
- else
+ p->p_stops |= S_PT_FORK;
+ }
+ else {
p->p_flag &= ~P_FOLLOWFORK;
+ p->p_stops &= ~S_PT_FORK;
+ }
break;
case PT_STEP:
Index: sys/sys/ptrace.h
===================================================================
--- sys/sys/ptrace.h (revision 230617)
+++ sys/sys/ptrace.h (working copy)
@@ -142,6 +142,7 @@
*/
#define S_PT_SCE 0x000010000
#define S_PT_SCX 0x000020000
+#define S_PT_FORK 0x000040000
int ptrace_set_pc(struct thread *_td, unsigned long _addr);
int ptrace_single_step(struct thread *_td);
--------------050804080208020403030502
Content-Type: text/x-patch; name="orphan.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="orphan.patch"
Index: sys/kern/kern_exit.c
===================================================================
--- sys/kern/kern_exit.c (revision 230617)
+++ sys/kern/kern_exit.c (working copy)
@@ -680,7 +680,7 @@
*/
void
proc_reap(struct thread *td, struct proc *p, int *status, int options,
- struct rusage *rusage)
+ struct rusage *rusage, int child)
{
struct proc *q, *t;
@@ -720,7 +720,6 @@
if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) {
PROC_LOCK(p);
proc_reparent(p, t);
- p->p_pptr->p_dbg_child--;
p->p_oppid = 0;
PROC_UNLOCK(p);
pksignal(t, SIGCHLD, p->p_ksi);
@@ -738,7 +737,10 @@
sx_xlock(&allproc_lock);
LIST_REMOVE(p, p_list); /* off zombproc */
sx_xunlock(&allproc_lock);
- LIST_REMOVE(p, p_sibling);
+ if (child)
+ LIST_REMOVE(p, p_sibling);
+ else
+ LIST_REMOVE(p, p_orphan);
leavepgrp(p);
#ifdef PROCDESC
if (p->p_procdesc != NULL)
@@ -859,7 +861,7 @@
nfound++;
PROC_SLOCK(p);
if (p->p_state == PRS_ZOMBIE) {
- proc_reap(td, p, status, options, rusage);
+ proc_reap(td, p, status, options, rusage, 1);
return (0);
}
if ((p->p_flag & P_STOPPED_SIG) &&
@@ -893,16 +895,47 @@
if (status)
*status = SIGCONT;
+ }
+ PROC_UNLOCK(p);
+ }
+ LIST_FOREACH(p, &q->p_orphans, p_orphan) {
+ PROC_LOCK(p);
+ if (pid != WAIT_ANY &&
+ p->p_pid != pid && p->p_pgid != -pid) {
+ PROC_UNLOCK(p);
+ continue;
+ }
+ if (p_canwait(td, p)) {
+ PROC_UNLOCK(p);
+ continue;
+ }
+
+ /*
+ * This special case handles a kthread spawned by linux_clone
+ * (see linux_misc.c). The linux_wait4 and linux_waitpid
+ * functions need to be able to distinguish between waiting
+ * on a process and waiting on a thread. It is a thread if
+ * p_sigparent is not SIGCHLD, and the WLINUXCLONE option
+ * signifies we want to wait for threads and not processes.
+ */
+ if ((p->p_sigparent != SIGCHLD) ^
+ ((options & WLINUXCLONE) != 0)) {
+ PROC_UNLOCK(p);
+ continue;
+ }
+
+ nfound++;
+ PROC_SLOCK(p);
+ if (p->p_state == PRS_ZOMBIE) {
+ proc_reap(td, p, status, options, rusage, 0);
return (0);
}
+ PROC_SUNLOCK(p);
PROC_UNLOCK(p);
}
if (nfound == 0) {
sx_xunlock(&proctree_lock);
- if (td->td_proc->p_dbg_child)
- return (0);
- else
- return (ECHILD);
+ return (ECHILD);
}
if (options & WNOHANG) {
sx_xunlock(&proctree_lock);
@@ -929,6 +962,7 @@
void
proc_reparent(struct proc *child, struct proc *parent)
{
+ struct proc *p;
sx_assert(&proctree_lock, SX_XLOCKED);
PROC_LOCK_ASSERT(child, MA_OWNED);
@@ -940,5 +974,17 @@
PROC_UNLOCK(child->p_pptr);
LIST_REMOVE(child, p_sibling);
LIST_INSERT_HEAD(&parent->p_children, child, p_sibling);
+
+ LIST_FOREACH(p, &parent->p_orphans, p_orphan) {
+ if (p == child) {
+ LIST_REMOVE(child, p_orphan);
+ break;
+ }
+ }
+ if (child->p_flag & P_TRACED) {
+ LIST_INSERT_HEAD(&child->p_pptr->p_orphans, child, p_orphan);
+ PROC_UNLOCK(child);
+ PROC_LOCK(child);
+ }
child->p_pptr = parent;
}
Index: sys/kern/kern_fork.c
===================================================================
--- sys/kern/kern_fork.c (revision 230617)
+++ sys/kern/kern_fork.c (working copy)
@@ -590,6 +590,7 @@
LIST_INSERT_AFTER(p1, p2, p_pglist);
PGRP_UNLOCK(p1->p_pgrp);
LIST_INIT(&p2->p_children);
+ LIST_INIT(&p2->p_orphans);
callout_init(&p2->p_itcallout, CALLOUT_MPSAFE);
Index: sys/kern/sys_process.c
===================================================================
--- sys/kern/sys_process.c (revision 230617)
+++ sys/kern/sys_process.c (working copy)
@@ -841,8 +841,6 @@
p->p_flag |= P_TRACED;
p->p_oppid = p->p_pptr->p_pid;
if (p->p_pptr != td->td_proc) {
- /* Remember that a child is being debugged(traced). */
- p->p_pptr->p_dbg_child++;
proc_reparent(p, td->td_proc);
}
data = SIGSTOP;
@@ -931,7 +929,6 @@
PROC_UNLOCK(pp);
PROC_LOCK(p);
proc_reparent(p, pp);
- p->p_pptr->p_dbg_child--;
if (pp == initproc)
p->p_sigparent = SIGCHLD;
}
Index: sys/kern/sys_procdesc.c
===================================================================
--- sys/kern/sys_procdesc.c (revision 230617)
+++ sys/kern/sys_procdesc.c (working copy)
@@ -367,7 +367,7 @@
* procdesc_reap().
*/
PROC_SLOCK(p);
- proc_reap(curthread, p, NULL, 0, NULL);
+ proc_reap(curthread, p, NULL, 0, NULL, 0);
} else {
/*
* If the process is not yet dead, we need to kill it, but we
Index: sys/sys/proc.h
===================================================================
--- sys/sys/proc.h (revision 230617)
+++ sys/sys/proc.h (working copy)
@@ -505,8 +505,6 @@
/* The following fields are all zeroed upon creation in fork. */
#define p_startzero p_oppid
pid_t p_oppid; /* (c + e) Save ppid in ptrace. XXX */
- int p_dbg_child; /* (c + e) # of debugged children in
- ptrace. */
struct vmspace *p_vmspace; /* (b) Address space. */
u_int p_swtick; /* (c) Tick when swapped in or out. */
struct itimerval p_realtimer; /* (c) Alarm timer. */
@@ -574,6 +572,8 @@
after fork. */
uint64_t p_prev_runtime; /* (c) Resource usage accounting. */
struct racct *p_racct; /* (b) Resource accounting. */
+ LIST_ENTRY(proc) p_orphan; /* (e) List of orphan processes. */
+ LIST_HEAD(, proc) p_orphans; /* (e) Pointer to list of orphans. */
};
#define p_session p_pgrp->pg_session
@@ -865,7 +865,7 @@
void proc_linkup0(struct proc *p, struct thread *td);
void proc_linkup(struct proc *p, struct thread *td);
void proc_reap(struct thread *td, struct proc *p, int *status, int options,
- struct rusage *rusage);
+ struct rusage *rusage, int child);
void proc_reparent(struct proc *child, struct proc *newparent);
struct pstats *pstats_alloc(void);
void pstats_fork(struct pstats *src, struct pstats *dst);
--------------050804080208020403030502
Content-Type: text/x-patch; name="vfork-fork.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="vfork-fork.patch"
Index: sys/kern/kern_fork.c
===================================================================
--- sys/kern/kern_fork.c (revision 230617)
+++ sys/kern/kern_fork.c (working copy)
@@ -797,6 +797,10 @@
p1 = td->td_proc;
+ /* Don't do vfork while being traced. */
+ if (p1->p_flag & P_TRACED)
+ flags &= ~(RFPPWAIT | RFMEM);
+
/*
* Here we don't create a new process, but we divorce
* certain parts of a process from itself.
--------------050804080208020403030502--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4F22E8FD.6010201>
