From owner-freebsd-hackers@freebsd.org Wed Feb 3 08:05:19 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8D1F8A9A1E9 for ; Wed, 3 Feb 2016 08:05:19 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wm0-x242.google.com (mail-wm0-x242.google.com [IPv6:2a00:1450:400c:c09::242]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 14CF11D9C; Wed, 3 Feb 2016 08:05:19 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by mail-wm0-x242.google.com with SMTP id l66so6137486wml.2; Wed, 03 Feb 2016 00:05:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:mail-followup-to:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=jg2eW4B7R2FAWeIO+4dg2kPa3NaS8ju2VS5tRK1lk80=; b=0G6ap+9YrRpjFI4Y8x4hq0LhiU0y8mJBHIt3FJuND/c7fiMg5TUUqznoJlyiSS9npD /seDQ7ExehWVMXNgxxD/LLYGgNEKALyYpGShMwMp+gc60hO9eIFUvXI7hUI5NVDJpSmj OLk2SE2/sFdmIaG94QTzhz2McVDYt/DBmrT8XiBD3EKHhf196KpcqUM1wjPwjqIpIWT4 tzkf3W806TXJF9YqPn/OlTwdaCfFmfWkvceiF4gYgFVf/qnK4/rDCA+l/XlR/XvbEelK VrVuRXpN0+rNKN7hTT74Jq6cJTGqisBg+ykmM6W3mIf1sd6Uig49fZ7Dkfm5deNsLL/b Blug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:subject:message-id:mail-followup-to :references:mime-version:content-type:content-disposition :in-reply-to:user-agent; bh=jg2eW4B7R2FAWeIO+4dg2kPa3NaS8ju2VS5tRK1lk80=; b=elVmuovszDqPoGA+ZY+o5/hI3cY4PYpP3fpEEr43ZfN75XcApJGQnbifKqSRvR1VMq BdSFHjDlGmYEIY7x7MDwjJ/mQjUCv36DWBWe2GsjTAe4AB+NSysCTI0uGrXaQd2EZQ6q 53UyCKOtalMwtt0JLTs8Kxa1k9ex5PjDORa5QaDfX+zyCH0lEk2rE5Nf1UdTlVEDxFXI wkk8q7jIu61NgjiTABjnE+7usLF3Q7keUqW1u35b2CR4R+D8EHdmFCzmeP+wKLZWAo62 TO/TSCwzxsIWiqMljM7k/7NGxIibRhrpJU8sWi/R85LrMhl03DjfqI0tQZLu+nTUZZv6 51pg== X-Gm-Message-State: AG10YOThvO7HWXgqKVkR7eGiXQukbMJEpKwL4mGuwWzGt8B3BWFExbaDYGxRNxxEEPNGZQ== X-Received: by 10.195.11.35 with SMTP id ef3mr231647wjd.35.1454486717551; Wed, 03 Feb 2016 00:05:17 -0800 (PST) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by smtp.gmail.com with ESMTPSA id cb2sm5158346wjc.16.2016.02.03.00.05.16 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Wed, 03 Feb 2016 00:05:16 -0800 (PST) Date: Wed, 3 Feb 2016 09:05:15 +0100 From: Mateusz Guzik To: Konstantin Belousov , freebsd-hackers@freebsd.org, jmg@freebsd.org Subject: Re: [PATCH 2/2] fork: plug a use after free of the returned process pointer Message-ID: <20160203080514.GA8753@dft-labs.eu> Mail-Followup-To: Mateusz Guzik , Konstantin Belousov , freebsd-hackers@freebsd.org, jmg@freebsd.org References: <20160201103632.GL91220@kib.kiev.ua> <1454386069-29657-1-git-send-email-mjguzik@gmail.com> <1454386069-29657-3-git-send-email-mjguzik@gmail.com> <20160202132322.GU91220@kib.kiev.ua> <20160202175652.GA9812@dft-labs.eu> <20160202181635.GC91220@kib.kiev.ua> <20160202214427.GB9812@dft-labs.eu> <20160203010412.GC9812@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20160203010412.GC9812@dft-labs.eu> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Feb 2016 08:05:19 -0000 On Wed, Feb 03, 2016 at 02:04:13AM +0100, Mateusz Guzik wrote: > On Tue, Feb 02, 2016 at 10:44:27PM +0100, Mateusz Guzik wrote: > > On Tue, Feb 02, 2016 at 08:16:35PM +0200, Konstantin Belousov wrote: > > > On Tue, Feb 02, 2016 at 06:56:52PM +0100, Mateusz Guzik wrote: > > > > On Tue, Feb 02, 2016 at 03:23:22PM +0200, Konstantin Belousov wrote: > > > > > On Tue, Feb 02, 2016 at 05:07:49AM +0100, Mateusz Guzik wrote: > > > > > > + flags = fr->fr_flags; > > > > > Why not use fr->fr_flags directly ? It is slightly more churn, but IMO > > > > > it is worth it. > > > > > > > > > > > > > I'm indiffernet on this one, can change it no problem. > > > > > > > > > > + /* > > > > > > + * Hold the process so that it cannot exit after we make it runnable, > > > > > > + * but before we wait for the debugger. > > > > > Is this possible ? The forked child must execute through fork_return(), > > > > > and there we do ptracestop() before the child has a chance to ever return > > > > > to usermode. > > > > > > > > > > Do you mean a scenario where the debugger detaches before child executes > > > > > fork_return() and TDP_STOPATFORK is cleared in advance ? > > > > > > > > > > > > > The comment is somewhat misworded and I forgot to update it, how about > > > > just stating we hold the process so that we can mark the thread runnable > > > > and not have it disappear under we are done. > > > This means that the reader has to guess too much, IMHO. > > > > > > At least, add a note that despite fork_return() stops when the child > > > is traced, it is not enough because ... . > > > > > > > > While I have not tested this particular bug, prior to the patch the > > > > following should possible: p2 is untraced and td2 is marked as runnable, > > > > after which it exits and p2 is automatically reaped. If the code reaches > > > > the TDB_STOPATFORK check after that, PROC_LOCK(p2) succeeds due to > > > I.e. td2 is reused and the TDB_STOPATFORK is set by unrelated activity ? > > > You reference the do_fork() code checking TDB_STOPATFORK, and not > > > fork_return(), I guess. > > > > > > > > > > processes being type stable. td2 dereference can cause no issues due to > > > > threads being type stable as well. But the thread could have been resued > > > > in a traced process, thus inducing cv_wait(&p2->p_dbgwait, ..) even > > > > though td2 is not linked in p2 anymore and p2 is not even a valid > > > > process, making curthread wait indefinitely since there is nobody to > > > > wake it up. > > > > > > > Well, if TDP_STOPATFORK bit is set, it has the same meaning due to type > > > stability, and eventually the wake up would be performed. It just the > > > unintended sleep waiting for condvar which is problematic and which I > > > agree with. > > > > > > CPU0 is executing fork1. p2 is not traced. > > > > CPU0 CPU1 > > p2 and td2 created > > td2 is marked runnable > > td2 is scheduled here > > td2 does not have TDB_STOPATFORK set > > td2 exits > > p2 is autoreaped > > td2's space is reused > > td2 gets linked into p3 > > td2 gets TDB_STOPATFORK > > PROC_LOCK(p2); > > TDB_STOPATFORK test on td2 > > cv_wait(&p2->p_dbgwait, ..); > > > > So at this point p2 has no linked threads and is free. td2 belongs to > > p3 and p2 is waiting for a wakeup which can't happen. > > > > Now that I look at it this may be broken in an additonal way, which is > > not fixed by the patch: what if td2 spawns a new thread and thr_exits? > > In this case testing td2 is still invalid. Maybe I'm just getting > > paranoid here, I don't have time to properly test this right now. In > > worst case should be fixable well enough with FIRST_THREAD_IN_PROC. > > > > How about the following comment around _PHOLD: > > We are going to make the main thread runnable. It can quickly exit, > > causing the process to be reaped and possibly reused, thus invalidating > > our p2 pointer. Protect against this by holding the process, which > > postpones the exit. > > > > And if the suspicion gets confimed the following would be added: > > diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c > > index d0c3837..2a076ed 100644 > > --- a/sys/kern/kern_fork.c > > +++ b/sys/kern/kern_fork.c > > @@ -773,6 +773,12 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * > > > > PROC_LOCK(p2); > > /* > > + * By the time we got here the thread could have created a new thread > > + * and exited. Reload the main thread to ensure we got the right > > + * pointer. > > + */ > > + td2 = FIRST_THREAD_IN_PROC(p2); > > + /* > > * Wait until debugger is attached to child. > > */ > > while ((td2->td_dbgflags & TDB_STOPATFORK) != 0) > > > > > > To end the speculation and hackery I decided to reorganize the func a > little bit instead. Namely it can be trivially modified to not drop the > lock proc lock, which gets rid of all aforementioned races. Here is a trivial change retaining doing knote_fork before waiting for the debugger. Here the problem is worked around by making td2 runnable after knote_fork is performed and without relocking p2 in-between. diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c index d0c3837..366262f 100644 --- a/sys/kern/kern_fork.c +++ b/sys/kern/kern_fork.c @@ -715,11 +715,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * if ((flags & RFMEM) == 0 && dtrace_fasttrap_fork) dtrace_fasttrap_fork(p1, p2); #endif - /* - * Hold the process so that it cannot exit after we make it runnable, - * but before we wait for the debugger. - */ - _PHOLD(p2); if ((p1->p_flag & (P_TRACED | P_FOLLOWFORK)) == (P_TRACED | P_FOLLOWFORK)) { /* @@ -756,6 +751,7 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * fdrop(fp_procdesc, td); } + PROC_LOCK(p2); if ((flags & RFSTOPPED) == 0) { /* * If RFSTOPPED not requested, make child runnable and @@ -771,13 +767,11 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * *fr->fr_procp = p2; } - PROC_LOCK(p2); /* * Wait until debugger is attached to child. */ while ((td2->td_dbgflags & TDB_STOPATFORK) != 0) cv_wait(&p2->p_dbgwait, &p2->p_mtx); - _PRELE(p2); racct_proc_fork_done(p2); PROC_UNLOCK(p2); } > > I got 2 variants. For brevity both patches are applied on top of the > current patchset. When combined, the patch would be combined with second > patch in the patchset. > > Both currently and with the first patch below knote_fork can get a > now-freed or even recycled pid. I find that odd. This variant only saves > the pid and calls knote_fork. This is likely fine enough. > > Just in case, the second variant adds a primitive - > knlist_empty_lockless to perform a racy check to see if there are any > knotes. If so, the process is held and released after knote_fork. Note > that this is no more racy than current code with respect to spotting > knotes. What does improve is the fact that the process is guaranteed to > be around for during knote_fork, although I don't know how important > this is. > > With both variants we also save one lock/unlock round, and a second > lock/unlock with of knotes for the second variant in the common case. > > Variant 1: > diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c > index d0c3837..fae4eaf 100644 > --- a/sys/kern/kern_fork.c > +++ b/sys/kern/kern_fork.c > @@ -378,7 +378,7 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * > struct vmspace *vm2, struct file *fp_procdesc) > { > struct proc *p1, *pptr; > - int trypid; > + int p2_pid, trypid; > struct filedesc *fd; > struct filedesc_to_leader *fdtol; > struct sigacts *newsigacts; > @@ -715,11 +715,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * > if ((flags & RFMEM) == 0 && dtrace_fasttrap_fork) > dtrace_fasttrap_fork(p1, p2); > #endif > - /* > - * Hold the process so that it cannot exit after we make it runnable, > - * but before we wait for the debugger. > - */ > - _PHOLD(p2); > if ((p1->p_flag & (P_TRACED | P_FOLLOWFORK)) == (P_TRACED | > P_FOLLOWFORK)) { > /* > @@ -737,7 +732,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * > td->td_pflags |= TDP_RFPPWAIT; > td->td_rfppwait_p = p2; > } > - PROC_UNLOCK(p2); > > /* > * Now can be swapped. > @@ -745,16 +739,10 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * > _PRELE(p1); > PROC_UNLOCK(p1); > > - /* > - * Tell any interested parties about the new process. > - */ > - knote_fork(&p1->p_klist, p2->p_pid); > SDT_PROBE3(proc, , , create, p2, p1, flags); > > - if (flags & RFPROCDESC) { > + if (flags & RFPROCDESC) > procdesc_finit(p2->p_procdesc, fp_procdesc); > - fdrop(fp_procdesc, td); > - } > > if ((flags & RFSTOPPED) == 0) { > /* > @@ -771,15 +759,26 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * > *fr->fr_procp = p2; > } > > - PROC_LOCK(p2); > /* > * Wait until debugger is attached to child. > */ > while ((td2->td_dbgflags & TDB_STOPATFORK) != 0) > cv_wait(&p2->p_dbgwait, &p2->p_mtx); > - _PRELE(p2); > racct_proc_fork_done(p2); > + /* > + * The process can exit and be waited on after we drop the lock. Save > + * the pid so that it can be used for knotes. > + */ > + p2_pid = p2->p_pid; > PROC_UNLOCK(p2); > + > + /* > + * Tell any interested parties about the new process. > + */ > + knote_fork(&p1->p_klist, p2_pid); > + > + if (flags & RFPROCDESC) > + fdrop(fp_procdesc, td); > } > > int > ======================== > > Variant 2: > diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c > index d41ac96..3610d8a 100644 > --- a/sys/kern/kern_event.c > +++ b/sys/kern/kern_event.c > @@ -2038,12 +2038,22 @@ knlist_remove_inevent(struct knlist *knl, struct knote *kn) > (kn->kn_status & KN_HASKQLOCK) == KN_HASKQLOCK); > } > > +/* > + * For when the caller accepts that the check is inherently racy. > + */ > +int > +knlist_empty_lockless(struct knlist *knl) > +{ > + > + return SLIST_EMPTY(&knl->kl_list); > +} > + > int > knlist_empty(struct knlist *knl) > { > > KNL_ASSERT_LOCKED(knl); > - return SLIST_EMPTY(&knl->kl_list); > + return knlist_empty_lockless(knl); > } > > static struct mtx knlist_lock; > diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c > index d0c3837..70490ef 100644 > --- a/sys/kern/kern_fork.c > +++ b/sys/kern/kern_fork.c > @@ -378,7 +378,7 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * > struct vmspace *vm2, struct file *fp_procdesc) > { > struct proc *p1, *pptr; > - int trypid; > + int p2_held, trypid; > struct filedesc *fd; > struct filedesc_to_leader *fdtol; > struct sigacts *newsigacts; > @@ -387,6 +387,7 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * > sx_assert(&proctree_lock, SX_SLOCKED); > sx_assert(&allproc_lock, SX_XLOCKED); > > + p2_held = 0; > p1 = td->td_proc; > flags = fr->fr_flags; > > @@ -715,11 +716,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * > if ((flags & RFMEM) == 0 && dtrace_fasttrap_fork) > dtrace_fasttrap_fork(p1, p2); > #endif > - /* > - * Hold the process so that it cannot exit after we make it runnable, > - * but before we wait for the debugger. > - */ > - _PHOLD(p2); > if ((p1->p_flag & (P_TRACED | P_FOLLOWFORK)) == (P_TRACED | > P_FOLLOWFORK)) { > /* > @@ -737,7 +733,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * > td->td_pflags |= TDP_RFPPWAIT; > td->td_rfppwait_p = p2; > } > - PROC_UNLOCK(p2); > > /* > * Now can be swapped. > @@ -745,16 +740,10 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * > _PRELE(p1); > PROC_UNLOCK(p1); > > - /* > - * Tell any interested parties about the new process. > - */ > - knote_fork(&p1->p_klist, p2->p_pid); > SDT_PROBE3(proc, , , create, p2, p1, flags); > > - if (flags & RFPROCDESC) { > + if (flags & RFPROCDESC) > procdesc_finit(p2->p_procdesc, fp_procdesc); > - fdrop(fp_procdesc, td); > - } > > if ((flags & RFSTOPPED) == 0) { > /* > @@ -771,15 +760,32 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread * > *fr->fr_procp = p2; > } > > - PROC_LOCK(p2); > /* > * Wait until debugger is attached to child. > */ > while ((td2->td_dbgflags & TDB_STOPATFORK) != 0) > cv_wait(&p2->p_dbgwait, &p2->p_mtx); > - _PRELE(p2); > racct_proc_fork_done(p2); > + if (!knlist_empty_lockless(&p1->p_klist)) { > + /* > + * Hold the process so that it does not exit until we call > + * knote_fork. > + */ > + _PHOLD(p2); > + p2_held = 1; > + } > PROC_UNLOCK(p2); > + > + if (p2_held) { > + /* > + * Tell any interested parties about the new process. > + */ > + knote_fork(&p1->p_klist, p2->p_pid); > + PRELE(p2); > + } > + > + if (flags & RFPROCDESC) > + fdrop(fp_procdesc, td); > } > > int > diff --git a/sys/sys/event.h b/sys/sys/event.h > index 0f13231..771b3bb 100644 > --- a/sys/sys/event.h > +++ b/sys/sys/event.h > @@ -254,6 +254,7 @@ extern void knote_fork(struct knlist *list, int pid); > extern void knlist_add(struct knlist *knl, struct knote *kn, int islocked); > extern void knlist_remove(struct knlist *knl, struct knote *kn, int islocked); > extern void knlist_remove_inevent(struct knlist *knl, struct knote *kn); > +extern int knlist_empty_lockless(struct knlist *knl); > extern int knlist_empty(struct knlist *knl); > extern void knlist_init(struct knlist *knl, void *lock, > void (*kl_lock)(void *), void (*kl_unlock)(void *), > > -- > Mateusz Guzik -- Mateusz Guzik