Date:      Wed, 15 Feb 2012 20:02:10 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Dmitry Mikulin <dmitrym@juniper.net>
Cc:        freebsd-current Current <freebsd-current@freebsd.org>, Marcel Moolenaar <marcelm@juniper.net>
Subject:   Re: [ptrace] please review follow fork/exec changes
Message-ID:  <20120215180210.GC3283@deviant.kiev.zoral.com.ua>
In-Reply-To: <4F3BF164.2020506@juniper.net>
References:  <20120210001725.GJ3283@deviant.kiev.zoral.com.ua> <4F3478B3.9040809@juniper.net> <20120213152825.GH3283@deviant.kiev.zoral.com.ua> <4F3988E8.2040705@juniper.net> <20120213222521.GK3283@deviant.kiev.zoral.com.ua> <4F3993C5.5020703@juniper.net> <20120215163252.GZ3283@deviant.kiev.zoral.com.ua> <4F3BE9C2.8040908@juniper.net> <20120215174031.GB3283@deviant.kiev.zoral.com.ua> <4F3BF164.2020506@juniper.net>

On Wed, Feb 15, 2012 at 09:54:44AM -0800, Dmitry Mikulin wrote:
>
>
> On 02/15/2012 09:40 AM, Konstantin Belousov wrote:
> >On Wed, Feb 15, 2012 at 09:22:10AM -0800, Dmitry Mikulin wrote:
> >>
> >>On 02/15/2012 08:32 AM, Konstantin Belousov wrote:
> >>>On Mon, Feb 13, 2012 at 02:50:45PM -0800, Dmitry Mikulin wrote:
> >>>>>>>It seems that now wait4(2) can be called from the real (non-debugger)
> >>>>>>>parent first and result in the call to proc_reap(), can't it? We would
> >>>>>>>then just reparent the child back to the caller, still leaving the
> >>>>>>>zombie and confusing the debugger.
> >>>>>>When either gdb or the real parent gets to proc_reap(), the process
> >>>>>>won't get destroyed; it'll get caught by the following clause:
> >>>>>>     if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) {
> >>>>>>
> >>>>>>and the real parent will get the child back into its children list,
> >>>>>>while gdb will get it into the orphan list. The second time around, when
> >>>>>>proc_reap() is entered, p->p_oppid will be 0 and the process will
> >>>>>>really get reaped. Does it make sense? And proc_reparent() attempts to
> >>>>>>keep the orphan list clean and not have the same entries as the list of
> >>>>>>siblings.
> >>>>>Right, this is what I figured. But I asked about some further
> >>>>>implication of this change:
> >>>>>
> >>>>>If the real parent spuriously calls wait4(2) on the child pid after the
> >>>>>child exited, but before the debugger called wait4(), then exactly the
> >>>>>code you noted above will be run. This results in the child being fully
> >>>>>returned to the original parent.
> >>>>>
> >>>>>Next, the wait4() call from the debugger gets an error, and the zombie
> >>>>>will be kept around until the parent calls wait4() for this pid once more.
> >>>>>
> >>>>>Am I missing something?
> >>>>In this case the process will move from gdb's child list to gdb's orphan
> >>>>list when the real parent does a wait4(). Next time around the wait loop
> >>>>in gdb, it'll be caught by the orphan's proc_reap().
> >>>I do not see how the next debugger loop could find this process at all,
> >>>since the first wait4() call reparented it to the original parent.
> >>Not the debugger loop, the kern_wait() loop. The child gets re-parented to
> >>the original parent but moves to the orphan list of the debugger process.
> >Either the debugger loop which calls wait4/waitpid, or the kern_wait loop
> >resulting from the debugger calling wait*.
> >
> >Could you please describe how the patched kernel moves the waited-for
> >zombie to the orphan list of the debugger?
> >As I see it, there is another bug: the child appears both on the
> >children list and on the orphan list of the real parent.
>
>
> The first attempt to reap the child will get into the
>     if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) {
> clause, which will re-parent it to the real parent. The child will not be
> destroyed at this point.
>
> The following loop in proc_reparent() will make sure that the child does
> not stay in both lists:
>     LIST_FOREACH(p, &parent->p_orphans, p_orphan) {
>         if (p == child) {
>             LIST_REMOVE(child, p_orphan);
>             break;
>         }
>     }
>
> Since the child's parent is gdb and it's still being traced, the following
> will move it to gdb's orphan list:
>
>     if (child->p_flag & P_TRACED)
>         LIST_INSERT_HEAD(&child->p_pptr->p_orphans, child, p_orphan);
No, the child's parent at this point is no longer gdb; it is the original
parent. And since P_TRACED is set, the process is also inserted into the
orphan list of the original parent.

This all happens during the first execution of wait4/waitpid from the
real parent, in proc_reparent().

>
> After this the real parent will get the exit status.
>

> The next pass through the kern_wait() loop called from gdb will catch the
> child in its orphan list and will reap it for real this time, since
> p->p_oppid will have been set to 0 in the previous attempt to reap it. Gdb
> gets the exit code, and the child is destroyed.
>
No, the child no longer has any association with the debugger process,
since the block in the
	if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) {
statement destroyed it.
