Date: Thu, 9 Mar 2017 17:20:11 -0800
From: Bryan Drewery <bdrewery@FreeBSD.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: current@FreeBSD.org
Subject: Re: r314708: panic: tdsendsignal: ksi on queue
Message-ID: <c7980385-4c0b-c190-7e5b-a73d7ca4d581@FreeBSD.org>
In-Reply-To: <767b31de-835f-8fa6-85fb-34b276452479@FreeBSD.org>
References: <d510a9da-8293-ba22-a1e6-75b3ea7ffa1d@FreeBSD.org>
 <20170309144646.GB16105@kib.kiev.ua> <20170309231151.GA49720@stack.nl>
 <5e56d3d6-9e92-1b50-f720-ff16c58b74dd@FreeBSD.org>
 <b7068678-e593-f4ed-647c-036ff0bc0576@FreeBSD.org>
 <767b31de-835f-8fa6-85fb-34b276452479@FreeBSD.org>
On 3/9/2017 4:59 PM, Bryan Drewery wrote:
> On 3/9/2017 3:57 PM, Bryan Drewery wrote:
>> On 3/9/2017 3:47 PM, Bryan Drewery wrote:
>>> On 3/9/2017 3:11 PM, Jilles Tjoelker wrote:
>>>> On Thu, Mar 09, 2017 at 04:46:46PM +0200, Konstantin Belousov wrote:
>>>>> Yes, there is a race, apparently, with the child zombie still not finishing
>>>>> sending the SIGCHLD to the parent and parent exiting. The following should
>>>>> fix the issue, but I do not think that reproducing the problem is easy.
>>>>
>>>>> diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
>>>>> index c524fe5df37..ba5ff84e9de 100644
>>>>> --- a/sys/kern/kern_exit.c
>>>>> +++ b/sys/kern/kern_exit.c
>>>>> @@ -189,6 +189,7 @@ exit1(struct thread *td, int rval, int signo)
>>>>>  {
>>>>>  	struct proc *p, *nq, *q, *t;
>>>>>  	struct thread *tdt;
>>>>> +	ksiginfo_t ksi;
>>>>>
>>>>>  	mtx_assert(&Giant, MA_NOTOWNED);
>>>>>  	KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo));
>>>>> @@ -456,7 +457,12 @@ exit1(struct thread *td, int rval, int signo)
>>>>>  			proc_reparent(q, q->p_reaper);
>>>>>  			if (q->p_state == PRS_ZOMBIE) {
>>>>>  				PROC_LOCK(q->p_reaper);
>>>>> -				pksignal(q->p_reaper, SIGCHLD, q->p_ksi);
>>>>> +				if (q->p_ksi != NULL) {
>>>>> +					ksiginfo_init(&ksi);
>>>>> +					ksiginfo_copy(q->p_ksi, &ksi);
>>>>> +				}
>>>>> +				pksignal(q->p_reaper, SIGCHLD, q->p_ksi !=
>>>>> +				    NULL ? &ksi : NULL);
>>>>>  				PROC_UNLOCK(q->p_reaper);
>>>>>  			}
>>>>>  		} else {
>>>
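For anyone following along, here is a toy userland model of the invariant behind the panic in the subject line, as I understand it: a signal-info record can be linked on only one queue at a time, so queueing the zombie's record for the reaper while it is still on the old parent's queue trips the check, whereas queueing a copy does not. Every name here (toy_queue, toy_ksi, toy_enqueue, toy_copy) is made up for illustration; this is not the kernel's ksiginfo code, only a sketch of the idea.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a signal queue and a ksiginfo-like record. */
struct toy_ksi;

struct toy_queue {
	struct toy_ksi *head;
};

struct toy_ksi {
	struct toy_queue *onq;	/* queue this record is linked on, or NULL */
	struct toy_ksi *next;
	int pid;		/* payload: which child changed state */
};

/*
 * Link ksi onto q. Returns 0 on success, or -1 if the record is already
 * on some queue. The toy returns an error where a kernel built with
 * assertions would panic instead.
 */
static int
toy_enqueue(struct toy_queue *q, struct toy_ksi *ksi)
{
	if (ksi->onq != NULL)
		return (-1);
	ksi->onq = q;
	ksi->next = q->head;
	q->head = ksi;
	return (0);
}

/* Copy the payload only; the copy always starts out off-queue. */
static void
toy_copy(const struct toy_ksi *src, struct toy_ksi *dst)
{
	dst->onq = NULL;
	dst->next = NULL;
	dst->pid = src->pid;
}
```

Queueing one record on a "parent" queue and then on a "reaper" queue fails the second time; queueing the result of toy_copy() on the reaper queue succeeds. That is the shape of the quoted patch, and it is also why the copy is no longer the zombie's own record.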
>>> I just got something weird with this patch that wasn't happening before:
>>>
>>> /usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j
>>> exp-10amd64 -p commit -z test devel/ccache
>>> [poudriere runs and completes with exit status 0]
>>>> time: command terminated abnormally
>>>>        28.08 real         9.92 user        10.38 sys
>>>>     23464  maximum resident set size
>>>>      4996  average shared memory size
>>>>        88  average unshared data size
>>>>       127  average unshared stack size
>>>>    282705  page reclaims
>>>>      5623  page faults
>>>>         0  swaps
>>>>      2673  block input operations
>>>>      4836  block output operations
>>>>        33  messages sent
>>>>         0  messages received
>>>>        37  signals received
>>>>     11226  voluntary context switches
>>>>       780  involuntary context switches
>>>> zsh: alarm      /usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j exp-10amd64
>>> exit status: 142 (SIGALRM).
>>>
>>> I don't see time(1) using SIGALRM or proc reaper at all.
>>>
>>> Rerunning it, and trying other simpler test cases, does not produce the
>>> same result. It may be some race unrelated to this patch, dunno.
>>>
>>
>> I'm consistently getting foreground processes getting the wrong signals
>> now. I'm removing this patch for now.
>
> It wasn't this patch doing this. Something else is very wrong with
> signal handling right now.
>
False alarm. This spurious SIGALRM issue is purely my fault. Ignore
those reports.
>>
>>>
>>>>
>>>> This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo
>>>> should always be the zombie's p_ksi; otherwise, the siginfo may be lost
>>>> if there are too many signals pending for the target process or in the
>>>> system. If the siginfo is lost and the reaper normally passes si_pid to
>>>> waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie
>>>> will remain until the reaper terminates.
>>>>
>>>> Conceptually the siginfo is sent to one process at a time only, so the
>>>> bug is an artifact of the implementation. Perhaps the piece of code
>>>> added in r309886 can be moved or the ksiginfo can be removed from the
>>>> parent's queue.
>>>>
>>>> If such a fix is not possible, it may be better to send a bare SIGCHLD
>>>> (si_code is SI_KERNEL or 0, depending on how many signals are pending)
>>>> in this situation and document that reapers must use WAIT_ANY or P_ALL.
>>>> (However, compared to the pre-r309886 situation they can still use
>>>> SIGCHLD to get notified when to call waitpid() or similar.)
>>>>
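To make the WAIT_ANY suggestion above concrete, here is a minimal sketch of how a reaper could treat SIGCHLD purely as a wakeup and then collect every exited child, so that a lost or bare siginfo cannot strand a zombie. The function name reap_all_exited() is an assumption for illustration, not an existing API, and the sketch leaves out how the process became a reaper and how the signal is delivered to the main loop.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Reap every child that has already exited, without relying on the
 * si_pid of any particular SIGCHLD. Intended to be called from the main
 * loop each time a SIGCHLD has been noticed.
 *
 * Returns the number of children reaped, or -1 on an unexpected
 * waitpid() error.
 */
static int
reap_all_exited(void)
{
	int reaped = 0;
	int status;
	pid_t pid;

	for (;;) {
		/* -1 selects any child (WAIT_ANY on FreeBSD). */
		pid = waitpid(-1, &status, WNOHANG);
		if (pid > 0) {
			reaped++;	/* one zombie collected, look for more */
			continue;
		}
		if (pid == 0)
			break;		/* children exist, none has exited yet */
		if (errno == EINTR)
			continue;	/* interrupted, retry */
		if (errno == ECHILD)
			break;		/* no children at all */
		return (-1);
	}
	return (reaped);
}
```

Because the loop drains all exited children on each call, it does not matter whether several SIGCHLDs were coalesced or whether the siginfo carried a usable si_pid.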
>>>
>>>
>>
>>
>
>
-- 
Regards,
Bryan Drewery
