From owner-freebsd-current@freebsd.org Fri Mar 10 01:20:17 2017
From: Bryan Drewery
To: Konstantin Belousov
Cc: current@FreeBSD.org
Subject: Re: r314708: panic: tdsendsignal: ksi on queue
Date: Thu, 9 Mar 2017 17:20:11 -0800
Organization: FreeBSD
References: <20170309144646.GB16105@kib.kiev.ua> <20170309231151.GA49720@stack.nl> <5e56d3d6-9e92-1b50-f720-ff16c58b74dd@FreeBSD.org> <767b31de-835f-8fa6-85fb-34b276452479@FreeBSD.org>
In-Reply-To: <767b31de-835f-8fa6-85fb-34b276452479@FreeBSD.org>
List-Id: Discussions about the use of FreeBSD-current

On 3/9/2017 4:59 PM, Bryan Drewery wrote:
> On 3/9/2017 3:57 PM, Bryan Drewery wrote:
>> On 3/9/2017 3:47 PM, Bryan Drewery wrote:
>>> On 3/9/2017 3:11 PM, Jilles Tjoelker wrote:
>>>> On Thu, Mar 09, 2017 at 04:46:46PM +0200, Konstantin Belousov wrote:
>>>>> Yes, there is a race, apparently, with the child zombie still not finishing
>>>>> sending the SIGCHLD to the parent and parent exiting.  The following should
>>>>> fix the issue, but I do not think that reproducing the problem is easy.
>>>>
>>>>> diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
>>>>> index c524fe5df37..ba5ff84e9de 100644
>>>>> --- a/sys/kern/kern_exit.c
>>>>> +++ b/sys/kern/kern_exit.c
>>>>> @@ -189,6 +189,7 @@ exit1(struct thread *td, int rval, int signo)
>>>>>  {
>>>>>  	struct proc *p, *nq, *q, *t;
>>>>>  	struct thread *tdt;
>>>>> +	ksiginfo_t ksi;
>>>>>
>>>>>  	mtx_assert(&Giant, MA_NOTOWNED);
>>>>>  	KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo));
>>>>> @@ -456,7 +457,12 @@ exit1(struct thread *td, int rval, int signo)
>>>>>  			proc_reparent(q, q->p_reaper);
>>>>>  			if (q->p_state == PRS_ZOMBIE) {
>>>>>  				PROC_LOCK(q->p_reaper);
>>>>> -				pksignal(q->p_reaper, SIGCHLD, q->p_ksi);
>>>>> +				if (q->p_ksi != NULL) {
>>>>> +					ksiginfo_init(&ksi);
>>>>> +					ksiginfo_copy(q->p_ksi, &ksi);
>>>>> +				}
>>>>> +				pksignal(q->p_reaper, SIGCHLD, q->p_ksi !=
>>>>> +				    NULL ? &ksi : NULL);
>>>>>  				PROC_UNLOCK(q->p_reaper);
>>>>>  			}
>>>>>  		} else {
>>>
>>> I just got something weird with this patch that wasn't happening before:
>>>
>>> /usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j
>>> exp-10amd64 -p commit -z test devel/ccache
>>> [poudriere runs and completes with exit status 0]
>>>> time: command terminated abnormally
>>>>        28.08 real         9.92 user        10.38 sys
>>>>      23464  maximum resident set size
>>>>       4996  average shared memory size
>>>>         88  average unshared data size
>>>>        127  average unshared stack size
>>>>     282705  page reclaims
>>>>       5623  page faults
>>>>          0  swaps
>>>>       2673  block input operations
>>>>       4836  block output operations
>>>>         33  messages sent
>>>>          0  messages received
>>>>         37  signals received
>>>>      11226  voluntary context switches
>>>>        780  involuntary context switches
>>>> zsh: alarm      /usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j exp-10amd64
>>> exit status: 142 (SIGALRM).
>>>
>>> I don't see time(1) using SIGALRM or proc reaper at all.
>>>
>>> Rerunning it, and trying other simpler test cases, does not produce the
>>> same result.  It may be some race unrelated to this patch, dunno.
>>>
>>
>> I'm consistently getting foreground processes getting the wrong signals
>> now.  I'm removing this patch for now.
>
> It wasn't this patch doing this.  Something else is very wrong with
> signal handling right now.
>

False alarm.  This spurious SIGALRM issue is purely my fault.  Ignore
those reports.

>>>>
>>>> This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo
>>>> should always be the zombie's p_ksi; otherwise, the siginfo may be lost
>>>> if there are too many signals pending for the target process or in the
>>>> system. If the siginfo is lost and the reaper normally passes si_pid to
>>>> waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie
>>>> will remain until the reaper terminates.
>>>>
>>>> Conceptually the siginfo is sent to one process at a time only, so the
>>>> bug is an artifact of the implementation. Perhaps the piece of code
>>>> added in r309886 can be moved or the ksiginfo can be removed from the
>>>> parent's queue.
>>>>
>>>> If such a fix is not possible, it may be better to send a bare SIGCHLD
>>>> (si_code is SI_KERNEL or 0, depending on how many signals are pending)
>>>> in this situation and document that reapers must use WAIT_ANY or P_ALL.
>>>> (However, compared to the pre-r309886 situation they can still use
>>>> SIGCHLD to get notified when to call waitpid() or similar.)
>>>>

-- 
Regards,
Bryan Drewery
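
To illustrate Jilles' point that a reaper should not depend on si_pid
surviving, here is a minimal userland sketch in C. It is only an
illustration under stated assumptions: the helper names spawn_children()
and reap_children() are made up for this example and are not part of
FreeBSD, and the only real interfaces used are fork(2), _exit(2) and
waitpid(2). The sketch treats SIGCHLD purely as a hint and passes -1 (the
value FreeBSD names WAIT_ANY) to waitpid(), so no zombie is left behind
when a particular siginfo is dropped.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork up to n children that exit immediately.
 * Returns the number of children actually started.
 */
static int
spawn_children(int n)
{
    int started = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(i);       /* child: exit right away */
        if (pid < 0)
            break;          /* fork failed: stop early */
        started++;
    }
    return started;
}

/*
 * Hypothetical helper: reap children without naming a specific pid.
 * With block == 0 only already-exited children are collected
 * (WNOHANG); with block != 0 the call waits until no children remain.
 * Returns the number of children reaped.
 */
static int
reap_children(int block)
{
    int reaped = 0;
    int status;

    for (;;) {
        /* -1 means "any child", i.e. WAIT_ANY on FreeBSD. */
        pid_t pid = waitpid(-1, &status, block ? 0 : WNOHANG);
        if (pid > 0) {
            reaped++;
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;       /* interrupted by a signal: retry */
        break;              /* 0: none exited yet; -1: no children */
    }
    return reaped;
}
```

A reaper written this way might call reap_children(0) each time it
notices a SIGCHLD and reap_children(1) during shutdown. Whether that
matches what any particular reaper does is an assumption of this sketch.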