From owner-freebsd-current@freebsd.org Fri Mar 17 19:05:25 2017 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 57352D10779 for ; Fri, 17 Mar 2017 19:05:25 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 011CB1747; Fri, 17 Mar 2017 19:05:24 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id v2HJ5E1m053910 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Fri, 17 Mar 2017 21:05:14 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua v2HJ5E1m053910 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id v2HJ5Ege053908; Fri, 17 Mar 2017 21:05:14 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 17 Mar 2017 21:05:14 +0200 From: Konstantin Belousov To: Bryan Drewery Cc: freebsd-current@freebsd.org Subject: Re: r314708: panic: tdsendsignal: ksi on queue Message-ID: <20170317190514.GV16105@kib.kiev.ua> References: <20170309144646.GB16105@kib.kiev.ua> <20170309231151.GA49720@stack.nl> <20170310104429.GH16105@kib.kiev.ua> <21c8bb48-df8e-f126-e664-e963274f6d48@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <21c8bb48-df8e-f126-e664-e963274f6d48@FreeBSD.org> User-Agent: Mutt/1.8.0 (2017-02-23) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Mar 2017 19:05:25 -0000 On Fri, Mar 17, 2017 at 11:26:43AM -0700, Bryan Drewery wrote: > On 3/10/2017 2:44 AM, Konstantin Belousov wrote: > > On Fri, Mar 10, 2017 at 12:11:51AM +0100, Jilles Tjoelker wrote: > >> This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo > >> should always be the zombie's p_ksi; otherwise, the siginfo may be lost > >> if there are too many signals pending for the target process or in the > >> system. If the siginfo is lost and the reaper normally passes si_pid to > >> waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie > >> will remain until the reaper terminates. > >> > >> Conceptually the siginfo is sent to one process at a time only, so the > >> bug is an artifact of the implementation. Perhaps the piece of code > >> added in r309886 can be moved or the ksiginfo can be removed from the > >> parent's queue. > >> > >> If such a fix is not possible, it may be better to send a bare SIGCHLD > >> (si_code is SI_KERNEL or 0, depending on how many signals are pending) > >> in this situation and document that reapers must use WAIT_ANY or P_ALL. > >> (However, compared to the pre-r309886 situation they can still use > >> SIGCHLD to get notified when to call waitpid() or similar.) > >> > > > > IMO it is acceptable for reaper to know and handle the case of lost > > siginfo. But also it seems to be not too hard to guarantee the queueing > > of the SIGCHLD for zombie re-parerning. > > > > diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c > > index c524fe5df37..1a1dcc3a4c7 100644 > > --- a/sys/kern/kern_exit.c > > +++ b/sys/kern/kern_exit.c > > @@ -189,6 +189,7 @@ exit1(struct thread *td, int rval, int signo) > > { > > struct proc *p, *nq, *q, *t; > > struct thread *tdt; > > + ksiginfo_t *ksi, *ksi1; > > > > mtx_assert(&Giant, MA_NOTOWNED); > > KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo)); > > @@ -449,14 +450,23 @@ exit1(struct thread *td, int rval, int signo) > > wakeup(q->p_reaper); > > for (; q != NULL; q = nq) { > > nq = LIST_NEXT(q, p_sibling); > > + ksi = ksiginfo_alloc(TRUE); > > PROC_LOCK(q); > > q->p_sigparent = SIGCHLD; > > > > if (!(q->p_flag & P_TRACED)) { > > proc_reparent(q, q->p_reaper); > > if (q->p_state == PRS_ZOMBIE) { > > + if (q->p_ksi == NULL) { > > + ksi1 = NULL; > > + } else { > > + ksiginfo_copy(q->p_ksi, ksi); > > + ksi->ksi_flags |= KSI_INS; > > + ksi1 = ksi; > > + ksi = NULL; > > + } > > PROC_LOCK(q->p_reaper); > > - pksignal(q->p_reaper, SIGCHLD, q->p_ksi); > > + pksignal(q->p_reaper, SIGCHLD, ksi1); > > PROC_UNLOCK(q->p_reaper); > > } > > } else { > > @@ -489,6 +499,8 @@ exit1(struct thread *td, int rval, int signo) > > kern_psignal(q, SIGKILL); > > } > > PROC_UNLOCK(q); > > + if (ksi != NULL) > > + ksiginfo_free(ksi); > > } > > > > /* > > > Ping? Is this still progressing to be committed? r315159 ?