From owner-freebsd-current@freebsd.org Thu Mar 9 23:11:55 2017 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1FFD5D055E2 for ; Thu, 9 Mar 2017 23:11:55 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 0DA74AAB for ; Thu, 9 Mar 2017 23:11:55 +0000 (UTC) (envelope-from jilles@stack.nl) Received: by mailman.ysv.freebsd.org (Postfix) id 0CB4DD055E1; Thu, 9 Mar 2017 23:11:55 +0000 (UTC) Delivered-To: current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0C56DD055E0 for ; Thu, 9 Mar 2017 23:11:55 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mailout.stack.nl (mailout05.stack.nl [IPv6:2001:610:1108:5010::202]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mailout.stack.nl", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id CE672AA9; Thu, 9 Mar 2017 23:11:54 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131]) by mailout.stack.nl (Postfix) with ESMTP id 614A839; Fri, 10 Mar 2017 00:11:51 +0100 (CET) Received: by snail.stack.nl (Postfix, from userid 1677) id 4A93C28497; Fri, 10 Mar 2017 00:11:51 +0100 (CET) Date: Fri, 10 Mar 2017 00:11:51 +0100 From: Jilles Tjoelker To: Konstantin Belousov Cc: Bryan Drewery , current@FreeBSD.org Subject: Re: r314708: panic: tdsendsignal: ksi on queue Message-ID: <20170309231151.GA49720@stack.nl> References: <20170309144646.GB16105@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170309144646.GB16105@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Mar 2017 23:11:55 -0000 On Thu, Mar 09, 2017 at 04:46:46PM +0200, Konstantin Belousov wrote: > Yes, there is a race, apparently, with the child zombie still not finishing > sending the SIGCHLD to the parent and parent exiting. The following should > fix the issue, but I do not think that reproducing the problem is easy. > diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c > index c524fe5df37..ba5ff84e9de 100644 > --- a/sys/kern/kern_exit.c > +++ b/sys/kern/kern_exit.c > @@ -189,6 +189,7 @@ exit1(struct thread *td, int rval, int signo) > { > struct proc *p, *nq, *q, *t; > struct thread *tdt; > + ksiginfo_t ksi; > > mtx_assert(&Giant, MA_NOTOWNED); > KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo)); > @@ -456,7 +457,12 @@ exit1(struct thread *td, int rval, int signo) > proc_reparent(q, q->p_reaper); > if (q->p_state == PRS_ZOMBIE) { > PROC_LOCK(q->p_reaper); > - pksignal(q->p_reaper, SIGCHLD, q->p_ksi); > + if (q->p_ksi != NULL) { > + ksiginfo_init(&ksi); > + ksiginfo_copy(q->p_ksi, &ksi); > + } > + pksignal(q->p_reaper, SIGCHLD, q->p_ksi != > + NULL ? &ksi : NULL); > PROC_UNLOCK(q->p_reaper); > } > } else { This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo should always be the zombie's p_ksi; otherwise, the siginfo may be lost if there are too many signals pending for the target process or in the system. If the siginfo is lost and the reaper normally passes si_pid to waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie will remain until the reaper terminates. Conceptually the siginfo is sent to one process at a time only, so the bug is an artifact of the implementation. Perhaps the piece of code added in r309886 can be moved or the ksiginfo can be removed from the parent's queue. If such a fix is not possible, it may be better to send a bare SIGCHLD (si_code is SI_KERNEL or 0, depending on how many signals are pending) in this situation and document that reapers must use WAIT_ANY or P_ALL. (However, compared to the pre-r309886 situation they can still use SIGCHLD to get notified when to call waitpid() or similar.) -- Jilles Tjoelker