From owner-freebsd-hackers@freebsd.org Sat Apr 14 18:04:29 2018 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B4BE0F8DB70 for ; Sat, 14 Apr 2018 18:04:29 +0000 (UTC) (envelope-from kib@freebsd.org) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E9E636F16B for ; Sat, 14 Apr 2018 18:04:28 +0000 (UTC) (envelope-from kib@freebsd.org) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id w3EI4FOn086082 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 14 Apr 2018 21:04:19 +0300 (EEST) (envelope-from kib@freebsd.org) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua w3EI4FOn086082 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id w3EI4FVY086081; Sat, 14 Apr 2018 21:04:15 +0300 (EEST) (envelope-from kib@freebsd.org) X-Authentication-Warning: tom.home: kostik set sender to kib@freebsd.org using -f Date: Sat, 14 Apr 2018 21:04:15 +0300 From: Konstantin Belousov To: Thomas Munro Cc: freebsd-hackers@freebsd.org Subject: Re: PROC_SET_PDEATHSIG to get a signal when your parent exits Message-ID: <20180414180415.GC1774@kib.kiev.ua> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.4 (2018-02-28) X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 14 Apr 2018 18:04:30 -0000 On Sat, Apr 14, 2018 at 04:44:51PM +1200, Thomas Munro wrote: > Hello, > > In PostgreSQL we are planning to make use of Linux's > prctl(PR_SET_PDEATHSIG) facility which tells the kernel to signal you > when your parent process exits. This will cut down on polling and > contention on a pipe in certain busy loops, where we need to be able > to exit quickly if something kills the main supervisor process but we > aren't waiting in kevent/epoll/poll-type primitive. We considered > SIGIO but that'd require either one pipe per child process or a common > process group, neither of which works for us. So I decided to try > adding a new command PROC_SET_PDEATHSIG to procctl() to work exactly > like Linux. > > Please see attached early draft patch. This is my first attempt at > writing a patch for FreeBSD, so I expect that it contains newbie > mistakes and style problems and I would be grateful for any feedback. > Is this design sane? Is that the best syscall to use for this? Did I > get the ptrace/orphan stuff right? Should this somehow share > something with the Linux compat support for prctl(PR_SET_PDEATHSIG, > ...)? Would people want this? Do we support prctl(2) in linuxolator ? Even if yes, I doubt that there is PR_SET_PDEATHSIG feature. > > Here's a quick and dirty userspace program I came up with while > developing the patch: > > https://github.com/macdice/test-deathsig/blob/master/test-deathsig.c > > That tests a few interesting cases I thought of, but note that I > haven't yet tested 32 bit support or the setuid/getuid logic. Compile the test program using "cc -m32 -o test-deathsig-32 ...." to easily get binary for compat32 testing. We have test framework, and several procctl(2) tests already. Please look at the tests/sys/kern/reaper.c. You might consider converting your test to the framework and also adding it to the base system. Other comments inline. You may create the account at https://reviews.freebsd.org and putting patch there. > From b9a6730697ba09f7ee0f81129f3f976df826d1cd Mon Sep 17 00:00:00 2001 > From: Thomas Munro > Date: Fri, 13 Apr 2018 23:41:30 +0100 > Subject: [PATCH] Support PROC_SET_PDEATHSIG as a command to procctl. > > Allow processes to request the delivery of a signal upon death > of their parent process. This provides approximately the same > behaviour as prctl(PR_SET_PDEATHSIG, ...) on Linux. > > Thomas Munro > --- > lib/libc/sys/procctl.2 | 29 +++++++++++++++++++++++++++++ > sys/kern/kern_exec.c | 4 ++++ > sys/kern/kern_exit.c | 13 +++++++++++++ > sys/kern/kern_procctl.c | 33 ++++++++++++++++++++++++++++++++- > sys/kern/kern_thread.c | 4 ++-- > sys/sys/proc.h | 1 + > sys/sys/procctl.h | 2 ++ > 7 files changed, 83 insertions(+), 3 deletions(-) > > diff --git a/lib/libc/sys/procctl.2 b/lib/libc/sys/procctl.2 > index af96ae6e04c..e49ef36995f 100644 > --- a/lib/libc/sys/procctl.2 > +++ b/lib/libc/sys/procctl.2 > @@ -391,6 +391,24 @@ otherwise. > See the note about sysctl > .Dv kern.trap_enotcap > above, which gives independent global control of signal delivery. > +.It Dv PROC_SET_PDEATHSIG > +Request the delivery of a signal when the parent of the calling > +process exits. > +Use id type P_PID and id 0. Use .Fa id type .Dv P_PID and .Fa id 0. In fact, '0' is ok as an alias for the current pid, but not allowing current pid as the argument is strange. I can see why you only allow the process to set the parent signal only on itself, but also can see that it does not cost anything to allow arbitrary pid. But I do not see how would pfind(0) return curproc. Don't you need to add some code for this in kern_procctl() ? > +The value is cleared for the children > +of fork() and when executing set-user-ID or set-group-ID binaries. .Xr fork 2 > +.Fa arg > +must point to a value of type int indicating the signal > +that should be delivered to the caller. Use zero to cancel the Start new sentence from the new line. > +delivery of signals. > +.It Dv PROC_GET_PDEATHSIG > +Query the current signal number that will be delivered when the parent > +of the calling process exits. > +Use id type P_PID and id 0. Same markup. > +.Fa arg > +must point to a memory location that can hold a value of type int. > +If signal delivery has not been requested, it will contain zero > +on return. > .El > .Sh NOTES > Disabling tracing on a process should not be considered a security > @@ -487,6 +505,12 @@ parameter for the > or > .Dv PROC_TRAPCAP_CTL > request is invalid. > +.It Bq Er EINVAL > +The > +.Dv PROC_SET_PDEATHSIG > +or > +.Dv PROC_GET_PDEATHSIG > +request referenced an unsupported id, id_type or invalid signal number. > .El > .Sh SEE ALSO > .Xr dtrace 1 , > @@ -506,3 +530,8 @@ function appeared in > The reaper facility is based on a similar feature of Linux and > DragonflyBSD, and first appeared in > .Fx 10.2 . > +The > +.Dv PROC_SET_PDEATHSIG > +facility is based on the prctl(PR_SET_PDEATHSIG, ...) feature of Linux, > +and first appeared in > +.Fx 12.0 . > diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c > index 21b25e0148c..679915a6e6e 100644 > --- a/sys/kern/kern_exec.c > +++ b/sys/kern/kern_exec.c > @@ -522,6 +522,10 @@ do_execve(struct thread *td, struct image_args *args, struct mac *mac_p) > credential_changing |= will_transition; > #endif > > + /* Don't inherit PROC_SET_PDEATHSIG value if setuid/setgid. */ > + if (credential_changing) > + imgp->proc->p_pdeathsig = 0; > + What is the Linux behaviour on exec ? It seems of negative value to receive a signal which disposition is reset to default. In other words, for the non-suid image, you typicall get the process terminated and perhaps core dumped when parent is exiting. > if (credential_changing && > #ifdef CAPABILITY_MODE > ((oldcred->cr_flags & CRED_FLAG_CAPMODE) == 0) && > diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c > index f672a2c7358..55f08a004eb 100644 > --- a/sys/kern/kern_exit.c > +++ b/sys/kern/kern_exit.c > @@ -480,6 +480,12 @@ exit1(struct thread *td, int rval, int signo) > PROC_LOCK(q->p_reaper); > pksignal(q->p_reaper, SIGCHLD, ksi1); > PROC_UNLOCK(q->p_reaper); > + } else if (q->p_pdeathsig > 0) { > + /* > + * The child asked to received a signal > + * when we exit. > + */ > + kern_psignal(q, q->p_pdeathsig); Don't you need to lock q around kern_psignal() ? Test your patch with WITNESS and INVARIANTS kernel options enabled. > } > } else { > /* > @@ -520,6 +526,13 @@ exit1(struct thread *td, int rval, int signo) > */ > while ((q = LIST_FIRST(&p->p_orphans)) != NULL) { > PROC_LOCK(q); > + /* > + * If we are the real parent of this process > + * but it has been reparented to a debugger, then > + * check if it asked for a signal when we exit. > + */ > + if (q->p_pdeathsig > 0) > + kern_psignal(q, q->p_pdeathsig); > CTR2(KTR_PTRACE, "exit: pid %d, clearing orphan %d", p->p_pid, > q->p_pid); > clear_orphan(q); > diff --git a/sys/kern/kern_procctl.c b/sys/kern/kern_procctl.c > index dddc6b9c1d3..c0adeed570e 100644 > --- a/sys/kern/kern_procctl.c > +++ b/sys/kern/kern_procctl.c > @@ -431,7 +431,7 @@ sys_procctl(struct thread *td, struct procctl_args *uap) > struct procctl_reaper_pids rp; > struct procctl_reaper_kill rk; > } x; > - int error, error1, flags; > + int error, error1, flags, signum; > > switch (uap->com) { > case PROC_SPROTECT: > @@ -467,6 +467,15 @@ sys_procctl(struct thread *td, struct procctl_args *uap) > case PROC_TRAPCAP_STATUS: > data = &flags; > break; > + case PROC_SET_PDEATHSIG: > + error = copyin(uap->data, &signum, sizeof(signum)); > + if (error != 0) > + return (error); > + data = &signum; > + break; > + case PROC_GET_PDEATHSIG: > + data = &signum; > + break; > default: > return (EINVAL); > } > @@ -486,6 +495,10 @@ sys_procctl(struct thread *td, struct procctl_args *uap) > if (error == 0) > error = copyout(&flags, uap->data, sizeof(flags)); > break; > + case PROC_GET_PDEATHSIG: > + if (error == 0) > + error = copyout(&signum, uap->data, sizeof(signum)); > + break; > } > return (error); > } > @@ -527,6 +540,7 @@ kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data) > struct pgrp *pg; > struct proc *p; > int error, first_error, ok; > + int signum; > bool tree_locked; > > switch (com) { > @@ -560,6 +574,23 @@ kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data) > case PROC_TRAPCAP_STATUS: > tree_locked = false; > break; > + case PROC_SET_PDEATHSIG: > + signum = *(int *)data; > + if (idtype != P_PID || id != 0 || > + (signum != 0 && !_SIG_VALID(signum))) > + return (EINVAL); > + PROC_LOCK(td->td_proc); > + td->td_proc->p_pdeathsig = signum; > + PROC_UNLOCK(td->td_proc); > + return (0); > + case PROC_GET_PDEATHSIG: > + if (idtype != P_PID || id != 0) > + return (EINVAL); > + PROC_LOCK(td->td_proc); > + signum = td->td_proc->p_pdeathsig; > + PROC_UNLOCK(td->td_proc); > + *(int *)data = signum; Why do you need to mediate assignment through the signum ? Note that the locking around the lone integer assignment is excessive, but it does not hurt. > + return (0); > default: > return (EINVAL); > } > diff --git a/sys/kern/kern_thread.c b/sys/kern/kern_thread.c > index 4d2f020063e..b01cd99e597 100644 > --- a/sys/kern/kern_thread.c > +++ b/sys/kern/kern_thread.c > @@ -91,7 +91,7 @@ _Static_assert(offsetof(struct proc, p_pid) == 0xbc, > "struct proc KBI p_pid"); > _Static_assert(offsetof(struct proc, p_filemon) == 0x3d0, > "struct proc KBI p_filemon"); > -_Static_assert(offsetof(struct proc, p_comm) == 0x3e0, > +_Static_assert(offsetof(struct proc, p_comm) == 0x3e4, > "struct proc KBI p_comm"); > _Static_assert(offsetof(struct proc, p_emuldata) == 0x4b8, > "struct proc KBI p_emuldata"); > @@ -111,7 +111,7 @@ _Static_assert(offsetof(struct proc, p_pid) == 0x74, > "struct proc KBI p_pid"); > _Static_assert(offsetof(struct proc, p_filemon) == 0x27c, > "struct proc KBI p_filemon"); > -_Static_assert(offsetof(struct proc, p_comm) == 0x288, > +_Static_assert(offsetof(struct proc, p_comm) == 0x28c, > "struct proc KBI p_comm"); > _Static_assert(offsetof(struct proc, p_emuldata) == 0x314, > "struct proc KBI p_emuldata"); > diff --git a/sys/sys/proc.h b/sys/sys/proc.h > index d63362d7232..19e99439188 100644 > --- a/sys/sys/proc.h > +++ b/sys/sys/proc.h > @@ -624,6 +624,7 @@ struct proc { > u_int p_treeflag; /* (e) P_TREE flags */ > int p_pendingexits; /* (c) Count of pending thread exits. */ > struct filemon *p_filemon; /* (c) filemon-specific data. */ > + int p_pdeathsig; /* (c) Signal from parent on exit. */ > /* End area that is zeroed on creation. */ > #define p_endzero p_magic > > diff --git a/sys/sys/procctl.h b/sys/sys/procctl.h > index 17144e9f0c6..cf0a69475e4 100644 > --- a/sys/sys/procctl.h > +++ b/sys/sys/procctl.h > @@ -51,6 +51,8 @@ > #define PROC_TRACE_STATUS 8 /* query tracing status */ > #define PROC_TRAPCAP_CTL 9 /* trap capability errors */ > #define PROC_TRAPCAP_STATUS 10 /* query trap capability status */ > +#define PROC_SET_PDEATHSIG 11 /* set parent death signal */ > +#define PROC_GET_PDEATHSIG 12 /* get parent death signal */ Note that other commands names follow the pattern PROC__, in your case it would be PROC_PDEATHSIG_SET/GET. Please use tab after #define. > > /* Operations for PROC_SPROTECT (passed in integer arg). */ > #define PPROT_OP(x) ((x) & 0xf) > -- > 2.16.2 >