From owner-freebsd-standards@FreeBSD.ORG Fri Aug 3 08:30:13 2012
Message-Id: <201208030825.q738PKeF033109@red.freebsd.org>
Date: Fri, 3 Aug 2012 08:25:20 GMT
From: "Jukka A. Ukkonen"
To: freebsd-gnats-submit@FreeBSD.org
Subject: standards/170346: Changes to support waitid() and related stuff
List-Id: Standards compliance

>Number:         170346
>Category:       standards
>Synopsis:       Changes to support waitid() and related stuff
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-standards
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Fri Aug 03 08:30:12 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator:     Jukka A. Ukkonen
>Release:        FreeBSD 9.1-PRERELEASE
>Organization:
-----
>Environment:
FreeBSD sleipnir 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #1: Tue Jul 31 15:39:12 EEST 2012 root@sleipnir:/usr/obj/usr/src/sys/Sleipnir amd64
>Description:
The attached patch adds waitid() to the C library. It also brings in a new system call, wait6(), to support the functionality needed by waitid(). The new wait6() is an extended version of wait4(): the first pid argument is split into two separate arguments, idtype and id, and a new siginfo_t pointer is added to the end of the argument list.

The new setup also understands two new option flags, WEXITED and WTRAPPED. The older wait*() functions always behaved as if these two flags were implicitly set, and that is still the case for the older wait*() entry points. For the new waitid() and wait6(), at least one of WEXITED, WTRAPPED, WCONTINUED or WSTOPPED (a.k.a. WUNTRACED) must be set for the call to make sense.

As a result, more detailed filtering of the processes to wait for is now available, both through the options and through idtype flavours other than the old PID or PGID.
Previously the treatment of WNOWAIT was faulty: it correctly avoided removing the zombies, but it still removed any signal state. Now the same signal state can also be waited for again.

This patch also quite intentionally removes the restriction on getting the rusage data only for zombies. Sometimes intermediate rusage snapshots for stopped processes might be exactly what is needed. Linux does not set any explicit limitations for getting rusage snapshots. Solaris makes the reservation that only the time fields are useful. In any case, the traditional interpretation seemed unreasonably limiting.

Inside the kernel the old kern_wait() is still used everywhere it was used before my changes. It is now a wrapper for a new kern_wait6(): it implicitly sets the option flags WEXITED and WTRAPPED, converts the old wpid to either idtype = P_PID or idtype = P_PGID (or P_ALL for WAIT_ANY), and passes NULL as the siginfo_t pointer to kern_wait6().

If you decide to try this patch, please remember to run the following command

    ( cd /usr/src/sys/kern ; make sysent )

before attempting the build. This patch changes the system call vector, and without the command above your build is guaranteed to fail.

I also wish to point out that this implementation exceeds the standard requirements and supports multiple idtype alternatives which are not required by any standard. OTOH, for PID and PGID alone there would be no actual need for a separate idtype argument either. Consider support for the alternate idtypes as enabling technology for special purposes only, keeping in mind that using non-standard features may cause portability problems.

You can use P_UID and P_GID for selecting target processes based on their effective UID or effective GID. This might be handy when a parent process starts SUID or SGID binaries. When a child might start a session of its own, waiting using P_SID could be useful. (The SID can only be equal to the child's own PID or equal to the parent's SID.)
P_ZONEID (Solaris terminology) tries to facilitate waiting for child processes started in a certain jail. Waiting on a certain CPU set ID (P_PSETID) might also be handy at times, but for the moment that info lives only in the thread structures, which seem to be dropped before a terminated process becomes waitable.

Similarly, the scheduling priority class (P_CID) might sometimes be a useful tool for filtering the processes to wait for. It seems quite plausible that a parent process might only wish to know, e.g., about real-time processes which have stopped. Like the CPU set ID, the scheduling class info currently lives only inside the threads, so it is lost before it is possible to wait for the zombie.
>How-To-Repeat:
No problem! Just extended functionality and improved support for standards.
>Fix:
Apply the attached patch.

If you decide to try this patch, please remember to run the following command

    ( cd /usr/src/sys/kern ; make sysent )

before attempting the build. This patch changes the system call vector, and without the command above your build is guaranteed to fail.

Patch attached with submission follows:

--- sys/sys/wait.h.orig	2011-09-23 03:51:37.000000000 +0300
+++ sys/sys/wait.h	2012-07-31 10:29:42.000000000 +0300
@@ -80,6 +80,8 @@
 #define WSTOPPED WUNTRACED /* SUS compatibility */
 #define WCONTINUED 4 /* Report a job control continued process. */
 #define WNOWAIT 8 /* Poll only. Don't delete the proc entry. */
+#define WEXITED 16 /* Wait for exited processes. (SUS) */
+#define WTRAPPED 32 /* Wait for a process to hit a trap or a breakpoint. (Solaris) */
 
 #if __BSD_VISIBLE
 #define WLINUXCLONE 0x80000000 /* Wait for kthread spawned from linux_clone.
*/ @@ -95,14 +97,17 @@ #ifndef _KERNEL #include +#include __BEGIN_DECLS pid_t wait(int *); pid_t waitpid(pid_t, int *, int); +int waitid(idtype_t, id_t, siginfo_t *, int); #if __BSD_VISIBLE struct rusage; pid_t wait3(int *, int, struct rusage *); pid_t wait4(pid_t, int *, int, struct rusage *); +pid_t wait6(idtype_t, id_t, int *, int, struct rusage *, siginfo_t *); #endif __END_DECLS #endif /* !_KERNEL */ --- sys/sys/syscallsubr.h.orig 2012-01-06 21:29:16.000000000 +0200 +++ sys/sys/syscallsubr.h 2012-07-31 10:29:42.000000000 +0300 @@ -233,6 +233,8 @@ enum uio_seg pathseg, struct timeval *tptr, enum uio_seg tptrseg); int kern_wait(struct thread *td, pid_t pid, int *status, int options, struct rusage *rup); +int kern_wait6(struct thread *td, idtype_t idtype, id_t id, int *status, + int options, struct rusage *rup, siginfo_t *sip); int kern_writev(struct thread *td, int fd, struct uio *auio); int kern_socketpair(struct thread *td, int domain, int type, int protocol, int *rsv); --- sys/bsm/audit_kevents.h.orig 2011-09-23 03:51:37.000000000 +0300 +++ sys/bsm/audit_kevents.h 2012-07-31 10:29:42.000000000 +0300 @@ -602,6 +602,7 @@ #define AUE_PDKILL 43198 /* FreeBSD. */ #define AUE_PDGETPID 43199 /* FreeBSD. */ #define AUE_PDWAIT 43200 /* FreeBSD. */ +#define AUE_WAIT6 43201 /* FreeBSD. 
*/ /* * Darwin BSM uses a number of AUE_O_* definitions, which are aliased to the --- sys/kern/syscalls.master.orig 2012-01-06 21:29:16.000000000 +0200 +++ sys/kern/syscalls.master 2012-07-31 10:29:42.000000000 +0300 @@ -72,7 +72,7 @@ 6 AUE_CLOSE STD { int close(int fd); } 7 AUE_WAIT4 STD { int wait4(int pid, int *status, \ int options, struct rusage *rusage); } \ - wait4 wait_args int + wait4 wait4_args int 8 AUE_CREAT COMPAT { int creat(char *path, int mode); } 9 AUE_LINK STD { int link(char *path, char *link); } 10 AUE_UNLINK STD { int unlink(char *path); } @@ -368,7 +368,11 @@ 190 AUE_LSTAT STD { int lstat(char *path, struct stat *ub); } 191 AUE_PATHCONF STD { int pathconf(char *path, int name); } 192 AUE_FPATHCONF STD { int fpathconf(int fd, int name); } -193 AUE_NULL UNIMPL nosys +193 AUE_WAIT6 STD { int wait6(int idtype, int pid, \ + int *status, int options, \ + struct rusage *rusage, \ + siginfo_t *info); } \ + wait6 wait6_args int 194 AUE_GETRLIMIT STD { int getrlimit(u_int which, \ struct rlimit *rlp); } getrlimit \ __getrlimit_args int --- lib/libc/include/namespace.h.orig 2011-09-23 03:51:37.000000000 +0300 +++ lib/libc/include/namespace.h 2012-07-31 10:29:42.000000000 +0300 @@ -229,6 +229,7 @@ #define socketpair _socketpair #define usleep _usleep #define wait4 _wait4 +#define wait6 _wait6 #define waitpid _waitpid #define write _write #define writev _writev --- lib/libc/include/un-namespace.h.orig 2011-09-23 03:51:37.000000000 +0300 +++ lib/libc/include/un-namespace.h 2012-07-31 10:29:42.000000000 +0300 @@ -210,6 +210,7 @@ #undef socketpair #undef usleep #undef wait4 +#undef wait6 #undef waitpid #undef write #undef writev --- lib/libc/gen/Makefile.inc.orig 2012-03-05 13:43:27.000000000 +0200 +++ lib/libc/gen/Makefile.inc 2012-07-31 10:29:42.000000000 +0300 @@ -34,7 +34,7 @@ syslog.c telldir.c termios.c time.c times.c timezone.c tls.c \ ttyname.c ttyslot.c ualarm.c ulimit.c uname.c unvis.c \ usleep.c utime.c utxdb.c valloc.c vis.c wait.c wait3.c 
waitpid.c \ - wordexp.c + waitid.c wordexp.c CANCELPOINTS_SRCS=sem.c sem_new.c .for src in ${CANCELPOINTS_SRCS} --- sys/cddl/contrib/opensolaris/uts/common/sys/procset.h.orig 2008-03-29 00:16:13.000000000 +0200 +++ sys/cddl/contrib/opensolaris/uts/common/sys/procset.h 2012-07-31 10:29:42.000000000 +0300 @@ -51,6 +51,7 @@ #define P_INITUID 0 #define P_INITPGID 0 +#ifndef _IDTYPE_T_DECLARED /* * The following defines the values for an identifier type. It @@ -79,8 +80,12 @@ P_CTID, /* A (process) contract identifier. */ P_CPUID, /* CPU identifier. */ P_PSETID /* Processor set identifier */ + } idtype_t; +#define _IDTYPE_T_DECLARED + +#endif /* * The following defines the operations which can be performed to --- sys/sys/proc.h.orig 2012-07-03 11:40:20.000000000 +0300 +++ sys/sys/proc.h 2012-07-31 10:29:42.000000000 +0300 @@ -879,8 +879,7 @@ void procinit(void); void proc_linkup0(struct proc *p, struct thread *td); void proc_linkup(struct proc *p, struct thread *td); -void proc_reap(struct thread *td, struct proc *p, int *status, int options, - struct rusage *rusage); +void proc_reap(struct thread *td, struct proc *p, int *status, int options); void proc_reparent(struct proc *child, struct proc *newparent); struct pstats *pstats_alloc(void); void pstats_fork(struct pstats *src, struct pstats *dst); --- lib/libc/gen/Symbol.map.orig 2012-02-21 23:18:59.000000000 +0200 +++ lib/libc/gen/Symbol.map 2012-07-31 10:29:42.000000000 +0300 @@ -384,6 +384,7 @@ fdlopen; __FreeBSD_libc_enter_restricted_mode; getcontextx; + waitid; }; FBSDprivate_1.0 { --- sys/sys/types.h.orig 2012-01-02 18:14:52.000000000 +0200 +++ sys/sys/types.h 2012-07-31 10:29:42.000000000 +0300 @@ -142,6 +142,45 @@ #define _ID_T_DECLARED #endif +#ifndef _IDTYPE_T_DECLARED + +typedef enum +#if !defined(_XPG4_2) || defined(__EXTENSIONS__) + idtype /* pollutes XPG4.2 namespace */ +#endif + { + /* + * These names were mostly lifted from Solaris source code + * and still use Solaris style naming to avoid breaking any 
+ * OpenSolaris code which has been ported to FreeBSD. + * There is no clear FreeBSD counterpart for all of the names. + * OTOH some have a clear correspondence to FreeBSD entities. + */ + + P_PID, /* A process identifier. */ + P_PPID, /* A parent process identifier. */ + P_PGID, /* A process group (job control group) */ + /* identifier. */ + P_SID, /* A session identifier. */ + P_CID, /* A scheduling class identifier. */ + P_UID, /* A user identifier. */ + P_GID, /* A group identifier. */ + P_ALL, /* All processes. */ + P_LWPID, /* An LWP identifier. */ + P_TASKID, /* A task identifier. */ + P_PROJID, /* A project identifier. */ + P_POOLID, /* A pool identifier. */ + P_ZONEID, /* A zone identifier. */ + P_CTID, /* A (process) contract identifier. */ + P_CPUID, /* CPU identifier. */ + P_PSETID /* Processor set identifier */ + +} idtype_t; /* The type of id_t we are using. */ + +#define _IDTYPE_T_DECLARED +#endif + + #ifndef _INO_T_DECLARED typedef __ino_t ino_t; /* inode number */ #define _INO_T_DECLARED --- lib/libc/sys/wait.2.orig 2011-09-23 03:51:37.000000000 +0300 +++ lib/libc/sys/wait.2 2012-07-31 16:45:18.000000000 +0300 @@ -34,9 +34,11 @@ .Sh NAME .Nm wait , .Nm waitpid , +.Nm waitid , +.Nm wait3 , .Nm wait4 , -.Nm wait3 -.Nd wait for process termination +.Nm wait6 +.Nd wait for processes to change status .Sh LIBRARY .Lb libc .Sh SYNOPSIS @@ -46,12 +48,17 @@ .Fn wait "int *status" .Ft pid_t .Fn waitpid "pid_t wpid" "int *status" "int options" +.In sys/signal.h +.Ft int +.Fn waitid "idtype_t idtype" "id_t id" "siginfo_t *info" "int options" .In sys/time.h .In sys/resource.h .Ft pid_t .Fn wait3 "int *status" "int options" "struct rusage *rusage" .Ft pid_t .Fn wait4 "pid_t wpid" "int *status" "int options" "struct rusage *rusage" +.Ft pid_t +.Fn wait6 "idtype_t idtype" "id_t id" "int *status" "int options" "struct rusage *rusage" "siginfo_t *infop" .Sh DESCRIPTION The .Fn wait @@ -89,25 +96,190 @@ The other wait functions are implemented using .Fn wait4 . 
 .Pp
+The broadest interface of all functions in this family is
+.Fn wait6
+which is otherwise very much like
+.Fn wait4
+but with a few very important distinctions.
+.br
+It will not wait for exited processes unless the option flag
+.Dv WEXITED
+is explicitly specified.
+This allows for waiting for processes which have experienced other
+status changes without also having to handle the exit status from
+the terminated processes.
+Another important difference is the additional sixth argument
+which must be either
+.Dv NULL
+or a pointer to a
+.Fa siginfo_t
+structure.
+Additionally the old
+.Fa pid_t
+argument has been split into two separate
+.Fa idtype_t
+and
+.Fa id_t .
+.br
+Allowing for the distinction in how the
+PID or PGID
+is passed to the routine, calling
+.Fn wait6
+with the bits
+.Dv WEXITED
+and
+.Dv WTRAPPED
+set in the
+.Fa options
+and with
+.Fa infop
+set to
+.Dv NULL ,
+is still functionally equivalent to calling
+.Fn wait4 .
+The separation of
+.Fa idtype
+and
+.Fa id
+arguments has the benefit, though, that many other types of
+IDs can be supported as well in addition to PID and PGID.
+.sp
+Notice that
+.Fn wait6
+is not required by any standard nor is it common in other
+operating systems.
+It is simply a generalized API to support in one function call
+interface any and all of the functionality available through
+any of the other
+.Fn wait*
+functions.
+Do not use it unless you fully accept the implied
+limitations to the portability of your code.
+.Pp
 The
+.Fa idtype
+and
+.Fa id
+arguments specify which processes
+.Fn waitid
+and
+.Fn wait6
+shall wait for.
+.Bl -tag -width Ds
+.It Dv +
+If
+.Fa idtype
+is
+.Dv P_PID ,
+.Fn waitid
+and
+.Fn wait6
+wait for the child process with a process ID equal to
+.Dv (pid_t)id .
+.It Dv +
+If
+.Fa idtype
+is
+.Dv P_PGID ,
+.Fn waitid
+and
+.Fn wait6
+wait for the child process with a process group ID equal to
+.Dv (pid_t)id .
+.It Dv +
+If
+.Fa idtype
+is
+.Dv P_ALL ,
+.Fn waitid
+and
+.Fn wait6
+wait for any child process and the
+.Dv id
+is ignored.
+.It Dv +
+If
+.Fa idtype
+is
+.Dv P_PID
+or
+.Dv P_PGID
+and the
+.Dv id
+is zero,
+.Fn waitid
+and
+.Fn wait6
+wait for any child process in the same process group as the caller.
+.It Dv +
+While no standard actually requires such functionality,
+this implementation also supports other types of IDs to wait for.
+.br
+Notice anyhow that using any of these non-standard features will
+most likely seriously degrade the portability of your code.
+Consider such use only as enabling technology for new creative
+experimentation locked into its original environment.
+.br
+Use
+.Fa idtype
+value
+.Dv P_UID
+to filter processes based on their effective UID,
+.Dv P_GID
+to filter processes based on their effective GID.
+.br
+.Dv P_SID
+could be used to filter based on the session ID.
+In case the child process started its own new session,
+SID will be the same as its own PID.
+Otherwise the SID of a child process will match the caller's SID.
+.br
+.Dv P_ZONEID
+facilitates waiting for processes within a certain jail.
+.br
+There could be still more meaningful ID types to wait for
+like
+.Dv P_PSETID
+for processes restricted to a certain set of CPUs,
+.Dv P_CID
+to wait for processes in a certain scheduling class or
+.Dv P_CPUID
+to wait for processes nailed to a certain CPU.
+These three
+have not been implemented at the time of this writing,
+because the data stored in the thread structures seems
+to be zeroed when a process terminates before the parent
+gets to wait for the zombie.
+They are mentioned here as potentially useful extensions.
+.El
+.Pp
+For all the other
+.Fn wait*
+variants the
 .Fa wpid
 argument specifies the set of child processes for which to wait.
+.Bl -tag -width Ds
+.It Dv +
 If
 .Fa wpid
 is -1, the call waits for any child process.
+.It Dv +
 If
 .Fa wpid
 is 0,
 the call waits for any child process in the process group of the caller.
+.It Dv +
 If
 .Fa wpid
 is greater than zero, the call waits for the process with process id
 .Fa wpid .
+.It Dv +
 If
 .Fa wpid
 is less than -1, the call waits for any process whose process group id
 equals the absolute value of
 .Fa wpid .
+.El
 .Pp
 The
 .Fa status
@@ -116,41 +288,106 @@
 The
 .Fa options
 argument contains the bitwise OR of any of the following options.
-The
-.Dv WCONTINUED
-option indicates that children of the current process that
+.Bl -tag -width Ds
+.It Dv WCONTINUED
+indicates that children of the current process that
 have continued from a job control stop, by receiving a
 .Dv SIGCONT
 signal, should also have their status reported.
-The
-.Dv WNOHANG
-option
-is used to indicate that the call should not block if
-there are no processes that wish to report status.
-If the
-.Dv WUNTRACED
-option is set,
-children of the current process that are stopped
+.It Dv WNOHANG
+is used to indicate that the call should not block when
+there are no processes wishing to report status.
+.It Dv WUNTRACED
+indicates that children of the current process which are stopped
 due to a
 .Dv SIGTTIN , SIGTTOU , SIGTSTP ,
 or
 .Dv SIGSTOP
-signal also have their status reported.
-The
-.Dv WSTOPPED
-option is an alias for
+signal shall have their status reported.
+.It Dv WSTOPPED
+is an alias for
 .Dv WUNTRACED .
-The
-.Dv WNOWAIT
-option keeps the process whose status is returned in a waitable state.
+.It Dv WTRAPPED
+allows waiting for processes which have trapped or reached a breakpoint.
+.It Dv WEXITED
+indicates that the caller wants to receive status reports from
+terminated processes.
+.br
+This bit is implicitly set for the older functions
+.Fn wait ,
+.Fn waitpid ,
+.Fn wait3 ,
+and
+.Fn wait4
+to avoid changing their traditional functionality.
+.br
+For the more recent APIs
+.Fn waitid
+and
+.Fn wait6
+this bit has to be explicitly included in the
+.Fa options ,
+if status reports from terminated processes are expected.
+.br
+This has the benefit that while using the latter two APIs
+it is possible to request status reports only for processes
+which have experienced some other status change, but which
+have not terminated.
+So, it is possible to avoid receiving reports for terminated
+processes, in those parts of a program which are not able
+to properly handle zombies and delay zombie processing to
+other parts which can handle them properly.
+.It Dv WNOWAIT
+keeps the process whose status is returned in a waitable state.
 The process may be waited for again after this call completes.
+.El
+.sp
+For the more recent APIs
+.Fn waitid
+and
+.Fn wait6
+at least one of the options
+.Dv WEXITED ,
+.Dv WUNTRACED ,
+.Dv WSTOPPED ,
+.Dv WTRAPPED ,
+or
+.Dv WCONTINUED
+must be specified. Otherwise there will be no data for the call to
+return.
+To avoid hanging indefinitely in such a case these functions currently
+behave as if WNOHANG had been specified.
 .Pp
 If
 .Fa rusage
 is non-zero, a summary of the resources used by the terminated process and all its
-children is returned (this information is currently not available
-for stopped or continued processes).
+children is returned.
+.Pp
+If
+.Fa infop
+is non-null, it must point to a
+.Dv siginfo_t
+structure which will be filled such that the
+.Dv si_signo
+field will always be
+.Dv SIGCHLD
+and the field
+.Dv si_pid
+will be non-zero, if there is a status change to report.
+If there are no status changes to report and WNOHANG is applied,
+both of these fields will be zero.
+.br
+When using the
+.Fn waitid
+API with the
+.Dv WNOHANG
+option set, checking these fields is the only way to know whether
+there were any status changes to report, because the return value
+from
+.Fn waitid
+will be zero as it is for any successful return from
+.Fn waitid .
 .Pp
 When the
 .Dv WNOHANG
@@ -306,6 +543,18 @@
 is returned and
 .Va errno
 is set to indicate the error.
+.Pp
+If
+.Fn waitid
+returns because one or more processes have a state change to report,
+0 is returned.
+To indicate an error, -1 will be returned and
+.Va errno
+set to an appropriate value.
+If
+.Dv WNOHANG
+was used, 0 can also be returned when no process has changed state;
+in that case si_signo and si_pid are zero.
 .Sh ERRORS
 The
 .Fn wait
@@ -335,6 +584,14 @@
 or the signal did not have the
 .Dv SA_RESTART
 flag set.
+.It Bq Er EINVAL
+An invalid value was specified for
+.Fa options ,
+or
+.Fa idtype
+and
+.Fa id
+do not specify a valid set of processes.
 .El
 .Sh SEE ALSO
 .Xr _exit 2 ,
--- sys/kern/kern_exit.c.orig	2012-04-05 13:33:39.000000000 +0300
+++ sys/kern/kern_exit.c	2012-07-31 16:39:30.000000000 +0300
@@ -674,6 +674,7 @@
 	int error, status;
 
 	error = kern_wait(td, WAIT_ANY, &status, 0, NULL);
+
 	if (error == 0)
 		td->td_retval[1] = status;
 	return (error);
@@ -684,7 +685,7 @@
  * The dirty work is handled by kern_wait().
  */
 int
-sys_wait4(struct thread *td, struct wait_args *uap)
+sys_wait4(struct thread *td, struct wait4_args *uap)
 {
 	struct rusage ru, *rup;
 	int error, status;
@@ -693,11 +694,63 @@
 		rup = &ru;
 	else
 		rup = NULL;
+
 	error = kern_wait(td, uap->pid, &status, uap->options, rup);
+
+	if (uap->status != NULL && error == 0)
+		error = copyout(&status, uap->status, sizeof(status));
+	if (uap->rusage != NULL && error == 0)
+		error = copyout(&ru, uap->rusage, sizeof(struct rusage));
+	return (error);
+}
+
+int
+sys_wait6(struct thread *td, struct wait6_args *uap)
+{
+	struct rusage ru, *rup;
+	siginfo_t si, *sip;
+	int error, status;
+	pid_t pid;
+	idtype_t idtype;
+	id_t id;
+
+	pid = uap->pid;
+
+	if (pid == WAIT_ANY) {
+		idtype = P_ALL;
+		id = 0;
+	}
+	else if (pid <= 0) {
+		idtype = P_PGID;
+		id = (id_t) -pid;
+	}
+	else {
+		idtype = P_PID;
+		id = (id_t) pid;
+	}
+
+	if (uap->rusage != NULL)
+		rup = &ru;
+	else
+		rup = NULL;
+
+	if (uap->info != NULL)
+		sip = &si;
+	else
+		sip = NULL;
+
+	/*
+	 * We expect all callers of wait6()
+	 * to
know about WEXITED & WTRAPPED! + */ + error = kern_wait6(td, idtype, id, &status, uap->options, rup, sip); + if (uap->status != NULL && error == 0) error = copyout(&status, uap->status, sizeof(status)); if (uap->rusage != NULL && error == 0) error = copyout(&ru, uap->rusage, sizeof(struct rusage)); + if (uap->info != NULL && error == 0) + error = copyout(&si, uap->info, sizeof(siginfo_t)); return (error); } @@ -707,8 +760,7 @@ * lock as part of its work. */ void -proc_reap(struct thread *td, struct proc *p, int *status, int options, - struct rusage *rusage) +proc_reap(struct thread *td, struct proc *p, int *status, int options) { struct proc *q, *t; @@ -718,10 +770,7 @@ KASSERT(p->p_state == PRS_ZOMBIE, ("proc_reap: !PRS_ZOMBIE")); q = td->td_proc; - if (rusage) { - *rusage = p->p_ru; - calcru(p, &rusage->ru_utime, &rusage->ru_stime); - } + PROC_SUNLOCK(p); td->td_retval[0] = p->p_pid; if (status) @@ -834,8 +883,10 @@ } static int -proc_to_reap(struct thread *td, struct proc *p, pid_t pid, int *status, - int options, struct rusage *rusage) +proc_to_reap(struct thread *td, struct proc *p, + idtype_t idtype, id_t id, + int *status, int options, + struct rusage *rusage, siginfo_t *siginfo) { struct proc *q; @@ -843,15 +894,121 @@ q = td->td_proc; PROC_LOCK(p); - if (pid != WAIT_ANY && p->p_pid != pid && p->p_pgid != -pid) { + + switch (idtype) { + case P_ALL: + break; + + case P_PID: + if (p->p_pid != (pid_t) id) { + PROC_UNLOCK(p); + return (0); + } + break; + + case P_PGID: + if (p->p_pgid != (pid_t) id) { + PROC_UNLOCK(p); + return (0); + } + break; + + case P_SID: + if (p->p_session->s_sid != (pid_t) id) { + PROC_UNLOCK(p); + return (0); + } + break; + + case P_UID: + if (p->p_ucred->cr_uid != (uid_t) id) { + PROC_UNLOCK(p); + return (0); + } + break; + + case P_GID: + if (p->p_ucred->cr_gid != (gid_t) id) { + PROC_UNLOCK(p); + return (0); + } + break; + + case P_ZONEID: /* jail */ + if (! 
p->p_ucred->cr_prison || + (p->p_ucred->cr_prison->pr_id != (int) id)) { + PROC_UNLOCK(p); + return (0); + } + break; + +#if 0 + /* + * It seems that the first thread structure gets zeroed out + * at process exit. + * This makes toast of all useful info related to CPU set and + * scheduling priority class. + */ + + case P_PSETID: + { + struct thread *td1; + + td1 = FIRST_THREAD_IN_PROC(p); + if (td1->td_cpuset->cs_id != (cpusetid_t) id) { + PROC_UNLOCK(p); + return (0); + } + } + break; + + case P_CID: + { + struct thread *td1; + + td1 = FIRST_THREAD_IN_PROC(p); + if (td1->td_pri_class != (unsigned) id) { + PROC_UNLOCK(p); + return (0); + } + } + break; + + + /* + * Is there a good place for this? + * Supposedly also zeroed before it can be used, right? + */ + + case P_CPUID: + { + struct thread *td1; + + td1 = FIRST_THREAD_IN_PROC(p); + if (td1->td_lastcpu != (unsigned) id) { + PROC_UNLOCK(p); + return (0); + } + } + break; +#endif + + default: PROC_UNLOCK(p); return (0); + break; } + if (p_canwait(td, p)) { PROC_UNLOCK(p); return (0); } + if (((options & WEXITED) == 0) && (p->p_state == PRS_ZOMBIE)) { + PROC_UNLOCK(p); + return (0); + } + /* * This special case handles a kthread spawned by linux_clone * (see linux_misc.c). The linux_wait4 and linux_waitpid @@ -867,8 +1024,57 @@ } PROC_SLOCK(p); + + /* New siginfo stuff... */ + + if (siginfo) { + bzero (siginfo, sizeof (*siginfo)); + siginfo->si_signo = SIGCHLD; + siginfo->si_errno = 0; + + /* + * Right, this is still a rough estimate. + * We will fix the cases TRAPPED, STOPPED, + * and CONTINUED later. 
+ */ + + if (WCOREDUMP(p->p_xstat)) + siginfo->si_code = CLD_DUMPED; + else if (WIFSIGNALED(p->p_xstat)) + siginfo->si_code = CLD_KILLED; + else + siginfo->si_code = CLD_EXITED; + + siginfo->si_pid = p->p_pid; + siginfo->si_uid = p->p_ucred->cr_uid; + siginfo->si_status = p->p_xstat; + + /* + * The si_addr field would be useful + * additional detail, but apparently + * the PC value may be lost when we + * reach this point. + */ + siginfo->si_addr = NULL; /* XXX */ + } + + /* + * There should be no reason to limit resources usage info + * to exited processes only. + * A snapshot about any resources used by a stopped process + * may be exactly what is needed. + * (1) Solaris limits available info to times only. + * (2) Linux does not declare any limitations. + * (3) Now we within the same PROC_SLOCK anyway. + */ + + if (rusage) { + *rusage = p->p_ru; + calcru(p, &rusage->ru_utime, &rusage->ru_stime); + } + if (p->p_state == PRS_ZOMBIE) { - proc_reap(td, p, status, options, rusage); + proc_reap(td, p, status, options); return (-1); } PROC_SUNLOCK(p); @@ -877,24 +1083,75 @@ } int -kern_wait(struct thread *td, pid_t pid, int *status, int options, - struct rusage *rusage) +kern_wait(struct thread *td, pid_t pid, + int *status, int options, struct rusage *rusage) +{ + idtype_t idtype; + id_t id; + + if (pid == WAIT_ANY) { + idtype = P_ALL; + id = 0; + } + else if (pid <= 0) { + idtype = P_PGID; + id = (id_t) -pid; + } + else { + idtype = P_PID; + id = (id_t) pid; + } + + /* + * For backward compatibility we implicitly add + * flags WEXITED & WTRAPPED here. + */ + + options |= (WEXITED | WTRAPPED); + + return (kern_wait6 (td, idtype, id, status, options, rusage, NULL)); +} + +int +kern_wait6(struct thread *td, idtype_t idtype, id_t id, + int *status, int options, + struct rusage *rusage, siginfo_t *siginfo) { struct proc *p, *q; int error, nfound, ret; - AUDIT_ARG_PID(pid); +#if 0 + AUDIT_ARG_VALUE((int) idtype); /* XXX - This is likely wrong! 
*/ +#endif + AUDIT_ARG_PID((pid_t) id); /* XXX - This may be wrong! */ AUDIT_ARG_VALUE(options); q = td->td_proc; - if (pid == 0) { + + if (((pid_t) id == WAIT_MYPGRP) && + ((idtype == P_PID) || (idtype == P_PGID))) { PROC_LOCK(q); - pid = -q->p_pgid; + id = (id_t) q->p_pgid; PROC_UNLOCK(q); + idtype = P_PGID; } + /* If we don't know the option, just return. */ - if (options & ~(WUNTRACED|WNOHANG|WCONTINUED|WNOWAIT|WLINUXCLONE)) + if (options & ~(WUNTRACED|WNOHANG|WCONTINUED|WNOWAIT|WEXITED|WTRAPPED|WLINUXCLONE)) return (EINVAL); + + if ((options & (WEXITED|WUNTRACED|WCONTINUED|WTRAPPED)) == 0) { + /* + * We will be unable to find any matching processes. + * Simply behave as WHOHANG were specified, because + * waiting for real will not help. + */ + if (siginfo) + bzero (siginfo, sizeof (*siginfo)); + td->td_retval[0] = 0; + return (0); + } + loop: if (q->p_flag & P_STATCHILD) { PROC_LOCK(q); @@ -904,7 +1161,8 @@ nfound = 0; sx_xlock(&proctree_lock); LIST_FOREACH(p, &q->p_children, p_sibling) { - ret = proc_to_reap(td, p, pid, status, options, rusage); + ret = proc_to_reap(td, p, idtype, id, + status, options, rusage, siginfo); if (ret == 0) continue; else if (ret == 1) @@ -914,20 +1172,65 @@ PROC_LOCK(p); PROC_SLOCK(p); - if ((p->p_flag & P_STOPPED_SIG) && + + if ((options & WTRAPPED) && + (p->p_flag & P_TRACED) && + (p->p_flag & (P_STOPPED_TRACE | P_STOPPED_SIG)) && (p->p_suspcount == p->p_numthreads) && - (p->p_flag & P_WAITED) == 0 && - (p->p_flag & P_TRACED || options & WUNTRACED)) { + ((p->p_flag & P_WAITED) == 0)) { PROC_SUNLOCK(p); - p->p_flag |= P_WAITED; + + if ((options & WNOWAIT) == 0) + p->p_flag |= P_WAITED; + sx_xunlock(&proctree_lock); td->td_retval[0] = p->p_pid; + if (status) *status = W_STOPCODE(p->p_xstat); - PROC_LOCK(q); - sigqueue_take(p->p_ksi); - PROC_UNLOCK(q); + if (siginfo) { + siginfo->si_status = W_STOPCODE(p->p_xstat); + siginfo->si_code = CLD_TRAPPED; + } + + if ((options & WNOWAIT) == 0) { + PROC_LOCK(q); + sigqueue_take(p->p_ksi); + 
PROC_UNLOCK(q); + } + + PROC_UNLOCK(p); + + return (0); + } + + if ((options & WUNTRACED) && + (p->p_flag & P_STOPPED_SIG) && + (p->p_suspcount == p->p_numthreads) && + ((p->p_flag & P_WAITED) == 0)) { + PROC_SUNLOCK(p); + + if ((options & WNOWAIT) == 0) + p->p_flag |= P_WAITED; + + sx_xunlock(&proctree_lock); + td->td_retval[0] = p->p_pid; + + if (status) + *status = W_STOPCODE(p->p_xstat); + + if (siginfo) { + siginfo->si_status = W_STOPCODE(p->p_xstat); + siginfo->si_code = CLD_STOPPED; + } + + if ((options & WNOWAIT) == 0) { + PROC_LOCK(q); + sigqueue_take(p->p_ksi); + PROC_UNLOCK(q); + } + PROC_UNLOCK(p); return (0); @@ -936,15 +1239,25 @@ if (options & WCONTINUED && (p->p_flag & P_CONTINUED)) { sx_xunlock(&proctree_lock); td->td_retval[0] = p->p_pid; - p->p_flag &= ~P_CONTINUED; - PROC_LOCK(q); - sigqueue_take(p->p_ksi); - PROC_UNLOCK(q); + if ((options & WNOWAIT) == 0) { + p->p_flag &= ~P_CONTINUED; + + PROC_LOCK(q); + sigqueue_take(p->p_ksi); + PROC_UNLOCK(q); + } + PROC_UNLOCK(p); if (status) *status = SIGCONT; + + if (siginfo) { + siginfo->si_status = SIGCONT; + siginfo->si_code = CLD_CONTINUED; + } + return (0); } PROC_UNLOCK(p); @@ -963,7 +1276,8 @@ * to successfully wait until the child becomes a zombie. 
*/ LIST_FOREACH(p, &q->p_orphans, p_orphan) { - ret = proc_to_reap(td, p, pid, status, options, rusage); + ret = proc_to_reap(td, p, idtype, id, + status, options, rusage, siginfo); if (ret == 0) continue; else if (ret == 1) @@ -977,6 +1291,8 @@ } if (options & WNOHANG) { sx_xunlock(&proctree_lock); + if (siginfo) + bzero (siginfo, sizeof (*siginfo)); td->td_retval[0] = 0; return (0); } --- lib/libc/sys/Makefile.inc.orig 2012-01-06 21:29:16.000000000 +0200 +++ lib/libc/sys/Makefile.inc 2012-07-31 10:29:42.000000000 +0300 @@ -210,5 +210,5 @@ MLINKS+=truncate.2 ftruncate.2 MLINKS+=unlink.2 unlinkat.2 MLINKS+=utimes.2 futimes.2 utimes.2 futimesat.2 utimes.2 lutimes.2 -MLINKS+=wait.2 wait3.2 wait.2 wait4.2 wait.2 waitpid.2 +MLINKS+=wait.2 wait3.2 wait.2 wait4.2 wait.2 waitpid.2 wait.2 waitid.2 wait.2 wait6.2 MLINKS+=write.2 pwrite.2 write.2 pwritev.2 write.2 writev.2 >Release-Note: >Audit-Trail: >Unformatted: