Date:      Tue, 13 Aug 2002 04:58:17 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        David Xu <bsddiy@yahoo.com>
Cc:        "Andrey A. Chernov" <ache@nagual.pp.ru>, Julian Elischer <julian@elischer.org>, FreeBSD CURRENT <freebsd-current@FreeBSD.ORG>
Subject:   Re: cvs commit: src/sys/kern kern_sig.c (fwd)
Message-ID:  <20020813031722.T25520-100000@gamplex.bde.org>
In-Reply-To: <20020812055315.7682.qmail@web20907.mail.yahoo.com>

On Sun, 11 Aug 2002, David Xu wrote:

> following is patch for su, I can type "suspend" and stop $$ without the
> problem you described, I have tested it under tcsh and bash, all works
> for me.
>
> --- su.c	Mon Aug 12 13:08:01 2002
> +++ su.c.new	Mon Aug 12 13:16:14 2002
> @@ -329,10 +329,13 @@
>  	default:
>  		while ((ret_pid = waitpid(child_pid, &statusp, WUNTRACED)) != -1) {
>  			if (WIFSTOPPED(statusp)) {
> -				child_pgrp = tcgetpgrp(1);
>  				kill(getpid(), SIGSTOP);
> -				tcsetpgrp(1, child_pgrp);
> -				kill(child_pid, SIGCONT);
> +				child_pgrp = getpgid(child_pid);
> +				if (tcgetpgrp(1) == getpgrp())
> +				{
> +					tcsetpgrp(1, child_pgrp);
> +					kill(child_pid, SIGCONT);
> +				}
>  				statusp = 1;
>  				continue;
>  			}
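Outside of su.c, the patched loop can be written as a self-contained C fragment. This is a minimal sketch, not su's actual code. The names relay_job_control(), resume_child() and owns_terminal() are hypothetical, and the terminal fd is a parameter where the patch hard-codes fd 1. It follows the patch's order. First, stop ourselves to relay the child's stop upward. Then, once the parent shell continues us, look up the child's pgid with getpgid(). Only if the terminal's foreground group is still our own do we hand the terminal over and send SIGCONT.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Nonzero if own_pgrp is the terminal's foreground group.  tcgetpgrp()
 * returns -1 for a non-terminal fd, which never matches. */
static int owns_terminal(pid_t fg_pgrp, pid_t own_pgrp)
{
    return fg_pgrp != -1 && fg_pgrp == own_pgrp;
}

/* Called after we have been continued.  Returns 1 if the child was
 * given the terminal and resumed, 0 if it was left alone, -1 on error. */
int resume_child(int tty_fd, pid_t child_pid)
{
    /* Ask the kernel for the child's pgid instead of trusting the tty,
     * which "suspend" may have pointed at our own group. */
    pid_t child_pgrp = getpgid(child_pid);

    if (child_pgrp == -1)
        return -1;
    /* If someone above us already moved the terminal, don't touch it. */
    if (!owns_terminal(tcgetpgrp(tty_fd), getpgrp()))
        return 0;
    if (tcsetpgrp(tty_fd, child_pgrp) == -1 || kill(child_pid, SIGCONT) == -1)
        return -1;
    return 1;
}

/* The wait loop: returns the child's final wait status, or -1. */
int relay_job_control(int tty_fd, pid_t child_pid)
{
    int status;

    assert(child_pid > 0);
    while (waitpid(child_pid, &status, WUNTRACED) != -1) {
        if (WIFSTOPPED(status)) {
            kill(getpid(), SIGSTOP);   /* relay the stop to our parent shell */
            /* Execution continues here after the parent shell's "fg". */
            (void)resume_child(tty_fd, child_pid);
            continue;
        }
        return status;                 /* child exited or was killed */
    }
    return -1;
}
```

The ownership check deliberately runs after the SIGSTOP round trip. It therefore compares the terminal state at resume time, not a value saved before stopping.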

Explanation of this patch:

(1) su has shot itself in the foot using PAM.  Normally the parent shell
    waits for children and handles them when they stop.  The extra process
    for PAM is now in between the parent shell and the su shell, so the
    parent shell can't do this directly.  The above code attempts to
    relay some aspects of job control back to the parent shell.  It is
    not clear that it can do this properly without duplicating lots of
    shell-specific job control, but I think it can do this in principle.

    There are related problems for propagation of SIGHUP to indirect
    descendants of login shells when the shell exits.  Here there is
    at least an intermediate process that can relay the signals
    if necessary.  I think propagation of SIGHUP is automatic if the
    intermediate process doesn't exit first and it doesn't change its
    job control stuff too much, so the SIGHUP problem doesn't affect
    PAMmed applications.

(2) To relay SIGSTOP, the intermediate su just needs to stop itself.  To
    relay SIGCONT, the intermediate su needs to switch to enough of
    its child's job control environment before resuming the child.
    Switching only fd 1's process group seems to be sufficient, but
    even that is not easy to determine, and the broken version got
    it wrong.

    The child's environment is very shell-dependent.  Some of the following
    may depend on the initial shell being bash:
    (a) sh, csh and bash start a new process group (equal to their pid).
        zsh stays in the process group of the intermediate su process.
    (b) "kill -STOP $$ ... fg" worked in most (all?) cases because
	fd 1's pgrp is still the child's pgid when the child is killed
	in that way.  For zsh the child's pgid is the same as the
	intermediate shell's so the pgrps can't be different, and for
	the other shells I think the pgrp hasn't been changed because
	the child can't control it (SIGSTOP is uncatchable) and the
	kernel doesn't.  Later, switching fd 1's pgrp back to the
	child's pgid works except possibly for zsh because it is correct
	and different.

    (c) "suspend ... fg" failed for several reasons.  First, something
	(presumably the child) sets fd 1's pgrp to the intermediate
	su's pgid, so tcgetpgrp(1) gives a wrong pgrp for restoring
	later.  The patch fixes this by not getting the pgrp in this
	way.  It uses getpgid(child_pid) instead.  I think this works
	for at least normal shells.  Second, when the pgrp is restored,
	something (presumably the shell above the intermediate su, or
	the kernel) has already switched fd 1's pgrp to the child's pgid
	instead of to the intermediate su's pgid (despite the intermediate
	su's being correct at SIGSTOP time for suspend but not for
	kill -STOP).  Setting fd 1's pgrp to the value that it already
	has is then fatal for reasons that I don't completely understand
	yet.  The patch avoids the problem by not doing apparently-null
	tcsetpgrp()'s.  Sending the SIGCONT seems to have no effect in
	this case, so I think the shell above the su's has already started
	both the child su and the intermediate one and this isn't a
	problem until the su's get in each other's way.  Putting printfs
	in the above code seems to make the problem easier to debug by
	ensuring that they get in each other's way :-).
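A small experiment shows point (1) above directly. This is a hypothetical C sketch, not code from su.c. The function name and the pipe used to pass the pid upward are illustrative only. A top-level process stands in for the parent shell. It forks an intermediate process (standing in for the PAMmed su), which in turn forks a "su shell" that stops itself with SIGSTOP. Only the intermediate process learns about the stop from waitpid() with WUNTRACED. For the top-level process the stopped process is not a child, so its waitpid() fails with ECHILD. That is why the intermediate su has to relay the stop itself.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo of the process chain
 *     shell -> intermediate su -> su shell.
 * Returns 1 if the top-level process gets ECHILD when it tries to wait
 * for the stopped grandchild, 0 if it does not, and -1 on setup errors.
 * *mid_saw_stop is set to 1 if the intermediate process saw the stop.
 */
int grandchild_stop_hidden(int *mid_saw_stop)
{
    int fds[2], st;
    pid_t mid, leaf;

    assert(mid_saw_stop != NULL);
    *mid_saw_stop = 0;
    if (pipe(fds) == -1 || (mid = fork()) == -1)
        return -1;

    if (mid == 0) {                          /* the intermediate su */
        close(fds[0]);
        if ((leaf = fork()) == -1)
            _exit(2);
        if (leaf == 0) {                     /* the su shell */
            raise(SIGSTOP);                  /* stop, as after "suspend" */
            _exit(0);
        }
        if (write(fds[1], &leaf, sizeof leaf) != (ssize_t)sizeof leaf) {
            kill(leaf, SIGKILL);
            _exit(2);
        }
        /* Only the direct parent is told that the su shell stopped. */
        int saw = waitpid(leaf, &st, WUNTRACED) == leaf && WIFSTOPPED(st);
        kill(leaf, SIGKILL);                 /* clean up the stopped child */
        waitpid(leaf, &st, 0);
        _exit(saw ? 0 : 1);
    }

    /* The top-level "shell": learn the grandchild's pid, then try to wait. */
    close(fds[1]);
    ssize_t n = read(fds[0], &leaf, sizeof leaf);
    close(fds[0]);
    int hidden = n == (ssize_t)sizeof leaf
        && waitpid(leaf, &st, WUNTRACED | WNOHANG) == -1 && errno == ECHILD;

    if (waitpid(mid, &st, 0) != mid)
        return -1;
    *mid_saw_stop = WIFEXITED(st) && WEXITSTATUS(st) == 0;
    return n == (ssize_t)sizeof leaf ? hidden : -1;
}
```

Calling grandchild_stop_hidden(&saw) is expected to return 1 with saw set to 1. In other words, the stop is visible one level up and invisible two levels up.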

Bruce

