From owner-freebsd-arch  Fri Oct 11  5:53:38 2002
Date: Fri, 11 Oct 2002 08:52:59 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.org>
X-Sender: robert@fledge.watson.org
To: Juli Mallett <jmallett@FreeBSD.org>
Cc: Don Lewis <dl-freebsd@catspoiler.org>, wollman@lcs.mit.edu,
	arch@FreeBSD.org
Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]]
In-Reply-To: <20021011053720.A2431@FreeBSD.org>
Message-ID: <Pine.NEB.3.96L.1021011083816.42071C-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII


On Fri, 11 Oct 2002, Juli Mallett wrote:

> > Solaris returns an EAGAIN to the caller and the target is unaffected. If
> > the caller really wants to nuke the target, it could retry with kill().
> > The same error will be returned if there are too many signals in the
> > target's queue, which should prevent the signal queue for a wedged
> > process from consuming all of kmem.
> 
> Uhm, not really.  Retrying with SIGKILL won't result in the signal being
> queued.

I think you may be missing the thrust.  There are two sources of signals
in the world:

(1) User processes signalling each other or themselves.

(2) Kernel services signalling user processes in response to a trap or an
    event.

In both cases, we're talking about an EAGAIN error being returned to the
source of the signal if insufficient resources are available, and in both
cases, we may be interested in a fail-stop approach.  The case I believe
Don is talking about specifically is this one:

  Application boomctl tries to deliver SIGUSR1 to boomd, the reliable boom
  daemon.  boomctl gets back EAGAIN because the kernel does not have the
  resources to reliably deliver the signal, and boomd has a handler for
  SIGUSR1.  boomctl/boomd have fail-stop semantics, so boomctl calls
  kill(boomd_pid, SIGKILL).  Or, if it doesn't care about the failure very
  much, it delivers that notification via some other sort of IPC that
  does not rely on asynchronous delivery.

This permits fail-stop semantics where they are needed, but doesn't force
them on applications that would rather not stop.
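
As a rough sketch of the boomctl side of that scenario, assuming a
POSIX-style sigqueue() that returns EAGAIN when the signal cannot be
queued: the helper name notify_or_kill() and the result enum are
hypothetical, made up for illustration, and are not part of the patch
under review.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical result codes for the sketch below. */
enum notify_result { NOTIFY_QUEUED, NOTIFY_KILLED, NOTIFY_ERROR };

/*
 * Try to queue signo (with an integer payload) to pid.  If the kernel
 * reports EAGAIN, apply fail-stop semantics and send SIGKILL with kill()
 * instead.  Any other failure is passed back to the caller.
 */
static enum notify_result
notify_or_kill(pid_t pid, int signo, int value)
{
	union sigval sv;

	sv.sival_int = value;
	if (sigqueue(pid, signo, sv) == 0)
		return (NOTIFY_QUEUED);

	/* Only a resource shortage triggers the fail-stop path. */
	if (errno != EAGAIN)
		return (NOTIFY_ERROR);

	if (kill(pid, SIGKILL) == 0)
		return (NOTIFY_KILLED);
	return (NOTIFY_ERROR);
}
```

An application that would rather not stop the target would simply skip
the kill() call on EAGAIN and fall back to whatever other IPC it has.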

Another case to consider is that of init.  Init may be interested in
SIGCHLD with process information, but not so interested that it wants to
be terminated if the pid can't be delivered with a siginfo; it can always
call wait().  You care a lot about reliable init behavior in a
memory-constrained situation because if init dies, your system either
halts or panics, depending on the circumstances.
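
To illustrate the init case, here is a minimal sketch, assuming plain
POSIX sigaction() and waitpid(), of a process that asks for SIGCHLD with
siginfo but never depends on it: the handler only sets a flag, and the
actual child status is recovered with waitpid().  The names on_sigchld(),
install_sigchld_handler() and reap_children() are hypothetical; this is
not how init itself is written.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Set by the handler; tells the main loop that a child may have exited. */
static volatile sig_atomic_t child_event;

/* The siginfo is deliberately ignored: it is a hint, not a requirement. */
static void
on_sigchld(int signo, siginfo_t *info, void *ctx)
{
	(void)signo;
	(void)info;
	(void)ctx;
	child_event = 1;
}

static int
install_sigchld_handler(void)
{
	struct sigaction sa;

	sa.sa_sigaction = on_sigchld;
	sa.sa_flags = SA_SIGINFO | SA_RESTART;
	sigemptyset(&sa.sa_mask);
	return (sigaction(SIGCHLD, &sa, NULL));
}

/*
 * Reap every child that has already exited, without blocking.
 * Returns the number of children reaped.
 */
static int
reap_children(void)
{
	int n = 0, status;
	pid_t pid;

	child_event = 0;
	for (;;) {
		pid = waitpid(-1, &status, WNOHANG);
		if (pid > 0) {
			n++;
			continue;
		}
		if (pid == -1 && errno == EINTR)
			continue;
		break;		/* no exited children, or no children at all */
	}
	return (n);
}
```

Because the reaping loop does not look at the siginfo at all, a SIGCHLD
that arrives without it still leads to the same result.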

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories

