From: Robert Watson
Date: Fri, 11 Oct 2002 08:52:59 -0400 (EDT)
To: Juli Mallett
Cc: Don Lewis, wollman@lcs.mit.edu, arch@FreeBSD.org
Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]]
In-Reply-To: <20021011053720.A2431@FreeBSD.org>

On Fri, 11 Oct 2002, Juli Mallett wrote:

> > Solaris returns an EAGAIN to the caller and the target is unaffected. If
> > the caller really wants to nuke the target, it could retry with kill().
> > The same error will be returned if there are too many signals in the
> > target's queue, which should prevent the signal queue for a wedged
> > process from consuming all of kmem.
>
> Uhm, not really. Retrying with SIGKILL won't result in the signal being
> queued.

I think you may be missing the thrust: there are two sources of signals in
the world:

(1) User processes signalling each other or themselves.

(2) Kernel services signalling user processes in response to a trap or an
    event.

In both cases, we're talking about EAGAIN being returned to the source of
the signal when insufficient resources are available, and in both cases, we
may be interested in a fail-stop approach. The case I believe Don is
talking about specifically is this:

  Application boomctl tries to deliver SIGUSR1 to boomd, the reliable boom
  daemon. boomctl gets back EAGAIN because the kernel does not have the
  resources to reliably deliver the signal, and boomd has a handler for
  SIGUSR1. boomctl/boomd have fail-stop semantics, so boomctl calls
  kill(boomd_pid, SIGKILL). Or, if it doesn't care about the failure very
  much, it queues the instance delivery via some other sort of
  non-asynchronous-delivery IPC.

This permits fail-stop semantics where they are needed, but doesn't force
them on applications that would rather not stop.

Another case to consider is that of init. Init may be interested in
SIGCHLD with process information, but not so interested that it wants to
be terminated if the pid can't be delivered with a siginfo; it can always
call wait(). You care a lot about reliable init behavior in a
memory-constrained situation, because if init dies, your system either
halts or panics, depending on the circumstance.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories
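
The boomctl case described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: it assumes the first delivery attempt uses POSIX sigqueue(), which is one interface that can report EAGAIN when queue resources run out, and it relies on the point made in the quoted text that SIGKILL does not need to be queued. The names notify_or_stop and notify_result are hypothetical and do not come from the patch under review.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose sigqueue() under strict -std= modes */
#endif

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Outcome of one notification attempt (illustrative names). */
enum notify_result {
    NOTIFY_QUEUED,  /* the signal was queued for the target */
    NOTIFY_KILLED,  /* queueing failed with EAGAIN, so the target was killed */
    NOTIFY_FAILED   /* any other error; errno describes it */
};

/*
 * Hypothetical helper for a boomctl-like tool with fail-stop semantics:
 * try to queue signo for pid, and if the kernel reports EAGAIN, stop the
 * target with SIGKILL instead of leaving it in an unknown state.
 */
static enum notify_result
notify_or_stop(pid_t pid, int signo, int payload)
{
    union sigval value;

    /* sigqueue() addresses one process; pid <= 0 has no meaning here. */
    assert(pid > 0);

    value.sival_int = payload;
    if (sigqueue(pid, signo, value) == 0)
        return NOTIFY_QUEUED;

    /* Only resource exhaustion takes the fail-stop path. */
    if (errno != EAGAIN)
        return NOTIFY_FAILED;

    /* Assumption from the thread: SIGKILL is delivered without queueing. */
    if (kill(pid, SIGKILL) == 0)
        return NOTIFY_KILLED;
    return NOTIFY_FAILED;
}
```

A caller that would rather not stop the daemon could treat the EAGAIN case as a cue to hand the event to some other IPC channel instead of calling kill(), which is the second option described in the message.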