From: John Baldwin
To: Don Lewis
Cc: arch@FreeBSD.org, jmallett@FreeBSD.org
Date: Tue, 08 Oct 2002 10:14:29 -0400 (EDT)
Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc.,
In-Reply-To: <200210080322.g983MIvU034090@gw.catspoiler.org>

On 08-Oct-2002 Don Lewis wrote:
> On 7 Oct, John Baldwin wrote:
>>
>> On 07-Oct-2002 Don Lewis wrote:
>
>>> Probably, but the list is also modified in the exit code.  All those
>>> processes that we are sending SIGKILL to are removing themselves from
>>> the list.
>>
>> Processes dying from SIGKILL that we send them aren't a problem since
>> we have already read their p_peers member before we kill them.  That's
>> the point of 'nq'.
>> The problem is that 'nq' could exit and could be
>> an invalid pointer.  If a process later in the list after 'nq' died,
>> that is not a problem either.  Well, how about this:
>
> I missed your use of nq, even though this is a fairly common way of
> handling similar problems if there is only a single thread.
>
>> http://www.FreeBSD.org/~jhb/patches/ppeers.patch
>
> That's pretty much what I had envisioned.  I have a little bit of a
> concern that funnelling a single mutex could be a bottleneck in some
> cases, but it is simple, safe, and otherwise low overhead.

Well, the mutex is only used in the RFTHREAD case most of the time.  The
only time it is unconditionally acquired, it is almost immediately
released in the !RFTHREAD case.

> It looks like we've got a potential lock order reversal problem, though.
> In fork1() we grab ppeers_lock while holding a couple of PROC_LOCKs,
> while in the first part of exit1() we grab ppeers_lock before PROC_LOCK.
> My caffeine level is insufficient to judge whether P_WEXIT checking
> would save us in practice.

Bah, fixed the reversal, thanks.  We still need the P_WEXIT check in
fork1(), since otherwise a new peer or child could be added after we
have finished going through the entire list.  Hmm, adding this is ugly,
though, because we really need to check after we acquire the
ppeers_lock and do the actual hookup.  Hmm, we can move the RFTHREAD
stuff a lot earlier, and then this isn't such a big deal.  Ok, I've
updated the patch again.

One note: I've got a question about how to handle the error condition
in that case in fork1().  I'm really starting to think that instead of
returning an error, the peer process should just go ahead and call
exit1() in this case, since it is about to be killed anyway.

-- 
John Baldwin <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
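The thread above combines two idioms: reading the successor pointer ('nq') before acting on the current list entry, and testing an "exiting" flag (P_WEXIT in the discussion) only after taking the list lock, so that a new peer cannot be hooked up after the teardown walk has finished. Below is a minimal userland sketch of both, assuming POSIX threads. It is not the code from ppeers.patch: the names struct group, struct peer, group_add() and group_kill_others() are hypothetical, the pthread mutex merely stands in for ppeers_lock, and freeing a node stands in for the kernel sending SIGKILL and the victim later unlinking itself in exit1().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified stand-ins (not FreeBSD kernel names):
 * 'lock' plays the role of ppeers_lock, 'exiting' plays P_WEXIT.
 */
struct peer {
    struct peer *next;          /* analogous to p_peers */
    int id;
};

struct group {
    pthread_mutex_t lock;       /* protects 'head' and 'exiting' */
    struct peer *head;
    int exiting;                /* set once teardown has started */
};

/*
 * Link a new peer into the group.  'exiting' is tested only while the
 * lock is held, so teardown cannot complete its walk between the check
 * and the hookup.  Returns 0 on success, or -1 if teardown has already
 * begun (the caller still owns 'p' and must dispose of it).
 */
static int group_add(struct group *g, struct peer *p)
{
    int error = 0;

    pthread_mutex_lock(&g->lock);
    if (g->exiting) {
        error = -1;
    } else {
        p->next = g->head;
        g->head = p;
    }
    pthread_mutex_unlock(&g->lock);
    return error;
}

/*
 * Tear down every peer except 'self', which must be on the list.  The
 * successor ('nq') is saved before the current peer is freed, so the
 * walk never dereferences freed memory.  Returns the number freed.
 */
static int group_kill_others(struct group *g, struct peer *self)
{
    struct peer *q, *nq;
    int killed = 0;

    assert(self != NULL);
    pthread_mutex_lock(&g->lock);
    g->exiting = 1;             /* refuse new peers from now on */
    for (q = g->head; q != NULL; q = nq) {
        nq = q->next;           /* save the successor first */
        if (q == self)
            continue;
        free(q);                /* stand-in for sending SIGKILL */
        killed++;
    }
    self->next = NULL;          /* only 'self' remains */
    g->head = self;
    pthread_mutex_unlock(&g->lock);
    return killed;
}
```

Because the lock is held for the whole walk, no other thread can unlink or free 'nq' while it is saved, which is exactly the concurrent case Don's "single thread" remark leaves open. Returning -1 from group_add() mirrors fork1() returning an error; the alternative John floats above would have the late peer call exit1() itself instead.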