From owner-freebsd-bugs@FreeBSD.ORG  Tue Nov  6 10:03:32 2007
Return-Path: <owner-freebsd-bugs@FreeBSD.ORG>
Delivered-To: freebsd-bugs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6653116A421;
	Tue,  6 Nov 2007 10:03:32 +0000 (UTC)
	(envelope-from iedowse@iedowse.com)
Received: from nowhere.iedowse.com (nowhere.iedowse.com [IPv6:2001:770:13b::1])
	by mx1.freebsd.org (Postfix) with SMTP id 8786C13C49D;
	Tue,  6 Nov 2007 10:03:31 +0000 (UTC)
	(envelope-from iedowse@iedowse.com)
Received: from localhost  ([127.0.0.1] helo=iedowse.com)
	by nowhere.iedowse.com via local-iedowse id <aa74949@nowhere>;
	6 Nov 2007 10:03:30 +0000 (GMT)
To: Danny Braniss <danny@cs.huji.ac.il>
In-reply-to: <E1IoyTE-0004u8-HK@cs1.cs.huji.ac.il> 
References: <200711041127.lA4BRpg8049484@freefall.freebsd.org>
	<E1IoyTE-0004u8-HK@cs1.cs.huji.ac.il>
Comments: In-reply-to Danny Braniss <danny@cs.huji.ac.il>
	message dated "Mon, 05 Nov 2007 11:42:12 +0200."
Date: Tue, 06 Nov 2007 10:03:27 +0000
From: Ian Dowse <iedowse@iedowse.com>
Message-ID: <200711061003.aa74949@nowhere.iedowse.com>
Cc: freebsd-bugs@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org
Subject: Re: bin/117603: [patch] dump(8) hangs on SMP - 4way and higher. 
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports <freebsd-bugs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-bugs>
List-Post: <mailto:freebsd-bugs@freebsd.org>
List-Help: <mailto:freebsd-bugs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 06 Nov 2007 10:03:32 -0000

In message <E1IoyTE-0004u8-HK@cs1.cs.huji.ac.il>, Danny Braniss writes:
>I didn't get your 2nd message, but I'm now looking at the PR :-)
>what if after
>	sigemptyset(&mask);
>we add
>	sigaddset(&mask, SIGUSR2);
>the sigsuspend() should return only if a SIGUSR2 was received.
>wouldn't that solve the ^T et al. issue?

I think you'd need to use sigfillset() + sigdelset() instead, but
this would obviously block all other signals, which quite possibly
has unwanted side-effects. E.g. would the slave processes get left
behind if you interrupted the dump with Ctrl-C?

>at the moment only one host has this problem, and it's very unsettling, since
>I can't reproduce it on another similar host. On the other hand someone else
>reported the same issue, and my fix worked for him too.
>Anyways, I see no harm in a little cleanup/upgrade :-)
>also, my feeling is that the problem might be in the kernel, but I got lost
>following the code.

It's important to track down the actual cause of this, especially
if it is a kernel bug. Have you tried the version of the patch I
gave you yet? Its use of the "while (!caught)" loop should in theory
help to narrow down whether this is a race condition or some other
kind of signal loss problem. Also, further details would be helpful,
such as whether the issue generally happens as dump is starting up
or if it can happen after many megabytes of data have been written.

Ian