From: Ian Dowse
To: Danny Braniss
Cc: freebsd-bugs@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org
Subject: Re: bin/117603: [patch] dump(8) hangs on SMP - 4way and higher.
Date: Tue, 06 Nov 2007 10:03:27 +0000
Message-ID: <200711061003.aa74949@nowhere.iedowse.com>
References: <200711041127.lA4BRpg8049484@freefall.freebsd.org>
Comments: In-reply-to Danny Braniss message dated "Mon, 05 Nov 2007 11:42:12 +0200."

Danny Braniss writes:
>I didn't get your 2nd message, but I'm now looking at the PR :-)
>what if after
>	sigemptyset(&mask);
>we add
>	sigaddset(&mask, SIGUSR2);
>then sigsuspend() should return only if a SIGUSR2 was received.
>wouldn't that solve the ^T et al. issue?

I think you'd need to use sigfillset() + sigdelset() instead, but
this would obviously block all other signals, which quite possibly
has unwanted side-effects. For example, would the slave processes
get left behind if you interrupted the dump with Ctrl-C?

>at the moment only one host has this problem, and it's very unsettling, since
>I can't reproduce it on another similar host. On the other hand, someone else
>reported the same issue, and my fix worked for him too.
>Anyway, I see no harm in a little cleanup/upgrade :-)
>also, my feeling is that the problem might be in the kernel, but I got lost
>following the code.

It's important to track down the actual cause of this, especially
if it is a kernel bug. Have you tried the version of the patch I
gave you yet? Its use of the "while (!caught)" loop should in theory
help to narrow down whether this is a race condition or some other
kind of signal loss problem.

Also, further details would be helpful, such as whether the issue
generally happens as dump is starting up or if it can happen after
many megabytes of data have been written.

Ian