From owner-freebsd-stable@FreeBSD.ORG Thu Oct 12 14:40:50 2006
X-Original-To: stable@freebsd.org
Delivered-To: freebsd-stable@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9715716A415;
	Thu, 12 Oct 2006 14:40:50 +0000 (UTC)
	(envelope-from ducrot@poupinou.org)
Received: from poup.poupinou.org (poup.poupinou.org [195.101.94.96])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 91BF343D68;
	Thu, 12 Oct 2006 14:40:49 +0000 (GMT)
	(envelope-from ducrot@poupinou.org)
Received: from ducrot by poup.poupinou.org with local (Exim)
	id 1GY1jS-0001qp-00; Thu, 12 Oct 2006 16:40:22 +0200
Date: Thu, 12 Oct 2006 16:40:22 +0200
To: Bill Moran
Message-ID: <20061012144022.GV4945@poupinou.org>
References: <200610101022.33761.jhb@freebsd.org>
	<200610101720.k9AHKdMI099668@ambrisko.com>
	<20061010145315.cefa9e19.wmoran@collaborativefusion.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20061010145315.cefa9e19.wmoran@collaborativefusion.com>
User-Agent: Mutt/1.5.9i
From: Bruno Ducrot
Cc: freebsd-stable@freebsd.org, stable@freebsd.org, John Baldwin
Subject: Re: Dell 1950 does not properly respond to reboot and shutdown -p
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Thu, 12 Oct 2006 14:40:50 -0000

On Tue, Oct 10, 2006 at 02:53:15PM -0400, Bill Moran wrote:
> In response to Doug Ambrisko :
>
> > John Baldwin writes:
> > | On Tuesday 10 October 2006 08:54, Bill Moran wrote:
> > | > In response to Doug Ambrisko :
> > | > > Bruno Ducrot writes:
> > | > > | On Wed, Oct 04, 2006 at 02:07:12PM -0400, Bill Moran wrote:
> > | > > | > In response to Bruno Ducrot :
> > | > > | > > Hi,
> > | > > | > >
> > | > > | > > On Wed, Oct 04, 2006 at 12:28:35PM -0400, Bill Moran wrote:
> > | > > | > > >
> > | > > | > > > A reboot causes the OS to halt, but the hardware just sits there on the
> > | > > | > > > shutdown screen.
> > | > > | > > >
> > | > > | > > > A shutdown -p does the same.
> > | > > | > >
> > | > > | > > What exactly are the last few lines?
> > | > > | >
> > | > > | > (manually copied)
> > | > > | >
> > | > > | > ...
> > | > > | > All buffers synced.
> > | > > | > Uptime: 1m16s
> > | > > | >
> > | > > |
> > | > > | Thanks.  Then this happens after print_uptime().
> > | > > |
> > | > > | I believe one of the drivers registers a shutdown_final (or
> > | > > | shutdown_post_sync) event that hangs your system.  I think (though I
> > | > > | may be wrong) mfi may be that one.
> > | > > |
> > | > > | It would help if you can add some printf in dev/mfi/mfi.c into the
> > | > > | mfi_shutdown() function in order to check if that assumption
> > | > > | is correct.
> > | > >
> > | > > Somewhat related to this, we have a local hack:
> > | > >
> > | > > --- sys/kern/subr_bus.c.orig Tue Jun 27 15:49:39 2006
> > | > > +++ sys/kern/subr_bus.c Tue Jun 27 15:49:51 2006
> > | > > @@ -2906,6 +2906,7 @@ bus_generic_shutdown(device_t dev)
> > | > >  	device_t child;
> > | > >
> > | > >  	TAILQ_FOREACH(child, &dev->children, link) {
> > | > > +		DELAY(1000);
> > | > >  		device_shutdown(child);
> > | > >  	}
> > | >
> > | > This patch seems to "fix" the problem.  I'm going to replace it with
> > | > some printfs and see if I can determine which driver is actually
> > | > causing the problem (hopefully it's only one).
> > | >
> > | > Am I wrong in saying that the correct solution would be to identify the
> > | > driver that needs more time and implement some sort of polling
> > | > mechanism to ensure the hardware is ready when the driver wants to
> > | > shut down?
> > |
> > | Well, first let's see which driver it is. :)  You might be able to just
> > | remove the DELAY and add a printf and see which device is printed last.
> >
> > I think it was in different ones.  One of our configs has the base
> > HW + bge NIC, the other has base HW + 2 x 2 port em NICs.  The more
> > NICs, the better the chance for a problem.
> >
> > I've removed the hack from our kernel and I'm going to run the reboot
> > cycle.  I don't think a printf will work since I recall trying that and
> > it "fixed" the problem, so I put the DELAY in :-(  It could be a generic
> > problem on systems with a sufficiently fast CPU to beat the
> > HW at shutting down.  I'm not sure if his system is Dempsey or Woodcrest.
> > We use Woodcrest and they are really faster.  Other machines might be
> > "slow" enough that it's not a problem!  We haven't seen it on our older
> > platforms with the same kernel and similar HW configs.
>
> Well, I already did this.  The only printf is the
> device_printf(child, "shutdown\n") that Bruno suggested.  With this
> single change, I'm unable to reproduce the problem.
>
> Have any commits been made to 6-STABLE that might have inadvertently
> fixed this in the last week or so?
>

I think the device_printf() call itself takes long enough that you get
the same behaviour as with the DELAY().

-- 
Bruno Ducrot

--  Which is worse:  ignorance or apathy?
--  Don't know.  Don't care.
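
The polling idea raised in the quoted discussion (wait until the hardware reports ready instead of inserting a fixed DELAY() or relying on a printf that happens to be slow) might look roughly like the userspace sketch below. Everything in it is an assumption made for illustration: struct fake_dev, fake_dev_idle() and wait_for_idle() are hypothetical names, not part of any FreeBSD driver, and the thread does not establish which device, if any, needs such a wait. In a real driver each loop iteration would presumably be separated by a short DELAY() and would read an actual status register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device that needs time to quiesce. */
struct fake_dev {
	int busy_polls;	/* status reads remaining before the device is idle */
	int polls_seen;	/* number of status reads performed so far */
};

/* Hypothetical status read: returns true once the device reports idle. */
static bool
fake_dev_idle(struct fake_dev *dev)
{
	dev->polls_seen++;
	if (dev->busy_polls > 0) {
		dev->busy_polls--;
		return (false);
	}
	return (true);
}

/*
 * Poll the device until it is idle or max_polls status reads have been
 * made.  Returns 0 if the device became idle, -1 on timeout.  The bound
 * ensures a device that never becomes idle cannot stall shutdown forever.
 */
static int
wait_for_idle(struct fake_dev *dev, int max_polls)
{
	assert(dev != NULL);
	for (int i = 0; i < max_polls; i++) {
		if (fake_dev_idle(dev))
			return (0);
		/* A kernel driver would pause briefly here, e.g. with DELAY(). */
	}
	return (-1);
}
```

Compared with an unconditional DELAY(1000) per child device, a bounded poll only waits as long as the hardware actually needs and fails visibly on timeout rather than hanging silently.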