Date: Thu, 12 Oct 2006 16:40:22 +0200
From: Bruno Ducrot <ducrot@poupinou.org>
To: Bill Moran <wmoran@collaborativefusion.com>
Cc: freebsd-stable@freebsd.org, stable@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject: Re: Dell 1950 does not properly respond to reboot and shutdown -p
Message-ID: <20061012144022.GV4945@poupinou.org>
In-Reply-To: <20061010145315.cefa9e19.wmoran@collaborativefusion.com>
References: <200610101022.33761.jhb@freebsd.org> <200610101720.k9AHKdMI099668@ambrisko.com> <20061010145315.cefa9e19.wmoran@collaborativefusion.com>
On Tue, Oct 10, 2006 at 02:53:15PM -0400, Bill Moran wrote:
> In response to Doug Ambrisko <ambrisko@ambrisko.com>:
>
> > John Baldwin writes:
> > | On Tuesday 10 October 2006 08:54, Bill Moran wrote:
> > | > In response to Doug Ambrisko <ambrisko@ambrisko.com>:
> > | > > Bruno Ducrot writes:
> > | > > | On Wed, Oct 04, 2006 at 02:07:12PM -0400, Bill Moran wrote:
> > | > > | > In response to Bruno Ducrot <ducrot@poupinou.org>:
> > | > > | > > Hi,
> > | > > | > >
> > | > > | > > On Wed, Oct 04, 2006 at 12:28:35PM -0400, Bill Moran wrote:
> > | > > | > > >
> > | > > | > > > A reboot causes the OS to halt, but the hardware just sits there on the
> > | > > | > > > shutdown screen.
> > | > > | > > >
> > | > > | > > > A shutdown -p does the same.
> > | > > | > >
> > | > > | > > What exactly are the last few lines?
> > | > > | >
> > | > > | > (manually copied)
> > | > > | >
> > | > > | > ...
> > | > > | > All buffers synced.
> > | > > | > Uptime: 1m16s
> > | > > | >
> > | > > |
> > | > > | Thanks. Then this happens after print_uptime().
> > | > > |
> > | > > | I believe one of the drivers registers a shutdown_final (or
> > | > > | shutdown_post_sync) event that hangs your system. I think (though I
> > | > > | may be wrong) mfi may be that one.
> > | > > |
> > | > > | It would help if you could add some printfs in dev/mfi/mfi.c into the
> > | > > | mfi_shutdown() function in order to check if that assumption
> > | > > | is correct.
> > | > >
> > | > > Somewhat related to this, we have a local hack:
> > | > >
> > | > > --- sys/kern/subr_bus.c.orig	Tue Jun 27 15:49:39 2006
> > | > > +++ sys/kern/subr_bus.c	Tue Jun 27 15:49:51 2006
> > | > > @@ -2906,6 +2906,7 @@ bus_generic_shutdown(device_t dev)
> > | > >  	device_t child;
> > | > >  
> > | > >  	TAILQ_FOREACH(child, &dev->children, link) {
> > | > > +		DELAY(1000);
> > | > >  		device_shutdown(child);
> > | > >  	}
> > | >
> > | > This patch seems to "fix" the problem. I'm going to replace it with
> > | > some printfs and see if I can determine which driver is actually
> > | > causing the problem (hopefully it's only one).
> > | >
> > | > Am I wrong in saying that the correct solution would be to identify the
> > | > driver that needs more time and implement some sort of polling
> > | > mechanism to ensure the hardware is ready when the driver wants to
> > | > shut down?
> > |
> > | Well, first let's see which driver it is. :) You might be able to just
> > | remove the DELAY and add a printf and see which device is printed last.
> >
> > I think it was in different ones. One of our configs has the base
> > HW + bge NIC, the other has base HW + 2 x 2-port em NICs. The more
> > NICs, the better the chance of a problem.
> >
> > I've removed the hack from our kernel and I'm going to run the reboot
> > cycle. I don't think a printf will work, since I recall trying that
> > and it "fixed" the problem, so I put the DELAY in :-( It could be a
> > generic problem with the system: a sufficiently fast CPU beats the
> > HW at shutting down. I'm not sure if his system is Dempsey or Woodcrest.
> > We use Woodcrest and they are much faster. Other machines might be
> > "slow" enough that it's not a problem! We haven't seen it on our older
> > platforms with the same kernel and similar HW configs.
>
> Well, I already did this. The only printf is the
> device_printf(child, "shutdown\n") that Bruno suggested. With this
> single change, I'm unable to reproduce the problem.
>
> Have any commits been made to 6-STABLE that might have inadvertently
> fixed this in the last week or so?
>

I think device_printf() takes long enough that it acts as a delay
between the device_shutdown() calls, so you get the same behaviour
as with the DELAY().

-- 
Bruno Ducrot

--  Which is worse:  ignorance or apathy?
--  Don't know.  Don't care.