From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: stable@freebsd.org, Bruno Ducrot, Bill Moran
Date: Tue, 10 Oct 2006 15:44:43 -0400
Subject: Re: Dell 1950 does not properly respond to reboot and shutdown -p
Message-Id: <200610101544.43903.jhb@freebsd.org>
In-Reply-To: <200610101720.k9AHKdMI099668@ambrisko.com>

On Tuesday 10 October 2006 13:20, Doug Ambrisko wrote:
> John Baldwin writes:
> | On Tuesday 10 October 2006 08:54, Bill Moran wrote:
> | > In response to Doug Ambrisko:
> | > > Bruno Ducrot writes:
> | > > | On Wed, Oct 04, 2006 at 02:07:12PM -0400, Bill Moran wrote:
> | > > | > In response to Bruno Ducrot:
> | > > | > > Hi,
> | > > | > >
> | > > | > > On Wed, Oct 04, 2006 at 12:28:35PM -0400, Bill Moran wrote:
> | > > | > > >
> | > > | > > > A reboot causes the OS to halt, but the hardware just sits there on the
> | > > | > > > shutdown screen.
> | > > | > > >
> | > > | > > > A shutdown -p does the same.
> | > > | > >
> | > > | > > What exactly are the last few lines?
> | > > | >
> | > > | > (manually copied)
> | > > | >
> | > > | > ...
> | > > | > All buffers synced.
> | > > | > Uptime: 1m16s
> | > > | >
> | > > |
> | > > | Thanks. Then this happen after print_uptime().
> | > > |
> | > > | I believe one of the drivers register a shutdown_final (or
> | > > | shutdown_post_sync) event that hang your system. I think (though I
> | > > | may be wrong) mfi may be that one.
> | > > |
> | > > | It would help if you can add some printf in dev/mfi/mfi.c into the
> | > > | mfi_shutdown() function in order to check if that assumption
> | > > | is correct.
> | > >
> | > > Some what related to this we have a local hack:
> | > >
> | > > --- sys/kern/subr_bus.c.orig	Tue Jun 27 15:49:39 2006
> | > > +++ sys/kern/subr_bus.c	Tue Jun 27 15:49:51 2006
> | > > @@ -2906,6 +2906,7 @@ bus_generic_shutdown(device_t dev)
> | > >  	device_t child;
> | > >
> | > >  	TAILQ_FOREACH(child, &dev->children, link) {
> | > > +		DELAY(1000);
> | > >  		device_shutdown(child);
> | > >  	}
> | >
> | > This patch seems to "fix" the problem. I'm going to replace it with
> | > some printfs and see if I can determine which driver is actually
> | > causing the problem (hopefully it's only one).
> | >
> | > Am I wrong in saying that the correct solution would be to identify the
> | > driver that needs more time and implementing some sort of polling
> | > mechanism to ensure the hardware is ready when the driver wants to
> | > shut down?
> |
> | Well, first let's see which driver it is. :) You might be able to just
> | remove the DELAY and add a printf and see which device is printed last.
>
> I think it was in a different ones. One of our configs has the base
> HW + bge NIC the other has base HW + 2 x 2 port em NICs. The more
> NIC's the better chance for a problem.
>
> I've removed the hack from our kernel and I'm going to run the reboot
> cycle. I don't think a printf will work since I recall trying that
> it "fixed" the problem so I put the DELAY in :-( It could be generic
> problem to the system with a sufficiently fast CPU to beat the
> HW at shutting down. I'm not sure if his system is Dempsey or Woodcrest.
> We use Woodcrest and they are really faster. Other machines might be
> "slow" enough that it's not a a problem! We haven't seen it on our older
> platforms with the same kernel and similar HW configs.

Can you break into the debugger once it hangs? If so, change the
printf to a KTR trace, enable just that trace, and run 'show ktr' in ddb
to see which devices were shut down.

--
John Baldwin
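
The reason a trace may work where a printf "fixes" the hang can be sketched in userspace. The model below only illustrates the idea of recording events into an in-memory ring instead of formatting them to the console; it is not the FreeBSD KTR API, and every name in it (trace_ring, trace_record, trace_last, shutdown_children, hang_at) is hypothetical. The real facility is described in ktr(4) and ktr(9).

```c
/*
 * Userspace model of an in-memory trace ring. All names are hypothetical;
 * this is an illustration of the idea, not kernel code.
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TRACE_ENTRIES 8		/* ring size, chosen arbitrarily */

struct trace_ring {
	const char *entry[TRACE_ENTRIES];	/* pointers only, no formatting */
	size_t next;				/* total records written so far */
};

/* Record an event by storing a pointer: cheap, so timing barely changes. */
static void
trace_record(struct trace_ring *r, const char *name)
{
	assert(r != NULL);
	r->entry[r->next % TRACE_ENTRIES] = name;
	r->next++;
}

/* Most recent record, or NULL if nothing has been traced yet. */
static const char *
trace_last(const struct trace_ring *r)
{
	if (r->next == 0)
		return (NULL);
	return (r->entry[(r->next - 1) % TRACE_ENTRIES]);
}

/*
 * Stand-in for the loop in bus_generic_shutdown(): trace each child just
 * before shutting it down. hang_at simulates a child whose shutdown never
 * returns (pass a negative value for no hang). Returns how many children
 * completed their shutdown.
 */
static int
shutdown_children(struct trace_ring *r, const char *const *children, int n,
    int hang_at)
{
	int done = 0;

	for (int i = 0; i < n; i++) {
		trace_record(r, children[i]);
		if (i == hang_at)
			break;	/* simulated hang: nothing after this runs */
		done++;
	}
	return (done);
}
```

After a hang, the last record in the ring names the child whose shutdown was in progress, which is the information 'show ktr' would be used to recover from ddb. Because each record is a pointer store rather than console output, it should disturb timing far less than a printf, though whether it disturbs it little enough on this hardware is an open question.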