Date: Fri, 15 Oct 2004 22:06:45 +1000
From: Peter Jeremy
To: Doug White
cc: current@freebsd.org
cc: Andrew Li
Subject: Re: HP DL380 hangs on reboot
Message-ID: <20041015120645.GA46183@cirb503493.alcatel.com.au>
In-Reply-To: <20041015082618.GR83620@cirb503493.alcatel.com.au>
References: <20041011074219.GA39251@cirb503493.alcatel.com.au> <20041011191812.A34886@carver.gumbysoft.com> <20041013071704.GQ83620@cirb503493.alcatel.com.au> <20041013182133.A55834@carver.gumbysoft.com> <20041015082618.GR83620@cirb503493.alcatel.com.au>
List-Id: Discussions about the use of FreeBSD-current

[Upon reflection, my previous e-mail was somewhat brusque.
My apologies]

On Fri, 2004-Oct-15 18:26:18 +1000, Peter Jeremy wrote:
>On Wed, 2004-Oct-13 18:21:56 -0700, Doug White wrote:
>>On Wed, 13 Oct 2004, Peter Jeremy wrote:
>>
>>> On Mon, 2004-Oct-11 19:18:52 -0700, Doug White wrote:
>>> >On Mon, 11 Oct 2004, Peter Jeremy wrote:
>>> >> I have an HP DL380 running 5.3 and it will not reboot from multi-user
>>> >> mode - it hangs after printing "Rebooting..." and needs to be power-
>>> >> cycled (since there's no reset button).
>
>>> I've narrowed it down to loading kernel modules - the problem does not
>
>>How about building them into your kernel instead?
>
>That seems to work.  But it doesn't solve the underlying problem.

Compiling digi(4) into the kernel does appear to solve my immediate
problem.  Thanks for the suggestion, Doug.

My remaining concern is that I have been unable to identify the root
cause of the problem.  These machines will be going into customer
sites as remote access servers, and requiring site access to reboot
them is very undesirable.  Since I don't know the real cause of the
problem, I can't be sure that normal activity will not cause it to
recur.

> (I
>have been kldload'ing digi because it originally didn't work when it
>was compiled into the kernel).

When digi(4) was originally added to the tree, it could not be
compiled into the kernel because its attach routines were not
compatible with the kernel initialisation environment.  This problem
was resolved a couple of years ago but I have continued to kldload
digi because:
- it worked and saved me from making the (trivial) changes needed to
  build it into the kernel.
- it needs access to a number of Digi BIOS files which are normally
  loaded/unloaded as KLDs.  If digi is built in, the BIOS file(s) need
  to be built in as well.  (Though in my case, the wasted KVA and RAM
  is irrelevant).
- to date, all the machines are running 4.x and having digi as a KLD
  made it easier to fix back-porting errors.
  (Re-compiling a module is a lot faster than re-compiling the kernel).

>>This could be just stale modules...
>
>Nope.  Kernel and modules were compiled and installed together.  Also
>there was no problem running and the hang was when the kernel asked
>the system to reboot - which is well after any modules have been unloaded.

Adding some printfs shows that the code is getting into
cpu_reset_real().  By this time, all modules and subsystems have been
shut down.  All that's left to do is ask the hardware for a reset.
The only problem is that the hardware doesn't want to play ball.
Presumably something the kernel is doing (apparently associated with
loading kernel modules) is disturbing the hardware state so that
reset no longer works.

-- 
Peter Jeremy