Message-ID: <40ADAA85.4000204@wcborstel.nl>
Date: Fri, 21 May 2004 09:06:45 +0200
From: Jorn Argelo
To: Nicholas Bernstein
In-Reply-To: <1085099914.11375.222.camel@nick.docmagic.com>
cc: freebsd-questions
Subject: Re: System Hang

Nicholas Bernstein wrote:

>hello all,
>I'm hoping someone can give me a hand with this. I have a suspicion as
>to what is causing this, but I don't want to "taint" any replies I get.
>If any of the knowledgeable folks out there could help me out, offer
>possible areas to look into, better places to contact, or anything that
>could possibly be helpful, I would really, really appreciate it.
>
> thanks in advance,
> Nick
>Info follows:
>
>
>System:
>-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>HP Proliant DL140
>http://h18004.www1.hp.com/products/servers/proliantdl140/index.html
>FreeBSD 5.2-CURRENT #0 standard kernel
>2 xeon (hyperthreaded) processors
>
>Problem Description:
>-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>
>System becomes unresponsive and "hangs". System does not respond to
>keyboard, network or any other type of input.
>
>
>Error Message:
>-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>
>panic: Assertion TD_ON_SLEEPQ(td) failed at
>/usr/src/sys/kern/subr_sleepqueue.c:783 at line 783 in file:
>/usr/src/sys/kern/subr_sleepqueue.c
>cpuid=1
>Debugger("panic")
>Spin lock sched lock held by 0x617eb00 for > 5
>
>Related info:
>-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>
>/usr/src/sys/kern/subr_sleepqueue.c:
>...
> 770 /*
> 771  * Abort a thread as if an interrupt had occured. Only abort
> 772  * interruptable waits (unfortunately it isn't safe to abort others).
> 773  *
> 774  * XXX: What in the world does the comment below mean?
> 775  * Also, whatever the signal code does...
> 776  */
> 777 void
> 778 sleepq_abort(struct thread *td)
> 779 {
> 780         void *wchan;
> 781
> 782         mtx_assert(&sched_lock, MA_OWNED);
> 783         MPASS(TD_ON_SLEEPQ(td));
> 784         MPASS(td->td_flags & TDF_SINTR);
> 785
> 786         /*
> 787          * If the TDF_TIMEOUT flag is set, just leave. A
> 788          * timeout is scheduled anyhow.
> 789          */
> 790         if (td->td_flags & TDF_TIMEOUT)
> 791                 return;
> 792
> 793         CTR3(KTR_PROC, "sleepq_abort: thread %p (pid %d, %s)", td,
> 794             td->td_proc->p_pid, td->td_proc->p_comm);
> 795         wchan = td->td_wchan;
> 796         mtx_unlock_spin(&sched_lock);
> 797         sleepq_remove(td, wchan);
> 798         mtx_lock_spin(&sched_lock);
> 799 }
>
>
>Also, in order for the machine to detect its Broadcom 5700 network
>cards, I had to add the line
>	acpi_load="no"
>to my /boot/loader.conf. Upon reboot the network cards show up in
>ifconfig output and work perfectly.
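The panic line above has the shape produced by FreeBSD's MPASS() macro, which is built on KASSERT() and is only active in kernels compiled with the INVARIANTS option. The C sketch below is a hypothetical userspace model, not kernel code: struct thread with its on_sleepq and sintr fields, MODEL_MPASS() and model_sleepq_abort() are all invented for illustration, and the real TD_ON_SLEEPQ() definition is different. It only shows how the failing expression's source text ends up in the message, and what the first two checks in sleepq_abort() demand of the thread they are given.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct thread. */
struct thread {
    bool on_sleepq;  /* models "thread is linked on a sleep queue" */
    bool sintr;      /* models TDF_SINTR: the sleep is interruptible */
};

/* Hypothetical model; not the kernel's definition of TD_ON_SLEEPQ(). */
#define TD_ON_SLEEPQ(td) ((td)->on_sleepq)

/* Holds the first failed-assertion message; empty if all checks passed. */
static char model_panic_msg[160];

/*
 * Model of MPASS(): instead of panicking, record a message in the same
 * "Assertion <expr> failed at <file>:<line>" shape. The # operator
 * stringizes the argument before it is macro-expanded, which is why the
 * text reads "TD_ON_SLEEPQ(td)" and not the expanded expression.
 */
#define MODEL_MPASS(ex)                                               \
    do {                                                              \
        if (!(ex) && model_panic_msg[0] == '\0')                      \
            snprintf(model_panic_msg, sizeof(model_panic_msg),        \
                     "Assertion %s failed at %s:%d",                  \
                     #ex, __FILE__, __LINE__);                        \
    } while (0)

/*
 * Model of the entry checks in sleepq_abort(). Returns false where the
 * real kernel would have panicked, true after "removing" the thread.
 */
static bool model_sleepq_abort(struct thread *td)
{
    memset(model_panic_msg, 0, sizeof(model_panic_msg));
    MODEL_MPASS(TD_ON_SLEEPQ(td));
    MODEL_MPASS(td->sintr);
    if (model_panic_msg[0] != '\0')
        return false;
    td->on_sleepq = false;  /* stands in for sleepq_remove() */
    return true;
}
```

For example, calling model_sleepq_abort() on a thread whose on_sleepq is false leaves a message beginning "Assertion TD_ON_SLEEPQ(td) failed at". Taken at face value, the assertion in the report says the same thing: sleepq_abort() was handed a thread that was not on a sleep queue at that moment.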
>
>
>Possible references:
>-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>
>This isn't the exact same error, but it's the closest thing I could find
>to my error:
>
>http://lists.freebsd.org/pipermail/freebsd-current/2004-March/022633.html
>
>This error is also pretty close, but not the same thing:
>
>http://groups.google.com/groups?q=%27panic:+Assertion+TD_ON_SLEEPQ(td)+failed+at%27&hl=en&lr=&ie=UTF-8&safe=off&selm=200405182105.04275.thierry%40herbelot.com&rnum=1
>
>
>
> . . .
> Thanks for taking the time to read this.
> . . .
>
>
>

I wonder... why do you want to run CURRENT on a machine like that? It's
the bleeding-edge source code, which is unstable most of the time. You
might want to consider running 4.9 on that machine, which is the
production release. You can try 5.2.1 as well, but it still belongs to
the 5.x branch, which is not yet considered stable.

So in other words: post your error to the CURRENT mailing list
(freebsd-current), and switch back to 4.9. I think that will solve many
of your problems.

Cheers,

Jorn