Date:      Thu, 6 Jan 2005 16:17:49 -0500
From:      John Baldwin <jhb@FreeBSD.org>
To:        freebsd-current@FreeBSD.org
Subject:   Re: Assertion td->td_sleepqueue != NULL failed at kern/subr_sleepqueue.c:270
Message-ID:  <200501061617.49967.jhb@FreeBSD.org>
In-Reply-To: <20050105122636.GA31684@peter.osted.lan>
References:  <20050105122636.GA31684@peter.osted.lan>

On Wednesday 05 January 2005 07:26 am, Peter Holm wrote:
> With GENERIC HEAD from Dec 31 09:28 UTC + bmilekic@'s uma_core
> patch + alc's patch I got the following strange assert:
>
> panic(c0827c46,c082dd18,c082dc8d,10e,c08f4660) at panic+0x190
> sleepq_add(c08eec90,c08ee6e8,c082a9bf,1,c08ee6e8,0,c0827ca9,7d)
>    at sleepq_add+0x156
> cv_wait(c08eec90,c08ee6e8,c151de30,0,ffffffff) at cv_wait+0x100
> _sx_xlock(c08eec60,c0828867,247,0,c151ddc8) at _sx_xlock+0x59
> kern_wait(c151e450,ffffffff,cbc67c90,0,0) at kern_wait+0x4b
> wait4(c151e450,cbc67d14,4,3f8,282) at wait4+0x29
> syscall(2f,2f,bfbf002f,2,0) at syscall+0x128
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (7, FreeBSD ELF32, wait4), eip = 0x805170b, esp =
> 0xbfbfedbc, ebp = 0xbfbfedd8 ---
>
> Looks like td->td_sleepqueue is NULL!
>
> Details at http://www.holm.cc/stress/log/cons100.html

This is a truly odd panic.  The basic theory of operation with sleep queues is
that every thread that is not already queued on a sleep queue carries a sleep
queue structure around with it, which it donates to a wait channel when it
blocks on that channel.  Once the thread is resumed, it reclaims a sleep queue
from the wait channel.  This resuming step happens in sleepq_remove_thread()
in subr_sleepqueue.c.  As you can see, in addition to assigning a sleep queue
to the thread being removed from a queue, that function also clears td_wchan
and td_wmesg.  The thread in question has both fields set (as if it were
asleep on "proctree", which is what it is trying to go back to sleep on now).
However, it is not on a sleep queue (td_slpq.tqe_next is NULL).  So it
appears that the thread was removed from the sleep queue and resumed (made
runnable), but sleepq_remove_thread() was never called.  By the way, do you
have any local patches that might affect this?  I notice you get a lot of
trap 9's in your dmesg, which is somewhat unsettling.
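The donation scheme described above can be illustrated with a minimal user-space sketch in C. It rests on a greatly simplified model. The struct and function names (model_thread, model_sq, model_sleepq_add(), model_sleepq_remove_thread()) are hypothetical. A small fixed table stands in for the kernel's wait-channel hash, and there is no locking. The sketch follows the behavior described in the paragraph, not the actual subr_sleepqueue.c source. It shows three things:

- The first sleeper's queue becomes the channel's queue.
- Later sleepers park their queues on that queue's free list.
- Each woken thread takes one queue back and has td_wchan and td_wmesg cleared.

As a result, a thread made runnable without the reclaim step still has td_wchan set but carries no queue.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of sleep queue donation.  Not kernel code:
 * no locking, and a tiny table stands in for the wait-channel hash.
 */
struct model_sq;

struct model_thread {
	struct model_sq *td_sleepqueue;	/* queue carried while awake */
	const void *td_wchan;		/* wait channel, NULL when awake */
	const char *td_wmesg;		/* wait message, e.g. "proctree" */
	struct model_thread *blk_next;	/* link on a queue's blocked list */
};

struct model_sq {
	const void *wchan;		/* channel served, NULL if carried */
	struct model_thread *blocked;	/* threads asleep on wchan */
	struct model_sq *free_list;	/* spares donated by later sleepers */
	struct model_sq *free_next;	/* link on another queue's free_list */
};

#define MODEL_NCHAN 4
static struct model_sq *active[MODEL_NCHAN];	/* channels with sleepers */

static struct model_sq *model_lookup(const void *wchan)
{
	for (int i = 0; i < MODEL_NCHAN; i++)
		if (active[i] != NULL && active[i]->wchan == wchan)
			return active[i];
	return NULL;
}

/* Block td on wchan.  Returns -1 where the real code would assert. */
int model_sleepq_add(struct model_thread *td, const void *wchan,
    const char *wmesg)
{
	struct model_sq *sq;

	if (td->td_sleepqueue == NULL)
		return -1;	/* "td->td_sleepqueue != NULL" would fail */
	sq = model_lookup(wchan);
	if (sq == NULL) {
		/* First sleeper: its own queue becomes the channel's queue. */
		int slot = -1;
		for (int i = 0; i < MODEL_NCHAN && slot < 0; i++)
			if (active[i] == NULL)
				slot = i;
		if (slot < 0)
			return -2;	/* model table full */
		sq = td->td_sleepqueue;
		sq->wchan = wchan;
		active[slot] = sq;
	} else {
		/* Later sleeper: park its queue on the free list. */
		td->td_sleepqueue->free_next = sq->free_list;
		sq->free_list = td->td_sleepqueue;
	}
	td->td_sleepqueue = NULL;
	td->td_wchan = wchan;
	td->td_wmesg = wmesg;
	td->blk_next = sq->blocked;
	sq->blocked = td;
	return 0;
}

/* Wake td: unlink it, hand it a queue, and clear td_wchan/td_wmesg. */
void model_sleepq_remove_thread(struct model_thread *td)
{
	struct model_sq *sq = model_lookup(td->td_wchan);
	struct model_thread **pp;

	assert(sq != NULL);
	for (pp = &sq->blocked; *pp != td; pp = &(*pp)->blk_next)
		assert(*pp != NULL);	/* td must be on this queue */
	*pp = td->blk_next;
	td->blk_next = NULL;
	if (sq->blocked == NULL) {
		/* Last sleeper out takes the channel's queue itself. */
		for (int i = 0; i < MODEL_NCHAN; i++)
			if (active[i] == sq)
				active[i] = NULL;
		sq->wchan = NULL;
		td->td_sleepqueue = sq;
	} else {
		/* Otherwise reclaim one of the spare queues. */
		td->td_sleepqueue = sq->free_list;
		sq->free_list = sq->free_list->free_next;
		td->td_sleepqueue->free_next = NULL;
	}
	td->td_wchan = NULL;
	td->td_wmesg = NULL;
}
```

Consider two threads asleep on one channel. Removing the first gives it the second thread's spare queue. Suppose the second thread were then made runnable without a call to model_sleepq_remove_thread(). It would still show td_wchan and td_wmesg set and a NULL td_sleepqueue. A further model_sleepq_add() on it would then hit the NULL check, mirroring the state seen in the trace.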

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


