From: John Baldwin <jhb@FreeBSD.org>
To: freebsd-current@FreeBSD.org
Date: Thu, 6 Jan 2005 16:17:49 -0500
Subject: Re: Assertion td->td_sleepqueue != NULL failed at kern/subr_sleepqueue.c:270
Message-Id: <200501061617.49967.jhb@FreeBSD.org>
In-Reply-To: <20050105122636.GA31684@peter.osted.lan>
References: <20050105122636.GA31684@peter.osted.lan>

On Wednesday 05 January 2005 07:26 am, Peter Holm wrote:
> With GENERIC HEAD from Dec 31 09:28 UTC + bmilekic@'s uma_core
> patch + alc's patch I got the following strange assert:
>
> panic(c0827c46,c082dd18,c082dc8d,10e,c08f4660) at panic+0x190
> sleepq_add(c08eec90,c08ee6e8,c082a9bf,1,c08ee6e8,0,c0827ca9,7d) at sleepq_add+0x156
> cv_wait(c08eec90,c08ee6e8,c151de30,0,ffffffff) at cv_wait+0x100
> _sx_xlock(c08eec60,c0828867,247,0,c151ddc8) at _sx_xlock+0x59
> kern_wait(c151e450,ffffffff,cbc67c90,0,0) at kern_wait+0x4b
> wait4(c151e450,cbc67d14,4,3f8,282) at wait4+0x29
> syscall(2f,2f,bfbf002f,2,0) at syscall+0x128
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (7, FreeBSD ELF32, wait4), eip = 0x805170b, esp = 0xbfbfedbc, ebp = 0xbfbfedd8 ---
>
> Looks like td->td_sleepqueue is NULL!
>
> Details at http://www.holm.cc/stress/log/cons100.html

This is a truly odd panic. The basic theory of operation with sleep queues
is that every thread that is not already queued on a sleep queue carries a
sleep queue structure around, which it donates to a wait channel when it
blocks on that channel. Once the thread is resumed, it reclaims a sleep
queue from the wait channel.

This resuming bit happens in sleepq_remove_thread() in subr_sleepqueue.c.
As you can see, in addition to assigning a sleep queue to the thread being
removed from a queue, it also clears td_wchan and td_wmesg. The thread in
question has both fields set (as if it were asleep on "proctree", which is
what it is trying to go back to sleep on now). However, it is not on a
sleep queue (td_slpq.tqe_next is NULL). So it seems that a thread was
removed from the sleep queue and resumed (made runnable) without
sleepq_remove_thread() being called.

Do you have any local patches that might affect this, btw? I notice you
get a lot of trap 9's in your dmesg, which is somewhat unsettling.

-- 
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org
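
The donate/reclaim protocol described in the message can be sketched as a
small userland model in C. This is not the code in subr_sleepqueue.c: the
model_* functions, the td_onqueue flag (standing in for the td_slpq
linkage), and the single-pointer "slot" for one wait channel are all
assumptions made for illustration, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sleepqueue {
	const void *sq_wchan;        /* wait channel served, or NULL if spare */
	int sq_blockedcnt;           /* threads currently blocked on it */
	struct sleepqueue *sq_free;  /* spare queues donated by later sleepers */
};

struct thread {
	struct sleepqueue *td_sleepqueue; /* spare queue carried while awake */
	const void *td_wchan;             /* channel slept on, or NULL */
	const char *td_wmesg;             /* sleep message, or NULL */
	int td_onqueue;                   /* model of "linked on a sleep queue" */
};

/* Give an awake thread its own zeroed spare sleep queue. */
void model_thread_init(struct thread *td, struct sleepqueue *sq)
{
	sq->sq_wchan = NULL;
	sq->sq_blockedcnt = 0;
	sq->sq_free = NULL;
	td->td_sleepqueue = sq;
	td->td_wchan = NULL;
	td->td_wmesg = NULL;
	td->td_onqueue = 0;
}

/* Block: the thread donates its spare queue to the wait channel. */
void model_sleepq_add(struct sleepqueue **slot, struct thread *td,
    const void *wchan, const char *wmesg)
{
	/* Model of the check that fired in the reported panic. */
	assert(td->td_sleepqueue != NULL);

	if (*slot == NULL) {
		/* First sleeper: its queue becomes the channel's queue. */
		*slot = td->td_sleepqueue;
		(*slot)->sq_wchan = wchan;
	} else {
		/* Later sleepers: park the donated queue on the free list. */
		td->td_sleepqueue->sq_free = (*slot)->sq_free;
		(*slot)->sq_free = td->td_sleepqueue;
	}
	(*slot)->sq_blockedcnt++;
	td->td_sleepqueue = NULL;
	td->td_wchan = wchan;
	td->td_wmesg = wmesg;
	td->td_onqueue = 1;
}

/* Resume: the thread reclaims a queue and its sleep state is cleared. */
void model_sleepq_remove_thread(struct sleepqueue **slot, struct thread *td)
{
	struct sleepqueue *sq = *slot;

	assert(sq != NULL && td->td_onqueue);
	sq->sq_blockedcnt--;
	if (sq->sq_free != NULL) {
		/* Take any spare; it need not be the one this thread donated. */
		td->td_sleepqueue = sq->sq_free;
		sq->sq_free = td->td_sleepqueue->sq_free;
		td->td_sleepqueue->sq_free = NULL;
	} else {
		/* Last sleeper takes the channel's queue itself. */
		td->td_sleepqueue = sq;
		sq->sq_wchan = NULL;
		*slot = NULL;
	}
	td->td_wchan = NULL;
	td->td_wmesg = NULL;
	td->td_onqueue = 0;
}

/* Returns 1 if the thread is in one of the two states the protocol allows. */
int model_thread_state_ok(const struct thread *td)
{
	if (td->td_onqueue)
		return td->td_sleepqueue == NULL && td->td_wchan != NULL;
	return td->td_sleepqueue != NULL && td->td_wchan == NULL &&
	    td->td_wmesg == NULL;
}
```

In this model, the thread from the panic report (td_wchan and td_wmesg
still set, not linked on any queue, td_sleepqueue NULL) fails
model_thread_state_ok(), which is the same inconsistency the message
describes: it looks like a thread that was made runnable without going
through the remove path.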