From owner-freebsd-current@FreeBSD.ORG Tue Feb 8 20:24:36 2005
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 29E1316A4CF
    for ; Tue, 8 Feb 2005 20:24:36 +0000 (GMT)
Received: from relay01.pair.com (relay01.pair.com [209.68.5.15])
    by mx1.FreeBSD.org (Postfix) with SMTP id 5DF8043D41
    for ; Tue, 8 Feb 2005 20:24:35 +0000 (GMT)
    (envelope-from pho@holm.cc)
Received: (qmail 44890 invoked from network); 8 Feb 2005 20:24:34 -0000
Received: from unknown (HELO peter.osted.lan) (unknown)
    by unknown with SMTP; 8 Feb 2005 20:24:34 -0000
X-pair-Authenticated: 80.161.118.233
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1])
    by peter.osted.lan (8.13.1/8.13.1) with ESMTP id j18KOXjp046218;
    Tue, 8 Feb 2005 21:24:33 +0100 (CET)
    (envelope-from pho@peter.osted.lan)
Received: (from pho@localhost)
    by peter.osted.lan (8.13.1/8.13.1/Submit) id j18KOXfx046217;
    Tue, 8 Feb 2005 21:24:33 +0100 (CET)
    (envelope-from pho)
Date: Tue, 8 Feb 2005 21:24:33 +0100
From: Peter Holm
To: John Baldwin
Message-ID: <20050208202433.GB46141@peter.osted.lan>
References: <20050105122636.GA31684@peter.osted.lan>
    <200501061617.49967.jhb@FreeBSD.org>
    <20050106214554.GA45533@peter.osted.lan>
    <200502081440.49979.jhb@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200502081440.49979.jhb@FreeBSD.org>
User-Agent: Mutt/1.4.2.1i
cc: freebsd-current@FreeBSD.org
Subject: Re: Assertion td->td_sleepqueue != NULL failed at kern/subr_sleepqueue.c:270
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Tue, 08 Feb 2005 20:24:36 -0000

On Tue, Feb 08, 2005 at 02:40:49PM -0500, John Baldwin wrote:
> On Thursday 06 January 2005 04:45 pm, Peter Holm wrote:
> > On Thu, Jan 06, 2005 at 04:17:49PM -0500, John Baldwin wrote:
> > > On Wednesday 05 January 2005 07:26 am, Peter Holm wrote:
> > > > With GENERIC HEAD from Dec 31 09:28 UTC + bmilekic@'s uma_core
> > > > patch + alc's patch I got the following strange assert:
> > > >
> > > > panic(c0827c46,c082dd18,c082dc8d,10e,c08f4660) at panic+0x190
> > > > sleepq_add(c08eec90,c08ee6e8,c082a9bf,1,c08ee6e8,0,c0827ca9,7d)
> > > > at sleepq_add+0x156
> > > > cv_wait(c08eec90,c08ee6e8,c151de30,0,ffffffff) at cv_wait+0x100
> > > > _sx_xlock(c08eec60,c0828867,247,0,c151ddc8) at _sx_xlock+0x59
> > > > kern_wait(c151e450,ffffffff,cbc67c90,0,0) at kern_wait+0x4b
> > > > wait4(c151e450,cbc67d14,4,3f8,282) at wait4+0x29
> > > > syscall(2f,2f,bfbf002f,2,0) at syscall+0x128
> > > > Xint0x80_syscall() at Xint0x80_syscall+0x1f
> > > > --- syscall (7, FreeBSD ELF32, wait4), eip = 0x805170b, esp =
> > > > 0xbfbfedbc, ebp = 0xbfbfedd8 ---
> > > >
> > > > Looks like td->td_sleepqueue is NULL!
> > > >
> > > > Details at http://www.holm.cc/stress/log/cons100.html
> > >
> > > This is a truly odd panic. The basic theory of operation with sleep
> > > queues is that every thread that is not already queued on a sleep queue
> > > carries a sleep queue structure around that it donates to a wait channel
> > > when it blocks on it. Once it is resumed, it reclaims a sleep queue
> > > from the wait channel. This resuming bit happens in sleepq_remove_thread()
> > > in subr_sleepqueue.c. As you can see, in addition to assigning a
> > > sleep queue to the thread being removed from a queue, it also clears
> > > td_wchan and td_wmesg. The thread in question has both fields set (as if
> > > it were asleep on "proctree", which is what it is trying to go back to
> > > sleep on now). However, it is not on a sleep queue (td_slpq.tqe_next is
> > > NULL). So, apparently, it seems that a thread was removed from the sleep
> > > queue and resumed (made runnable) but
> > > sleepq_remove_thread() wasn't called.
> > > Do you have any local patches that
> > > might affect this btw? I notice you get a lot of trap 9's in your dmesg
> > > which is somewhat unsettling.
> >
> > These are the modifications:
> > http://www.holm.cc/stress/log/mods.html
> >
> > The trap 9s are not uncommon for the test suite.
>
> I'm still thinking about this FYI as I've seen this at least once or twice,
> but I still don't understand how it happened. In the other case I've
> looked at, it is as if the thread has been awakened by someone outside of the
> sleep queue code because td_wchan and td_wmesg are still set
> (sleepq_remove_thread() clears them) and the associated wait channel
> (proctree, which is another common theme) has a sleep queue with no waiters
> attached to it. That is, the sleep queue that curthread should have is still
> sitting on a sleep queue chain, which is consistent with the thread being
> made runnable without going through sleepq_remove_thread(). Are you able to
> reproduce this at all? If so, can you do it with KTR enabled and KTR_PROC
> tracing turned on perhaps? Thanks.
>

No, I have only seen this problem once. But if I succeed in provoking
this problem again, I will try it with KTR enabled.

Thank you for your reply.

> --
> John Baldwin <>< http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve" = http://www.FreeBSD.org

--
Peter Holm
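
The theory of operation John describes in the quoted text (a thread that
is awake carries a spare sleep queue, donates it to the wait channel when
it blocks, and reclaims one when it is resumed) can be sketched as a small
userland C model. This is only an illustrative sketch, not the code in
kern/subr_sleepqueue.c: the model_* functions, the sq_link field, and the
single global chain are assumptions made for the example, and all locking
and hashing are left out.

```c
#include <assert.h>
#include <stddef.h>

struct thread;

/* Simplified sleep queue (hypothetical layout, not the kernel's). */
struct sleepqueue {
    struct thread *sq_blocked;   /* threads asleep on this wait channel */
    struct sleepqueue *sq_free;  /* spare queues donated by later sleepers */
    struct sleepqueue *sq_link;  /* next entry on the chain or free list */
    const void *sq_wchan;        /* wait channel this queue serves */
};

struct thread {
    struct sleepqueue *td_sleepqueue; /* non-NULL while the thread is awake */
    const void *td_wchan;
    const char *td_wmesg;
    struct thread *td_next;
};

/* Assumption: one global chain instead of the kernel's hashed chains. */
static struct sleepqueue *chain;

static struct sleepqueue *model_lookup(const void *wchan)
{
    for (struct sleepqueue *sq = chain; sq != NULL; sq = sq->sq_link)
        if (sq->sq_wchan == wchan)
            return sq;
    return NULL;
}

/* Block td on wchan: td donates the sleep queue it was carrying. */
static void model_add(struct thread *td, const void *wchan, const char *wmesg)
{
    struct sleepqueue *sq = model_lookup(wchan);

    assert(td->td_sleepqueue != NULL); /* same condition as the failed assert */
    if (sq == NULL) {
        /* First sleeper: its queue becomes the channel's queue. */
        sq = td->td_sleepqueue;
        sq->sq_wchan = wchan;
        sq->sq_blocked = NULL;
        sq->sq_free = NULL;
        sq->sq_link = chain;
        chain = sq;
    } else {
        /* Later sleeper: its queue is parked on the free list. */
        td->td_sleepqueue->sq_link = sq->sq_free;
        sq->sq_free = td->td_sleepqueue;
    }
    td->td_sleepqueue = NULL;
    td->td_wchan = wchan;
    td->td_wmesg = wmesg;
    td->td_next = sq->sq_blocked;
    sq->sq_blocked = td;
}

/* Resume td: unlink it, hand it a sleep queue, clear the wait fields. */
static void model_remove_thread(struct sleepqueue *sq, struct thread *td)
{
    struct thread **tp = &sq->sq_blocked;

    while (*tp != td) {
        assert(*tp != NULL);           /* td must be on this queue */
        tp = &(*tp)->td_next;
    }
    *tp = td->td_next;
    td->td_next = NULL;

    if (sq->sq_free != NULL) {
        /* Other sleepers remain: take a spare queue. */
        td->td_sleepqueue = sq->sq_free;
        sq->sq_free = sq->sq_free->sq_link;
    } else {
        /* Last sleeper: take the queue itself off the chain. */
        struct sleepqueue **sp = &chain;
        while (*sp != sq)
            sp = &(*sp)->sq_link;
        *sp = sq->sq_link;
        td->td_sleepqueue = sq;
    }
    td->td_sleepqueue->sq_link = NULL;
    td->td_wchan = NULL;
    td->td_wmesg = NULL;
}

/* A running thread should own a queue and have no wait fields set. */
static int model_running_ok(const struct thread *td)
{
    return td->td_sleepqueue != NULL && td->td_wchan == NULL &&
        td->td_wmesg == NULL;
}
```

In this model, a thread that is made runnable without going through
model_remove_thread() is left with td_sleepqueue == NULL and td_wchan and
td_wmesg still set, so model_running_ok() returns 0 and the next
model_add() trips the assertion. That matches the state described in the
quoted analysis, but it says nothing about what actually woke the thread
in the reported panic.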