Date: Sun, 3 Oct 2004 09:24:38 +0200
From: Peter Holm
To: Julian Elischer
Cc: Peter Holm, "freebsd-arch@freebsd.org", Stephan Uphoff
Subject: Re: scheduler (sched_4bsd) questions
Message-ID: <20041003072438.GA4734@peter.osted.lan>
In-Reply-To: <415F677B.5030108@elischer.org>

On Sat, Oct 02, 2004 at 07:44:11PM -0700, Julian Elischer wrote:
> Stephan Uphoff wrote:
> >On Sat, 2004-10-02 at 14:31, Peter Holm wrote:
> >

I have now stress tested for more than 16 hours without seeing any
freezes. This is *good* news.

However, I did get a panic. I'll try to recreate it and hopefully get
a dump:

kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x14c
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0618ade
stack pointer           = 0x10:0xcfcbabfc
frame pointer           = 0x10:0xcfcbac0c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 30600 (java_vm)
[thread 100801]
Stopped at      sched_add+0x16: movl    0x14c(%esi),%ebx
db> where
sched_add(0,0) at sched_add+0x16
setrunqueue(c3545a80,0,c181ca80,c3545a80,cfcbac54) at setrunqueue+0x199
sched_wakeup(c3545a80) at sched_wakeup+0x43
setrunnable(c3545a80,c3545a80,cfcbac78,cfcbac8c,c0625cbf) at setrunnable+0x92
sleepq_resume_thread(c3545a80,ffffffff) at sleepq_resume_thread+0x82
sleepq_broadcast(c370be40,0,ffffffff,cfcbaccc,c05fae01) at sleepq_broadcast+0xf7
wakeup(c370be40) at wakeup+0xf
thread_userret(c1a93d80,cfcbad48) at thread_userret+0x121
userret(c1a93d80,cfcbad48,1,3,1) at userret+0x57
syscall(837002f,80d002f,bf69002f,8215000,2850c6e0) at syscall+0x2d9
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (1, FreeBSD ELF32, sys_exit), eip = 0x280e667f, esp = 0xbf8e5b50, ebp = 0xbf8e5b7c ---
db> call doadump
Dumping 255 MB
panic: blockable sleep lock (sleep mutex) taskqueue @ kern/subr_taskqueue.c:132
cpuid = 0
Uptime: 16h5m17s

(kgdb) l *0xc0618ade
0xc0618ade is in sched_add (../../../kern/sched_4bsd.c:960).
955     #ifdef SMP
956             int forwarded = 0;
957             int cpu;
958     #endif
959
960             ke = td->td_kse;
961             mtx_assert(&sched_lock, MA_OWNED);
962             KASSERT(ke->ke_state != KES_ONRUNQ,
963                 ("sched_add: kse %p (%s) already in run queue", ke,
964                 ke->ke_proc->p_comm));

> >>OK, right now I'm testing with all of Stephan's patches + the
> >>MUTEX_WAKE_ALL flag. Uptime is 3 3/4 hours and looking good.
>
> I've just resurfaced after a week of hell at work and home
> not BAD hell but more like "busy as hell".
>
> I'm just integrating your fixes into my tree to understand what they do.
> Expect more mail from me later.
>
> >
> >Great.
> >
> >Your attached diff contained all the fixes needed and I don't see the
> >need to post a cumulative patch.
> >
> >The only thing left to do is migrate a critical section from
> >kern_mutex.c to subr_turnstile.c for readability.
> >(no functional changes)
> >
> >Maybe it would also be better to just force MUTEX_WAKE_ALL in
> >kern_mutex.c (#ifndef MUTEX_WAKE_ALL \n#define MUTEX_WAKE_ALL\n#endif)
> >to avoid temporary configuration file pollution?
> >
> > Stephan
> >

-- 
Peter Holm
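
A note on the trap reported in this message: the fault virtual address
(0x14c) equals the displacement in the faulting instruction,
movl 0x14c(%esi),%ebx, and the trace shows sched_add(0,0). Both are
consistent with td being NULL when line 960 reads td->td_kse, though the
message itself does not state this. The arithmetic can be sketched as
below. struct thread_sketch and td_kse_addr() are hypothetical stand-ins,
not the kernel's struct thread; only the 0x14c offset is taken from the
trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct thread (an assumption, not the real
 * kernel layout). Only the offset of td_kse, 0x14c, comes from the
 * faulting instruction in the trace.
 */
struct thread_sketch {
	char		td_pad[0x14c];	/* placeholder for preceding fields */
	uint32_t	td_kse;		/* pointers are 32 bits wide on i386 */
};

/* Check at compile time that the sketch reproduces the traced offset. */
static_assert(offsetof(struct thread_sketch, td_kse) == 0x14c,
    "td_kse must sit at offset 0x14c in this sketch");

/*
 * Address the CPU would read for td->td_kse, given the numeric value
 * of the td pointer.
 */
static uintptr_t
td_kse_addr(uintptr_t td)
{
	return (td + offsetof(struct thread_sketch, td_kse));
}
```

With td == 0, td_kse_addr(0) evaluates to 0x14c, matching the reported
fault virtual address.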