From owner-freebsd-hackers Sun Jun 1 22:05:08 1997
Return-Path:
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id WAA26965
          for hackers-outgoing; Sun, 1 Jun 1997 22:05:08 -0700 (PDT)
Received: from genesis.atrad.adelaide.edu.au (genesis.atrad.adelaide.edu.au [129.127.96.120])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id WAA26959
          for ; Sun, 1 Jun 1997 22:05:03 -0700 (PDT)
Received: (from msmith@localhost)
          by genesis.atrad.adelaide.edu.au (8.8.5/8.7.3) id OAA18291
          for hackers@freebsd.org; Mon, 2 Jun 1997 14:34:54 +0930 (CST)
From: Michael Smith
Message-Id: <199706020504.OAA18291@genesis.atrad.adelaide.edu.au>
Subject: weird scheduler crash (2.2)
To: hackers@freebsd.org
Date: Mon, 2 Jun 1997 14:34:54 +0930 (CST)
X-Mailer: ELM [version 2.4ME+ PL28 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hmm. We've been trying for several weeks now to find a cause for the
occasional crashes we're seeing on our radar controllers. We've
finally managed to reproduce one here in the lab, but as luck has it,
I can't make sense of its complaint.

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xf01c8310
stack pointer           = 0x10:0xefbffd7c
frame pointer           = 0x10:0xefbffd8c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                          DPL 0, PRES 1, DEF32 1, gran 1
processor eflags        = resume, IOPL = 3
current process         = 690 (exptd)
interrupt mask          = net, tty, bio
kernel: type 12 trap, code = 0
Stopped at      set_nort+0x25   movl %eax,0(%ecx)
db> trace
set_nort(f0ca8a00) at set_nort+0x25
_selwakeup(f0204330) at _selwakeup+0x69
_logwakeup(2,efbffe48,5,0,efbffdf4) at _logwakeup+0x16
_printf(f01c8e2c,c,f01c871f,f01c8e25) at _printf+0x50
_trap_fatal(efbffe48,0,f0d0cc00,c,f0d20700) at _trap_fatal+0x5f
_trap_pfault(efbffe48,0,ffffffff,278,3) at _trap_pfault+0x11c
_trap(10,10,3,278,efbffe88) at _trap+0x2ab
calltrap() at calltrap+0x15
--- trap 0xc, eip = 0xf0117408, esp = 0xefbffe84, ebp = 0xefbffe88 ---
_unsleep(f0d0cc00) at _unsleep+0x48
_selwakeup(f0214348) at _selwakeup+0x76
_mdsiointr(0,10,f020f9dc,118,ffffffff) at _mdsiointr+0x184
_Xfastintr10(f020f9dc,118,f011cb84,b,f01f5748) at _Xfastintr10+0x17
_select(f0d0cc00,efbfff94,efbfff84) at _select+0x2e2
_syscall(27,27,4,4,efbf77d4) at _syscall+0x127
_Xsyscall() at _Xsyscall+0x35
--- syscall 0x5d, eip = 0x7c945, esp = 0xefbf7568, ebp = efbf77d4 ---

The kernel couldn't be convinced to do a dump either, so this is all
I have.

It looks like the driver (mdsio) took an interrupt during a select
syscall which in turn resulted in the driver trying to wake the
selecting process up again. Is the set_nort stuff relevant? Is this,
perhaps, a screwup in the select code in (my) mdsio driver? If so, how?

select+0x2e2 is 0x9ee in (this) sys_generic.o, which looks like :

 617:../../kern/sys_generic.c ****      error = tsleep((caddr_t)&selwait, PSOCK | PCATCH, "select", timo);
1910                    .stabd 68,0,617
1911 09d7 FF75D8        pushl -40(%ebp)
1912 09da 68040700      pushl $LC0
1912      00
1913 09df 68180100      pushl $280
1913      00
1914 09e4 68000000      pushl $_selwait
1914      00
1915 09e9 E812F6FF      call _tsleep
1915      FF
1916 09ee 89C3          movl %eax,%ebx

so I think it was actually asleep at the time.

-- 
]] Mike Smith, Software Engineer        msmith@gsoft.com.au             [[
]] Genesis Software                     genesis@gsoft.com.au            [[
]] High-speed data acquisition and      (GSM mobile)     0411-222-496   [[
]] realtime instrument control.         (ph)          +61-8-8267-3493   [[
]] Unix hardware collector.             "Where are your PEZ?" The Tick  [[
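
The offset arithmetic in the message above can be checked with a small C sketch. It assumes the i386 near-call encoding (opcode E8 followed by a 32-bit displacement, 5 bytes in total), which matches the five bytes the listing shows for `call _tsleep`. The function names are hypothetical, and the 0x70c start offset for select is derived from the quoted numbers rather than stated anywhere in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Object-relative start of a function, given an address inside it and
 * the "symbol+offset" value printed by the debugger. */
static uint32_t func_start(uint32_t addr_in_object, uint32_t offset_in_func)
{
    return addr_in_object - offset_in_func;
}

/* Return address pushed by an i386 near relative call: the call is
 * 1 opcode byte (E8) plus a 4-byte displacement, so execution resumes
 * 5 bytes after the start of the call instruction. */
static uint32_t return_address(uint32_t call_addr)
{
    return call_addr + 5;
}

/* Re-check the numbers quoted in the message. */
static void check_listing(void)
{
    /* select+0x2e2 is 0x9ee in sys_generic.o, so select starts at 0x70c
     * (derived value, not shown in the listing). */
    assert(func_start(0x9ee, 0x2e2) == 0x70c);

    /* "call _tsleep" sits at 0x9e9; its return address is 0x9ee,
     * the "movl %eax,%ebx" that stores tsleep's result. */
    assert(return_address(0x9e9) == 0x9ee);
}
```

Calling `check_listing()` passes silently. That agrees with the reading that the address recorded for select in the trace is the return point of the tsleep call, so the process had entered tsleep when the interrupt arrived.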