Date:      Mon, 2 Jun 1997 14:34:54 +0930 (CST)
From:      Michael Smith <msmith@atrad.adelaide.edu.au>
To:        hackers@freebsd.org
Subject:   weird scheduler crash (2.2)
Message-ID:  <199706020504.OAA18291@genesis.atrad.adelaide.edu.au>


Hmm.  We've been trying for several weeks now to find a cause for the
occasional crashes we're seeing on our radar controllers.

We've finally managed to reproduce one here in the lab, but as luck
would have it, I can't make sense of its complaint:

Fatal trap 12: page fault while in kernel mode
fault virtual address	= 0x0
fault code		= supervisor write, page not present
instruction pointer	= 0x8:0xf01c8310
stack pointer		= 0x10:0xefbffd7c
frame pointer		= 0x10:0xefbffd8c
code segment		= base 0x0, limit 0xfffff, type 0x1b
			  DPL 0, PRES 1, DEF32 1, gran 1
processor eflags	= resume, IOPL = 3
current process		= 690 (exptd)
interrupt mask		= net, tty, bio
kernel: type 12 trap, code = 0
Stopped at	set_nort+0x25	movl	%eax,0(%ecx)
db> trace
set_nort(f0ca8a00) at set_nort+0x25
_selwakeup(f0204330) at _selwakeup+0x69
_logwakeup(2,efbffe48,5,0,efbffdf4) at _logwakeup+0x16
_printf(f01c8e2c,c,f01c871f,f01c8e25) at _printf+0x50
_trap_fatal(efbffe48,0,f0d0cc00,c,f0d20700) at _trap_fatal+0x5f
_trap_pfault(efbffe48,0,ffffffff,278,3) at _trap_pfault+0x11c
_trap(10,10,3,278,efbffe88) at _trap+0x2ab
calltrap() at calltrap+0x15
--- trap 0xc, eip = 0xf0117408, esp = 0xefbffe84, ebp = 0xefbffe88 ---
_unsleep(f0d0cc00) at _unsleep+0x48
_selwakeup(f0214348) at _selwakeup+0x76
_mdsiointr(0,10,f020f9dc,118,ffffffff) at _mdsiointr+0x184
_Xfastintr10(f020f9dc,118,f011cb84,b,f01f5748) at _Xfastintr10+0x17
_select(f0d0cc00,efbfff94,efbfff84) at _select+0x2e2
_syscall(27,27,4,4,efbf77d4) at _syscall+0x127
_Xsyscall() at _Xsyscall+0x35
--- syscall 0x5d, eip = 0x7c945, esp = 0xefbf7568, ebp = efbf77d4 ---

The kernel couldn't be convinced to do a dump either, so this is all I
have.  It looks like the driver (mdsio) took an interrupt during a
select syscall, which in turn resulted in the driver trying to wake the
selecting process up again.

Is the set_nort stuff relevant?  Is this, perhaps, a screwup in the
select code in (my) mdsio driver?  If so, how?

select+0x2e2 is offset 0x9ee in (this) sys_generic.o, which looks like:

 617:../../kern/sys_generic.c ****      error = tsleep((caddr_t)&selwait, PSOCK | PCATCH, "select", timo);
 1910                           .stabd 68,0,617
 1911 09d7 FF75D8               pushl -40(%ebp)
 1912 09da 68040700             pushl $LC0
 1912      00
 1913 09df 68180100             pushl $280
 1913      00
 1914 09e4 68000000             pushl $_selwait
 1914      00
 1915 09e9 E812F6FF             call _tsleep
 1915      FF
 1916 09ee 89C3                 movl %eax,%ebx

so I think it was actually asleep at the time.
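
For anyone wanting to check the arithmetic, here is a minimal sketch in
C.  The function names are made up for illustration, and the PSOCK and
PCATCH values are assumptions (24 and 0x100, as I believe 2.2-era
<sys/param.h> defines them), so check your own tree.  It confirms that
0x9ee is the return address of the call at 0x9e9, that _select would
start at 0x70c in this object, and that the pushed 280 is 0x118, i.e.
PSOCK | PCATCH.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed values (believed to match 2.2-era <sys/param.h>). */
#define PSOCK_ASSUMED   24
#define PCATCH_ASSUMED  0x100

/* An i386 near call is opcode E8 plus a 32-bit displacement. */
#define CALL_REL32_LEN  5u

/* Offset of the instruction that follows a "call rel32" at call_off. */
static uint32_t return_offset(uint32_t call_off)
{
    return call_off + CALL_REL32_LEN;
}

/* Start of a function, given an offset inside it and its "sym+off" delta. */
static uint32_t function_start(uint32_t obj_off, uint32_t sym_delta)
{
    return obj_off - sym_delta;
}

/* Priority argument that select hands to tsleep in the listing. */
static int tsleep_priority(void)
{
    return PSOCK_ASSUMED | PCATCH_ASSUMED;
}

/* Re-check the numbers quoted from the trace and the listing. */
static void check_listing(void)
{
    /* "call _tsleep" sits at 0x9e9, so the return address is 0x9ee. */
    assert(return_offset(0x9e9) == 0x9ee);

    /* select+0x2e2 == 0x9ee puts _select at 0x70c in sys_generic.o. */
    assert(function_start(0x9ee, 0x2e2) == 0x70c);

    /* "pushl $280" is the priority word, 0x118 in the trace's hex. */
    assert(tsleep_priority() == 280);

    printf("return offset  = 0x%x\n", (unsigned)return_offset(0x9e9));
    printf("_select starts = 0x%x\n", (unsigned)function_start(0x9ee, 0x2e2));
    printf("tsleep prio    = 0x%x\n", (unsigned)tsleep_priority());
}
```

Calling check_listing() from a main() runs the checks.  If the assumed
values are right, the words ddb prints as "arguments" to _Xfastintr10
(f020f9dc, 118, f011cb84, b) line up with the four pushes above in
stack order: $_selwait, $280, $LC0 and, presumably, timo.  That would
be consistent with the interrupt arriving while select was inside
tsleep.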

-- 
]] Mike Smith, Software Engineer        msmith@gsoft.com.au             [[
]] Genesis Software                     genesis@gsoft.com.au            [[
]] High-speed data acquisition and      (GSM mobile)     0411-222-496   [[
]] realtime instrument control.         (ph)          +61-8-8267-3493   [[
]] Unix hardware collector.             "Where are your PEZ?" The Tick  [[


