From owner-freebsd-bugs Tue Oct 8 22:10:03 1996
Return-Path: owner-bugs
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id WAA06338
          for bugs-outgoing; Tue, 8 Oct 1996 22:10:03 -0700 (PDT)
Received: (from gnats@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id WAA06332;
          Tue, 8 Oct 1996 22:10:02 -0700 (PDT)
Resent-Date: Tue, 8 Oct 1996 22:10:02 -0700 (PDT)
Resent-Message-Id: <199610090510.WAA06332@freefall.freebsd.org>
Resent-From: gnats (GNATS Management)
Resent-To: freebsd-bugs
Resent-Reply-To: FreeBSD-gnats@freefall.FreeBSD.org, peter@newton.dialix.com.au
Received: from newton.dialix.com.au (newton.dialix.com.au [192.203.228.8])
          by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id WAA05985
          for ; Tue, 8 Oct 1996 22:03:29 -0700 (PDT)
Received: (from peter@localhost)
          by newton.dialix.com.au (8.7.6/8.7.3) id NAA02004;
          Wed, 9 Oct 1996 13:03:19 +0800 (WST)
Message-Id: <199610090503.NAA02004@newton.dialix.com.au>
Date: Wed, 9 Oct 1996 13:03:19 +0800 (WST)
From: Peter Wemm
Reply-To: peter@newton.dialix.com.au
To: FreeBSD-gnats-submit@freebsd.org
X-Send-Pr-Version: 3.2
Subject: kern/1744: run queue or proc list smashed 4 times in 2 days
Sender: owner-bugs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>Number:         1744
>Category:       kern
>Synopsis:       run queue or proc list smashed 4 times in 2 days
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Oct 8 22:10:01 PDT 1996
>Last-Modified:
>Originator:     Peter Wemm
>Organization:
What, here? :-)
>Release:        FreeBSD 2.2-961004-SNAP i386
>Environment:

Vanilla i486 box, 16M, 2 IDE drives and one slow SCSI drive on an
AHA1542CF.

FreeBSD newton.dialix.com.au 2.2-961004-SNAP FreeBSD 2.2-961004-SNAP #30: Tue Oct 8 06:34:52 WST 1996 peter@newton.dialix.com.au:/home2/src/sys/compile/NEWTON i386

>Description:

Normally, this is a quiet machine, but it's taken a nose-dive in
stability in the last two days.
It's been faulting like this:

WARNING: / was not properly dismounted.

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x4
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xf01aa108
stack pointer           = 0x10:0xefbffe0c
frame pointer           = 0x10:0xefbffe30
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = Idle
interrupt mask          = net tty bio
panic: page fault

Syncing disks...

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x10
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xf012925a
stack pointer           = 0x10:0xefbffc88
frame pointer           = 0x10:0xefbffc98
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = Idle
interrupt mask          = net tty bio
panic: page fault

dumping to dev 20001, offset 32768
dump 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

In this particular case, it died in cpu_switch about line 364:

        /* XX update whichqs? */
        btrl    %ebx,%edi               /* clear q full status */
        leal    _qs(,%ebx,8),%eax       /* select q */
        movl    %eax,%esi

        movl    P_FORW(%eax),%ecx       /* unlink from front of process q */
        movl    P_FORW(%ecx),%edx
        movl    %edx,P_FORW(%eax)
        movl    P_BACK(%ecx),%eax
        movl    %eax,P_BACK(%edx)
        ^^^^^^^^^^^^^^^^^^^^^^^^^
        cmpl    P_FORW(%ecx),%esi       /* q empty */
        je      3f

The backtrace looks like this:

[.. rest of trap processing ..]
#13 0xf01a2ce1 in calltrap ()
#14 0xf010e6bd in tsleep ()
#15 0xf0120327 in sbwait ()
#16 0xf011f0e3 in soreceive ()
#17 0xf0121b90 in recvit ()
#18 0xf0121dff in recvfrom ()
#19 0xf01ab0d3 in syscall ()
#20 0xf01a2d35 in Xsyscall ()

The process that was running was either of:

  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TT       TIME COMMAND
    1   176     4   0   2  0   208    0 sbwait SWs  ??    0:00.00 (rwhod)
    0 27386     4   1   2  0   148    0 sbwait Ss   ??    0:00.00 (comsat)

This particular kernel is not running any modified code.
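
For readers who do not want to trace the assembly by hand, here is a rough C rendering of the "unlink from front of process q" sequence, with the sanity checks the fast path leaves out. It is only a sketch under assumptions: `struct qlink`, `q_init`, `q_insert_tail` and `q_remove_head_checked` are hypothetical names, not kernel symbols, and the layout (forward link first, back link second) is inferred from the P_FORW/P_BACK operands. On a 32-bit i386 that layout would put the back link at offset 4, which would be consistent with the fault address 0x4 if %edx held zero at the marked instruction. That is an inference, not something the dump above proves.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two link words shared by a run-queue
 * header and the start of a process entry (assumed layout). */
struct qlink {
    struct qlink *forw;   /* corresponds to P_FORW in the listing */
    struct qlink *back;   /* corresponds to P_BACK in the listing */
};

/* An empty circular queue has both links pointing at the header. */
static void q_init(struct qlink *q)
{
    q->forw = q;
    q->back = q;
}

/* Append p at the tail of the circular queue q. */
static void q_insert_tail(struct qlink *q, struct qlink *p)
{
    p->forw = q;
    p->back = q->back;
    q->back->forw = p;
    q->back = p;
}

/* Unlink the first entry, mirroring the assembly step by step, but
 * refuse to touch a queue whose links are inconsistent.
 * Returns the removed entry, or NULL if the queue is empty or damaged. */
static struct qlink *q_remove_head_checked(struct qlink *q)
{
    struct qlink *p = q->forw;            /* movl P_FORW(%eax),%ecx */
    struct qlink *next;

    if (p == NULL || p == q)
        return NULL;                      /* damaged header, or empty queue */

    next = p->forw;                       /* movl P_FORW(%ecx),%edx */
    if (next == NULL || p->back != q || next->back != p)
        return NULL;                      /* links smashed: do not write */

    q->forw = next;                       /* movl %edx,P_FORW(%eax) */
    next->back = p->back;                 /* movl P_BACK(%ecx),%eax
                                             movl %eax,P_BACK(%edx) */
    return p;
}
```

Without the `next == NULL` check, the last store would write through a null pointer at the offset of `back`, which is the kind of supervisor write fault shown in the first trap.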

The other three dumps were quite similar, but I didn't have the disk
space at the time to save them for analysis.

>How-To-Repeat:

I don't think this box is doing anything unusual, apart from cvsup,
which makes it sweat a fair bit. (A 6.5MB process on a 16M machine
that's doing other things is hard work :-)

>Fix:

>Audit-Trail:
>Unformatted: