From owner-freebsd-hackers  Mon Oct 21 14: 2:55 2002
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9311237B401
	for <hackers@freebsd.org>; Mon, 21 Oct 2002 14:02:51 -0700 (PDT)
Received: from swan.mail.pas.earthlink.net (swan.mail.pas.earthlink.net [207.217.120.123])
	by mx1.FreeBSD.org (Postfix) with ESMTP id EE64A43E4A
	for <hackers@freebsd.org>; Mon, 21 Oct 2002 14:02:50 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0018.cvx40-bradley.dialup.earthlink.net ([216.244.42.18] helo=mindspring.com)
	by swan.mail.pas.earthlink.net with esmtp (Exim 3.33 #1)
	id 183jhH-0003zY-00; Mon, 21 Oct 2002 14:02:48 -0700
Message-ID: <3DB46B19.EC096B5F@mindspring.com>
Date: Mon, 21 Oct 2002 14:01:13 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Diego Wentz Antunes <devlware@terra.com.br>
Cc: hackers@FreeBSD.org
Subject: Re: Kernel Panic Problems
References: <3DB44E09.4090405@terra.com.br>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG

Diego Wentz Antunes wrote:
> >>     I have been experiencing several kernel panics in different
> >> situations, from running ls to just booting the kernel.
> >> I configured all the options in rc.conf to save the core dump from
> >> memory to disk, and some of the results are
> >> here in the file panics. I also searched the internet for information
> >> to try to explain these recurring panics
> >> and found that it could be a memory problem. Is there a way to run a
> >> thorough memory test?
> >>    I'm not certain it is the memory, because the PC stayed powered on
> >> for 6 days without any problem!
> >>    Any comments will be welcome!

Panic #1:
---
#0  dumpsys () at ../../kern/kern_shutdown.c:487
487             if (dumping++) {
(kgdb) where
#0  dumpsys () at ../../kern/kern_shutdown.c:487
#1  0xc0164d4b in boot (howto=256) at ../../kern/kern_shutdown.c:316
#2  0xc0165189 in panic (fmt=0xc02ae96c "%s") at ../../kern/kern_shutdown.c:595
#3  0xc02623ab in trap_fatal (frame=0xc3e8be4c, eva=0) at
../../i386/i386/trap.c:966
#4  0xc0262059 in trap_pfault (frame=0xc3e8be4c, usermode=0, eva=0) at
../../i386/i386/trap.c:859
#5  0xc0261bff in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16, tf_edi =
671703040, tf_esi = 0, 
        tf_ebp = 0, tf_isp = -1008157064, tf_ebx = -1008183320, tf_edx =
-1087061161, tf_ecx = -1008183320, 
        tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = 0, tf_cs = 8, tf_eflags
= 66118, 
        tf_esp = -1071632535, tf_ss = 8}) at ../../i386/i386/trap.c:458
(kgdb) quit
---

Is this a full backtrace?  I don't see any way that the stack
could have started with "trap_pfault"... it had to be running
something to cause a page fault.


Panic #2:
---
...
#8  0xc0261bff in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16, tf_edi =
-1064400440, tf_esi = -1007800320, 
      tf_ebp = -1007805296, tf_isp = -1007805336, tf_ebx = 47288, tf_edx =
-1690778642, tf_ecx = 821789308, 
      tf_eax = -56600120, tf_trapno = 12, tf_err = 0, tf_eip = -1071249494,
tf_cs = 8, tf_eflags = 66070, 
      tf_esp = -1064405504, tf_ss = 2606062}) at ../../i386/i386/trap.c:458
#9  0xc02607aa in generic_bcopy ()
#10 0xc0247c30 in scstart (tp=0xc0879b00) at ../../dev/syscons/syscons.c:1285
#11 0xc017c1e4 in ttstart (tp=0xc0879b00) at ../../kern/tty.c:1401
#12 0xc017ccb9 in ttwrite (tp=0xc0879b00, uio=0xc3ee1ed4, flag=8323073) at
../../kern/tty.c:1957
...
---

This one stops being plausible at #9; specifically, there is no
version of syscons.c in which scstart() calls generic_bcopy()
directly.  The functions it does call directly are q_to_b() and
sc_puts().  q_to_b() does perform a copy, but it is not static and
has a global definition, so it should show up in the stack trace.

None of this really matches the 4.4, 4.6, or -current syscons.c,
so more information is needed; it's unlikely that syscons has
changed so significantly and then changed back.  You need to
look at the code at dev/syscons/syscons.c:1285 in your own
source tree, which seems to differ significantly from the source
tree the rest of us are using.
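
The trap frame prints its registers as signed decimals, so it helps
to convert them to hex before comparing them against the frame
addresses.  A minimal sketch in Python; the helper name to_u32_hex
is made up for illustration, and the input is the tf_eip value from
frame #8 above:

```python
def to_u32_hex(value):
    """Reinterpret a signed 32-bit integer as unsigned and format it as hex."""
    return "0x%08x" % (value & 0xFFFFFFFF)


# tf_eip from frame #8 of panic #2, exactly as kgdb printed it
panic2_eip = to_u32_hex(-1071249494)
print(panic2_eip)  # 0xc02607aa
```

That is the same address kgdb reports for frame #9, so the saved
instruction pointer does fall inside generic_bcopy(); what remains
unexplained is how scstart() got there.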


Panic #3:
---
#4  0xc0262059 in trap_pfault (frame=0xc3e6ce60, usermode=0, eva=198) at
../../i386/i386/trap.c:859
#5  0xc0261bff in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16, tf_edi =
135077888, tf_esi = -25115817, 
      tf_ebp = -1008283996, tf_isp = -1008284020, tf_ebx = 158, tf_edx =
1153435399, tf_ecx = -1008314576, tf_eax = 0, 
      tf_trapno = 12, tf_err = 2, tf_eip = -1071660533, tf_cs = 8, tf_eflags =
66050, tf_esp = 16560, tf_ss = -1008283852})
    at ../../i386/i386/trap.c:458
#6  0xc01fc20b in vm_object_reference (object=0x9e) at ../../vm/vm_object.c:243
#7  0xc01f5f6c in vm_fault (map=0xc357fe80, vaddr=135077888, fault_type=3
'\003', fault_flags=8) at ../../vm/vm_fault.c:254
#8  0xc0261fee in trap_pfault (frame=0xc3e6cfa8, usermode=1, eva=135077892) at
../../i386/i386/trap.c:839
#9  0xc0261ab3 in trap (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 0,
tf_esi = 135055736, tf_ebp = -1077939980, 
      tf_isp = -1008283692, tf_ebx = 135077880, tf_edx = 15, tf_ecx = 135055786,
tf_eax = 0, tf_trapno = 12, tf_err = 6, 
      tf_eip = 134584827, tf_cs = 31, tf_eflags = 66118, tf_esp = -1077940020,
tf_ss = 47}) at ../../i386/i386/trap.c:369
#10 0x80599fb in ?? ()
#11 0x80599d5 in ?? ()
---

You should run "ps" in the kernel debugger, to determine what
program was active at the time, and then debug that program to
find out what source code was being referenced at 0x80599fb that
caused the trap in the first place.

The trap in this case is a page fault on a user space address,
which, during lookup, caused a call to vm_object_reference(),
which in turn took an unexpected page fault of its own.
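
The two faults can be read straight out of the trap frames.  The
sketch below decodes the i386 page-fault error code in tf_err and
converts the signed register values to hex.  The PGEX_* names mirror
the ones FreeBSD uses for these bits; the helper functions are
hypothetical, written only for this illustration:

```python
PGEX_P = 0x01  # set: protection violation; clear: page not present
PGEX_W = 0x02  # set: write access; clear: read access
PGEX_U = 0x04  # set: fault taken in user mode; clear: kernel mode


def decode_pf_err(err):
    """Split an i386 page-fault error code into its three low bits."""
    return {
        "cause": "protection violation" if err & PGEX_P else "page not present",
        "access": "write" if err & PGEX_W else "read",
        "mode": "user" if err & PGEX_U else "kernel",
    }


def to_u32_hex(value):
    """Reinterpret a signed 32-bit integer as unsigned and format it as hex."""
    return "0x%08x" % (value & 0xFFFFFFFF)


# Frame #9: the original user-space fault (tf_err = 6, tf_eip = 134584827)
user_fault = decode_pf_err(6)
user_eip = to_u32_hex(134584827)

# Frame #5: the second fault, taken in the kernel (tf_err = 2)
kernel_fault = decode_pf_err(2)
kernel_eip = to_u32_hex(-1071660533)

# Distance between the faulting address (eva=198) and the object pointer (0x9e)
offset = 198 - 0x9E

print(user_fault, user_eip)
print(kernel_fault, kernel_eip)
print(offset)
```

Frame #9 decodes to a user-mode write to a page that is not present,
at the same address as frame #10 (0x80599fb).  Frame #5 decodes to a
kernel-mode write to a page that is not present, with tf_eip equal
to the frame #6 address inside vm_object_reference().  The faulting
address is 40 bytes past object=0x9e, which is consistent with a
field access through a bogus object pointer; which field sits at that
offset depends on the struct vm_object layout in your tree.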

Most likely this is the dirtying of a page in a memory-mapped
object at a point when there is no memory left in the system to
handle the page being dirtied.

Again, your source code does not match 4.4, 4.6, or -current,
since the line number is way off in vm_object.c.  You will need
to list the source code at the fault address on your own, or
provide us with a way to match your source code (e.g. a CVS tag
that you used to check out, which was not a moving target -- a
release tag or some other tag, rather than a RELENG tag).

For completeness, you should also uncomment the KASSERT() in
vm_object_reference(), to see if it traps the problem earlier
than the second fault handler does.


--

As a general note, you should have reported these problems
separately, even if you thought they were related, since they
most likely have different root causes, unless you are doing
something to cause them yourself, like overclocking your CPU
or memory.

-- Terry
