From owner-freebsd-hackers Mon Oct 21 14: 2:55 2002
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9311237B401
	for ; Mon, 21 Oct 2002 14:02:51 -0700 (PDT)
Received: from swan.mail.pas.earthlink.net (swan.mail.pas.earthlink.net [207.217.120.123])
	by mx1.FreeBSD.org (Postfix) with ESMTP id EE64A43E4A
	for ; Mon, 21 Oct 2002 14:02:50 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0018.cvx40-bradley.dialup.earthlink.net ([216.244.42.18] helo=mindspring.com)
	by swan.mail.pas.earthlink.net with esmtp (Exim 3.33 #1)
	id 183jhH-0003zY-00; Mon, 21 Oct 2002 14:02:48 -0700
Message-ID: <3DB46B19.EC096B5F@mindspring.com>
Date: Mon, 21 Oct 2002 14:01:13 -0700
From: Terry Lambert
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Diego Wentz Antunes
Cc: hackers@FreeBSD.org
Subject: Re: Kernel Panic Problems
References: <3DB44E09.4090405@terra.com.br>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Diego Wentz Antunes wrote:
> >> I have been experiencing several kernel panics from differents
> >> situations, since a ls to just boot the kernel.
> >> I configured all the options in rc.conf to save the core dump from
> >> memory to HD and some of the results are
> >> here in the file panics. Above all I search at internet some information
> >> to try to explain this recursive panics
> >> and found that it could be some memory problem. Is there a way to make a
> >> hard test with memory?
> >> I'm uncertainty if it is the memory because the PC stayed turned on
> >> for 6 days without any problem!
> >> Any comments will be welcome!

Panic #1:
---
#0  dumpsys () at ../../kern/kern_shutdown.c:487
487             if (dumping++) {
(kgdb) where
#0  dumpsys () at ../../kern/kern_shutdown.c:487
#1  0xc0164d4b in boot (howto=256) at ../../kern/kern_shutdown.c:316
#2  0xc0165189 in panic (fmt=0xc02ae96c "%s")
    at ../../kern/kern_shutdown.c:595
#3  0xc02623ab in trap_fatal (frame=0xc3e8be4c, eva=0)
    at ../../i386/i386/trap.c:966
#4  0xc0262059 in trap_pfault (frame=0xc3e8be4c, usermode=0, eva=0)
    at ../../i386/i386/trap.c:859
#5  0xc0261bff in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16,
      tf_edi = 671703040, tf_esi = 0, tf_ebp = 0, tf_isp = -1008157064,
      tf_ebx = -1008183320, tf_edx = -1087061161, tf_ecx = -1008183320,
      tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = 0, tf_cs = 8,
      tf_eflags = 66118, tf_esp = -1071632535, tf_ss = 8})
    at ../../i386/i386/trap.c:458
(kgdb) quit
---

Is this a full backtrace? I don't see any way that the stack could
have started with "trap_pfault"... it had to be running something
to cause a page fault.

Panic #2:
---
...
#8  0xc0261bff in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16,
      tf_edi = -1064400440, tf_esi = -1007800320, tf_ebp = -1007805296,
      tf_isp = -1007805336, tf_ebx = 47288, tf_edx = -1690778642,
      tf_ecx = 821789308, tf_eax = -56600120, tf_trapno = 12,
      tf_err = 0, tf_eip = -1071249494, tf_cs = 8, tf_eflags = 66070,
      tf_esp = -1064405504, tf_ss = 2606062})
    at ../../i386/i386/trap.c:458
#9  0xc02607aa in generic_bcopy ()
#10 0xc0247c30 in scstart (tp=0xc0879b00)
    at ../../dev/syscons/syscons.c:1285
#11 0xc017c1e4 in ttstart (tp=0xc0879b00) at ../../kern/tty.c:1401
#12 0xc017ccb9 in ttwrite (tp=0xc0879b00, uio=0xc3ee1ed4, flag=8323073)
    at ../../kern/tty.c:1957
...
---

This one stops being possible at #9; specifically, there is no
version of syscons.c that, in scstart, calls generic_bcopy()
directly. The only functions it calls directly are q_to_b(), which
is a copy, but the function which does it is not static, and has a
global definition, and therefore should show up in the stack trace.
Similarly, sc_puts() is also called. None of this really matches
4.4, 4.6, or -current syscons.c, so more information is needed, but
it's unlikely that syscons has changed, and then changed back, so
significantly.

You need to look at the code at dev/syscons/syscons.c:1285 in your
own source tree, which seems to differ significantly from the source
tree the rest of us are using.

Panic #3:
---
#4  0xc0262059 in trap_pfault (frame=0xc3e6ce60, usermode=0, eva=198)
    at ../../i386/i386/trap.c:859
#5  0xc0261bff in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16,
      tf_edi = 135077888, tf_esi = -25115817, tf_ebp = -1008283996,
      tf_isp = -1008284020, tf_ebx = 158, tf_edx = 1153435399,
      tf_ecx = -1008314576, tf_eax = 0, tf_trapno = 12, tf_err = 2,
      tf_eip = -1071660533, tf_cs = 8, tf_eflags = 66050,
      tf_esp = 16560, tf_ss = -1008283852})
    at ../../i386/i386/trap.c:458
#6  0xc01fc20b in vm_object_reference (object=0x9e)
    at ../../vm/vm_object.c:243
#7  0xc01f5f6c in vm_fault (map=0xc357fe80, vaddr=135077888,
      fault_type=3 '\003', fault_flags=8) at ../../vm/vm_fault.c:254
#8  0xc0261fee in trap_pfault (frame=0xc3e6cfa8, usermode=1,
      eva=135077892) at ../../i386/i386/trap.c:839
#9  0xc0261ab3 in trap (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
      tf_edi = 0, tf_esi = 135055736, tf_ebp = -1077939980,
      tf_isp = -1008283692, tf_ebx = 135077880, tf_edx = 15,
      tf_ecx = 135055786, tf_eax = 0, tf_trapno = 12, tf_err = 6,
      tf_eip = 134584827, tf_cs = 31, tf_eflags = 66118,
      tf_esp = -1077940020, tf_ss = 47})
    at ../../i386/i386/trap.c:369
#10 0x80599fb in ?? ()
#11 0x80599d5 in ?? ()
---

You should run "ps" in the kernel debugger to determine what program
was active at the time, and then debug that program to find out what
source code was being referenced at 0x80599fb that caused the trap
in the first place.

The trap in this case is a page fault on a user space address, which,
during lookup, caused an attempt to call vm_object_reference(), which
then caused an unexpected page fault.
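
As an aside for anyone reading the trap frames quoted above: kgdb prints
the tf_* fields as signed decimal integers, which hides the addresses.
The following is a minimal sketch, not part of any FreeBSD tool, that
reinterprets those values as unsigned 32-bit addresses and decodes the
low three bits of the x86 page-fault error code (tf_err). The helper
names to_u32 and decode_pf_err are hypothetical; the input values are
copied from the backtraces in this message.

```python
# Sketch: decode i386 trap frame fields as printed by kgdb.
# Helper names are hypothetical; the numbers come from the backtraces above.

def to_u32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def decode_pf_err(err: int) -> dict:
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(err & 0x1),  # clear: page not present
        "write": bool(err & 0x2),                 # clear: read access
        "user_mode": bool(err & 0x4),             # clear: supervisor mode
    }


# Panic #2: tf_eip of the trap frame in frame #8.
panic2_eip = to_u32(-1071249494)

# Panic #3: tf_eip of the kernel-mode trap frame in frame #5.
panic3_eip = to_u32(-1071660533)

# Panic #3: error codes of the kernel-mode and user-mode faults.
panic3_kernel_err = decode_pf_err(2)
panic3_user_err = decode_pf_err(6)

# Panic #3: distance between the kernel fault address (eva=198)
# and the object pointer passed to vm_object_reference (object=0x9e).
panic3_offset = 198 - 0x9E

if __name__ == "__main__":
    print(hex(panic2_eip))   # 0xc02607aa, the address shown for frame #9
    print(hex(panic3_eip))   # 0xc01fc20b, the address shown for frame #6
    print(panic3_kernel_err)
    print(panic3_user_err)
    print(panic3_offset)     # 40
```

Read this way, the Panic #3 frames say: a user-mode write to a
not-present page (tf_err = 6), followed by a kernel-mode write to a
not-present page (tf_err = 2) at an address 40 bytes past object=0x9e.
That pattern would be consistent with a store through a bogus object
pointer, but that is an inference from the numbers, not something the
backtrace proves.
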
Most likely this is the dirtying of a page of a memory mapped object,
for which there is no remaining memory in the system to handle the
page being dirtied.

Again, your source code does not match 4.4, 4.6, or -current, since
the line number is way off in vm_object.c. You will need to list the
source code at the fault address on your own, or provide us with a
way to match your source code (e.g. a CVS tag that you used to check
out, which was not a moving target -- a release tag or some other
tag, rather than a RELENG tag).

Just from a completeness standpoint, it's pretty obvious that you
should uncomment the KASSERT() in vm_object_reference(), to see if
it traps the problem earlier than in a second fault handler.

--

As a general note, you should have reported these problems
separately, even if you thought they were related, since they most
likely have different root causes, unless you are doing something to
cause them yourself, like overclocking your CPU or memory.

-- Terry