Date: Tue, 24 Oct 2006 19:10:18 -0400 (EDT)
From: Charles Sprickman
To: freebsd-hackers@freebsd.org
Subject: Panic caused by bad memory?

Hello all,

Without a full dump, are there any telltale signs in the panic message that indicate whether I'm dealing with a hardware or software issue?

I have a box that has been running 4.11-p10 for quite some time with no problems. I upgraded a number of ports (apache/php/mysql) and since then I've had two panics. Of course userland apps shouldn't cause this, but that's the only change I see.

I can't get a kernel dump since it fails like this each time:

dumping to dev #da/0x20001, offset 2097152
dump 1024 1023 1022 1021 Aborting dump due to I/O error.
status == 0xb, scsi status == 0x0
failed, reason: i/o error

The meat of my question though, what are these lines telling me:

(panic 1)
instruction pointer = 0x8:0xc028b053
stack pointer = 0x10:0xe138eefc
frame pointer = 0x10:0xe138ef2c

(panic 2)
instruction pointer = 0x8:0xc028b053
stack pointer = 0x10:0xe138eefc
frame pointer = 0x10:0xe138ef2c

Are those physical memory addresses where the code that caused the panic resides? If so, does that point to bad RAM?

Thanks,

Charles

Here's more info if anyone is curious:

[-- MARK -- Mon Oct 23 06:00:00 2006]

Fatal trap 12: page fault while in kernel mode
mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
fault virtual address = 0xc327c614
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc028b053
stack pointer = 0x10:0xe138eefc
frame pointer = 0x10:0xe138ef2c
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 8 (syncer)
interrupt mask = none <- SMP: XXX
trap number = 12
panic: page fault
mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
boot() called on cpu#0

syncing disks... panic: rslock: cpu: 0, addr: 0xc0391ccc, lock: 0x00000001
mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
boot() called on cpu#0
Uptime: 441d9h31m5s

dumping to dev #da/0x20001, offset 2097152
dump 1024 1023 1022 1021 Aborting dump due to I/O error.
status == 0xb, scsi status == 0x0
failed, reason: i/o error
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset called on cpu#0
cpu_reset: Stopping other CPUs

[-- MARK -- Tue Oct 24 09:00:00 2006]

Fatal trap 12: page fault while in kernel mode
mp_lock = 01000002; cpuid = 1; lapic.id = 01000000
fault virtual address = 0xc29d2b94
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc028b053
stack pointer = 0x10:0xe138eefc
frame pointer = 0x10:0xe138ef2c
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 8 (syncer)
interrupt mask = none <- SMP: XXX
trap number = 12
panic: page fault
mp_lock = 01000002; cpuid = 1; lapic.id = 01000000
boot() called on cpu#1

syncing disks... panic: rslock: cpu: 1, addr: 0xc0391ccc, lock: 0x01000001
mp_lock = 01000002; cpuid = 1; lapic.id = 01000000
boot() called on cpu#1
Uptime: 1d2h55m38s

dumping to dev #da/0x20001, offset 2097152
dump 1024 1023 1022 1021 Aborting dump due to I/O error.
status == 0xb, scsi status == 0x0
failed, reason: i/o error
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset called on cpu#1
cpu_reset: Stopping other CPUs
cpu_reset: Restarting BSP
cpu_reset_proxy: Grabbed mp locckp uf_re sBeStP: BSP did not grab mp lock
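
For comparing the two trap reports field by field, here is a minimal sketch in Python. It assumes the console output has been saved as text on another machine. The function names (parse_trap, split_selector, compare) are made up for illustration, and the excerpts are copied from the two panics above. The selector decoding assumes the standard i386 segment selector layout: table index in bits 3 and up, table indicator in bit 2, requested privilege level in bits 0-1. The script only shows which fields repeat between panics; it does not by itself tell bad RAM apart from a software bug.

```python
# Excerpts copied from the two trap reports (only the fields compared here).
PANIC_1 = """\
fault virtual address = 0xc327c614
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc028b053
stack pointer = 0x10:0xe138eefc
frame pointer = 0x10:0xe138ef2c
current process = 8 (syncer)
"""

PANIC_2 = """\
fault virtual address = 0xc29d2b94
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc028b053
stack pointer = 0x10:0xe138eefc
frame pointer = 0x10:0xe138ef2c
current process = 8 (syncer)
"""


def parse_trap(text):
    """Turn 'name = value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '='
            fields[name.strip()] = value.strip()
    return fields


def split_selector(value):
    """Split 'selector:offset' and decode the selector bits.

    Assumes the standard i386 selector layout; the offset is returned
    as a plain integer without further interpretation.
    """
    sel_text, off_text = value.split(":")
    sel = int(sel_text, 16)
    return {
        "index": sel >> 3,                        # descriptor table index
        "table": "LDT" if sel & 0x4 else "GDT",   # table indicator bit
        "rpl": sel & 0x3,                         # requested privilege level
        "offset": int(off_text, 16),
    }


def compare(a, b):
    """Return (fields with equal values, fields with different values)."""
    same = sorted(k for k in a if k in b and a[k] == b[k])
    diff = sorted(k for k in a if k in b and a[k] != b[k])
    return same, diff


if __name__ == "__main__":
    p1, p2 = parse_trap(PANIC_1), parse_trap(PANIC_2)
    same, diff = compare(p1, p2)
    print("identical in both panics:", same)
    print("different between panics:", diff)
    print("instruction pointer:", split_selector(p1["instruction pointer"]))
```

Run against these excerpts, the only field that differs is the fault virtual address. The instruction pointer, stack pointer, frame pointer, fault code and current process are the same in both panics. The 0x8 in front of the instruction pointer decodes to GDT index 1 with privilege level 0, which matches the "DPL 0" code segment line in the trap output.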