Date: Thu, 18 Mar 2004 12:56:32 -0600
Message-ID: <87fzc6gf1b.wl@delta.meridian-enviro.com>
From: "Douglas K. Rand"
To: freebsd-hardware@freebsd.org

--Multipart_Thu_Mar_18_12:56:32_2004-1
Content-Type: text/plain; charset=US-ASCII

I'm having what is probably a hardware problem on a system that just hangs every 6-36 hours, and I'm wondering if anybody has any ideas for things I could try.

It's a RELENG_4_8 system with the DDB, DDB_UNATTENDED, and ALT_BREAK_TO_DEBUGGER kernel options set. (It's on a serial console; that's why the ALT_BREAK_TO_DEBUGGER option.) It's an Athlon 3200+ on a Gigabyte GA-7N400-L mobo, with two 512MB PC3200 DDR DIMMs, a 2-port 3ware controller, and 2 Deskstar 180 GXP disks. The power supply is an Antec TruePower 380W. The system ran perfectly for about 60 days, and then started having this problem.

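For anyone wanting to reproduce the debugger setup, the three options named above would appear in a 4.x kernel config roughly as follows. This is a minimal sketch, not a copy of the attached kernel config, and the comments are illustrative.

```
# Illustrative fragment only; not copied from the attached kernel config.
options 	DDB			# enable the in-kernel debugger
options 	DDB_UNATTENDED		# on panic, reboot instead of waiting in DDB
options 	ALT_BREAK_TO_DEBUGGER	# enter DDB from a serial console via CR ~ ^B
```
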
In almost all cases the system simply hangs: there is no response from the console or network, and the CR ~ ^B sequence will not get me to the kernel debugger. (I've tested this when the system is running fine and I do get the kernel debugger.) The only solution is to reset or power cycle the system. It has crashed 3 times with a "Fatal trap 12: page fault while in kernel mode" panic, and one time it simply rebooted as if someone had pressed the reset button. But it has simply hung 18 times.

I've tried running with only one DIMM, and when the system died 3 times with that DIMM, I tried running with only the other DIMM, and it still dies. I've replaced the power supply with an Antec 400W, and the system still dies. I even replaced the power cord. I've tried both the stock 4.8 twe driver and 3ware's beta driver; both still die. I replaced the onboard NIC with an Intel EtherExpress Pro, and the system still dies.

I don't think it's temperature related: I've run the system with the case open and on its side, and continuous mbmon output shows no temperature increases just before the system hangs. A representative output from mbmon is:

Temp.= 75.2, 113.0, 86.0; Rot.= 4821, 2636, 0
Vcore = 1.70, 2.74; Volt. = 3.31, 4.14, 11.55, -5.29, -2.05

I've got a ThermalTake Volcano 11+ cooler on the CPU.

I don't think the problems are load related, as it carries very high loads without hanging, and I've had it hang with fairly light loads.

I've attached the dmesg and kernel config files. If anybody has any suggestions I'd be thrilled. I'm up to replacing either the CPU or the mobo, neither of which I'm looking forward to.

--Multipart_Thu_Mar_18_12:56:32_2004-1
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="dmesg"
Content-Transfer-Encoding: quoted-printable

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 4.8-RELEASE-p16 #6: Wed Mar 17 14:46:41 CST 2004
    rand@snow.meridian-enviro.com:/usr/obj/usr/src/sys/SNOW
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 2191242163 Hz
CPU: AMD Athlon(tm) XP 3200+ (2191.24-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x6a0  Stepping = 0
  Features=0x383fbff
  AMD Features=0xc0400000
real memory  = 536805376 (524224K bytes)
avail memory = 519462912 (507288K bytes)
Preloaded elf kernel "kernel" at 0xc02db000.
Pentium Pro MTRR support enabled
Using $PIR table, 11 entries at 0xc00fcda0
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
pci0: on pcib0
pci0: (vendor=0x10de, dev=0x01eb) at 0.1
pci0: (vendor=0x10de, dev=0x01ee) at 0.2
pci0: (vendor=0x10de, dev=0x01ed) at 0.3
pci0: (vendor=0x10de, dev=0x01ec) at 0.4
pci0: (vendor=0x10de, dev=0x01ef) at 0.5
isab0: at device 1.0 on pci0
isa0: on isab0
pci0: (vendor=0x10de, dev=0x0064) at 1.1 irq 11
pcib1: at device 8.0 on pci0
pci1: on pcib1
pci1: <3Dfx Voodoo 3 graphics accelerator> at 6.0 irq 12
fxp0: port 0xd400-0xd43f mem 0xe7800000-0xe781ffff,0xe7821000-0xe7821fff irq 10 at device 7.0 on pci1
fxp0: Ethernet address 00:02:b3:e7:ab:6e
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
twe0: <3ware Storage Controller> port 0xd800-0xd80f mem 0xe7000000-0xe77fffff,0xe7820000-0xe782000f irq 11 at device 9.0 on pci1
twe0: 2 ports, Firmware FE7X 1.05.00.050, BIOS BE7X 1.08.00.046
atapci0: port 0xf000-0xf00f at device 9.0 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pcib2: at device 30.0 on pci0
pci2: on pcib2
orm0: