From: Volker <volker@vwsoft.com>
Date: Fri, 25 May 2007 11:07:10 +0200
To: Kris Kennaway
Cc: rmiranda@digitalrelay.ca, freebsd-stable@FreeBSD.ORG
Subject: Re: ghosthunting: machine freeze 6.2R
Message-ID: <4656A73E.9040109@vwsoft.com>
In-Reply-To: <20070523215818.GB64723@xor.obsecurity.org>
References: <200705230717.l4N7HuPW010071@lurza.secnetix.de>
 <465408F9.6080302@vwsoft.com> <4654C0C4.2030405@vwsoft.com>
 <20070523215818.GB64723@xor.obsecurity.org>

Kris, Roger & all,

On 05/23/07 23:58, Kris Kennaway wrote:
> On Wed, May 23, 2007 at 11:31:32PM +0100, Volker wrote:
>> talking to myself... ;)
>>
>> On 2007-05-23 10:27, Volker wrote:
>> Unfortunately three hours later, the machine died completely. It has
>> been a hardware failure which came quietly.
>>
>> Sorry for the noise I've put on this list but when experiencing
>> things like that, one has to think in all possible directions (I
>> first thought about a DoS attack).
>
> Even though it turned out to be a hardware failure, it was helpful to
> publicize this fact. It is often difficult to convince users to
> accept the possibility that hardware failure may be the cause of weird
> system behaviour, because "it has always been fine". It is worth
> remembering that if your hardware is going to fail, then there is
> going to be a first time.

Well, we replaced the broken machine (totally different hardware),
moved one of the mirrored hard disks into this replacement machine and
put the replacement into production.

Unfortunately it took less than 16 hours for this replacement machine
to freeze as well.

My assumption is that the freeze itself has nothing to do with bad
hardware, as it's now happening on two different machines. This
replacement doesn't have em NICs but shows the same bad behavior (so I
also think it's not em-related).

As I really do want to know what's going on, I'm now compiling a new
world + kernel with WITNESS and INVARIANTS support to see if I can
catch something.

I'm using the following additional kernel options:

makeoptions DEBUG=-g
options KDB
options KDB_UNATTENDED
options KDB_TRACE
options DDB
options WITNESS
options WITNESS_SKIPSPIN
options INVARIANTS
options INVARIANT_SUPPORT
options DIAGNOSTIC
options PANIC_REBOOT_WAIT_TIME=60

Suggestions on these options? Anything more to enable with massive
performance loss?

`uname -v': FreeBSD 6.2-RELEASE-p1 #0: Sun Feb 11 22:35:18 CET 2007

While it's now in a box with an Athlon XP, it's still an i386 binary.

Anything else I can additionally do to debug these freezes?

Thx

Volker

dmesg:

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE-p1 #0: Sun Feb 11 22:35:18 CET 2007
    root@GwMbg.elbekies.net:/usr/obj/usr/src/sys/GwMbg
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) XP (1198.83-MHz 686-class CPU)
  Origin = "AuthenticAMD" Id = 0x681 Stepping = 1
  Features=0x383fbff
  AMD Features=0xc0480800
real memory = 1073676288 (1023 MB)
avail memory = 1041526784 (993 MB)
ACPI APIC Table:
ioapic0 irqs 0-23 on motherboard
acpi0: on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: on acpi0
acpi_button0: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
agp0: mem 0xe0000000-0xe3ffffff at device 0.0 on pci0
pcib1: at device 1.0 on pci0
pci1: on pcib1
pci1: at device 0.0 (no driver attached)
rl0: port 0xd400-0xd4ff mem 0xdfffbf00-0xdfffbfff irq 16 at device 12.0 on pci0
miibus0: on rl0
rlphy0: on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: Ethernet address: 00:80:48:15:3f:26
atapci0: port 0xec00-0xec07,0xe800-0xe803,0xe400-0xe407,0xe000-0xe003,0xdc00-0xdc0f,0xd800-0xd8ff irq 20 at device 15.0 on pci0
ata2: on atapci0
ata3: on atapci0
atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
ata0: on atapci1
ata1: on atapci1
uhci0: port 0xc400-0xc41f irq 21 at device 16.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0xc800-0xc81f irq 21 at device 16.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: port 0xcc00-0xcc1f irq 21 at device 16.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: on uhci2
usb2: USB revision 1.0
uhub2: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: port 0xd000-0xd01f irq 21 at device 16.3 on pci0
uhci3: [GIANT-LOCKED]
usb3: on uhci3
usb3: USB revision 1.0
uhub3: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
ehci0: mem 0xdfffbd00-0xdfffbdff irq 21 at device 16.4 on pci0
ehci0: [GIANT-LOCKED]
usb4: EHCI version 1.0
usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
usb4: on ehci0
usb4: USB revision 2.0
uhub4: VIA EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
isab0: at device 17.0 on pci0
isa0: on isab0
pci0: at device 17.5 (no driver attached)
vr0: port 0xbc00-0xbcff mem 0xdfffbc00-0xdfffbcff irq 23 at device 18.0 on pci0
miibus1: on vr0
ukphy0: on miibus1
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vr0: Ethernet address: 00:13:8f:0f:14:d2
acpi_button1: on acpi0
fdc0: port 0x3f2-0x3f3,0x3f4-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
ppc0: port 0x378-0x37f,0x778-0x77b irq 7 drq 0 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: on ppc0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse, device ID 3
pmtimer0 on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <12 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 1198831745 Hz quality 800
Timecounters tick every 1.000 msec
Fast IPsec: Initialized Security Association Processing.
acd0: DVDROM at ata1-master UDMA33
ad4: 76293MB at ata2-master SATA150
ar0: WARNING - mirror protection lost. RAID1 array in DEGRADED mode
ar0: 76293MB status: DEGRADED
ar0: disk0 DOWN no device found for this subdisk
ar0: disk1 READY (mirror) using ad4 at ata2-master
cd0 at ata1 bus 0 target 0 lun 0
cd0: Removable CD-ROM SCSI-0 device
cd0: 33.000MB/s transfers
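
As a rough sanity check on the debugging options listed earlier in this message, the sketch below scans a kernel config fragment for options whose prerequisite option is absent. It is only an illustration: the helper names (`parse_options`, `missing_prerequisites`) are hypothetical, and the prerequisite table is an assumption based on how these options are usually described in the FreeBSD NOTES files, so check it against sys/conf/NOTES for your release before relying on it.

```python
# Hypothetical helper, not part of FreeBSD: reports debugging options in a
# kernel config fragment whose prerequisite option is not enabled.

# Assumed prerequisites (verify against sys/conf/NOTES for your release).
REQUIRES = {
    "DDB": "KDB",
    "KDB_TRACE": "KDB",
    "KDB_UNATTENDED": "KDB",
    "WITNESS_SKIPSPIN": "WITNESS",
    "INVARIANTS": "INVARIANT_SUPPORT",
}


def parse_options(config_text):
    """Return the set of option names from 'options NAME[=VALUE]' lines."""
    names = set()
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        parts = line.split()
        # Only 'options' lines count; 'makeoptions' and others are skipped.
        if len(parts) >= 2 and parts[0] == "options":
            names.add(parts[1].split("=", 1)[0])
    return names


def missing_prerequisites(config_text):
    """Return sorted (option, missing_prerequisite) pairs."""
    enabled = parse_options(config_text)
    return sorted(
        (opt, dep)
        for opt, dep in REQUIRES.items()
        if opt in enabled and dep not in enabled
    )


# The option list as posted in the message above.
POSTED_CONFIG = """
makeoptions DEBUG=-g
options KDB
options KDB_UNATTENDED
options KDB_TRACE
options DDB
options WITNESS
options WITNESS_SKIPSPIN
options INVARIANTS
options INVARIANT_SUPPORT
options DIAGNOSTIC
options PANIC_REBOOT_WAIT_TIME=60
"""

if __name__ == "__main__":
    for opt, dep in missing_prerequisites(POSTED_CONFIG):
        print(f"{opt} is set but {dep} is not")
```

Under the assumed table, the posted option list produces no complaints. The check says nothing about performance cost or about whether further options would help with the freezes.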