From: "Steven Hartland"
Delivered-To: freebsd-hackers@freebsd.org
Date: Wed, 31 Aug 2005 18:03:13 +0100
Message-ID: <02db01c5ae4d$e38a1780$b3db87d4@multiplay.co.uk>
Subject: Debugging an unknown reboot (disk / io related)

When running a large rsync on one of our machines here, it consistently crashes and reboots, leaving no trace in the logs.
It looks like it could be a driver error, but with no crash log or panic message to go on I don't know where to start. The machine is running 5.4-RELEASE-p2 with the latest driver set, downloaded and compiled locally.

The only errors I have to go on are those displayed in the ssh session running the rsync:

35111 files to consider
rsync: readdir(games/fps/sof2/server): Input/output error (5)
rsync: readdir(games/fps/soldner): Input/output error (5)
...
...
rsync: mkstemp "/usr/home/ftp/pub/apps/3dmark/win32/.3DMark03.exe.NhcgGA" failed: Input/output error (5)
rsync: connection unexpectedly closed (1667283 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(365)
rsync: connection unexpectedly closed (1667263 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(365)
Segmentation fault
root@backup1>

I've tried running with WITNESS enabled, but it fails to boot with a message about hpt_lock. I also tried originally with the default hptmv driver, with no joy.

When it crashes it takes the RAID5 with it, always dropping the same disk. I've replaced the cable and the disk, and even plugged the disk directly into the RAID controller on a different channel to eliminate the Supermicro hot-swap bay the disks are mounted in. Nothing changes: the same disk always gets dropped.

So the question is: what can I try to get more info on what's happening?

[dmesg]
Aug 31 17:56:28 backup1 syslogd: kernel boot file is /boot/kernel/kernel
Aug 31 17:56:28 backup1 kernel: Copyright (c) 1992-2005 The FreeBSD Project.
Aug 31 17:56:28 backup1 kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Aug 31 17:56:28 backup1 kernel: The Regents of the University of California. All rights reserved.
Aug 31 17:56:28 backup1 kernel: FreeBSD 5.4-RELEASE-p2 #6: Thu Jun 23 00:23:54 UTC 2005
Aug 31 17:56:28 backup1 kernel: root@backup1:/.usr/i386/src/sys/i386/compile/MPUK_SMP_200HZ
Aug 31 17:56:28 backup1 kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Aug 31 17:56:28 backup1 kernel: CPU: AMD Opteron(tm) Processor 244 (1794.41-MHz 686-class CPU)
Aug 31 17:56:28 backup1 kernel: Origin = "AuthenticAMD" Id = 0xf5a Stepping = 10
Aug 31 17:56:28 backup1 kernel: Features=0x78bfbff
Aug 31 17:56:28 backup1 kernel: AMD Features=0xe0500000
Aug 31 17:56:28 backup1 kernel: AMD Features=0xe0500000
Aug 31 17:56:28 backup1 kernel: real memory = 2146893824 (2047 MB)
Aug 31 17:56:28 backup1 kernel: avail memory = 2099625984 (2002 MB)
Aug 31 17:56:28 backup1 kernel: ACPI APIC Table:
Aug 31 17:56:28 backup1 kernel: FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
Aug 31 17:56:28 backup1 kernel: cpu0 (BSP): APIC ID: 0
Aug 31 17:56:28 backup1 kernel: cpu1 (AP): APIC ID: 1
Aug 31 17:56:28 backup1 kernel: MADT: Forcing active-low polarity and level trigger for SCI
Aug 31 17:56:28 backup1 kernel: ioapic0 irqs 0-23 on motherboard
Aug 31 17:56:28 backup1 kernel: ioapic1 irqs 24-27 on motherboard
Aug 31 17:56:28 backup1 kernel: ioapic2 irqs 28-31 on motherboard
Aug 31 17:56:28 backup1 kernel: npx0: on motherboard
Aug 31 17:56:28 backup1 kernel: npx0: INT 16 interface
Aug 31 17:56:28 backup1 kernel: acpi0: on motherboard
Aug 31 17:56:28 backup1 kernel: acpi0: Power Button (fixed)
Aug 31 17:56:28 backup1 kernel: acpi0: Sleep Button (fixed)
Aug 31 17:56:28 backup1 kernel: acpi_bus_number: can't get _ADR
Aug 31 17:56:28 backup1 last message repeated 2 times
Aug 31 17:56:28 backup1 kernel: unknown: I/O range not supported
Aug 31 17:56:28 backup1 kernel: unknown: I/O range not supported
Aug 31 17:56:28 backup1 kernel: ACPI-1304: *** Error: Method execution failed [\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xc30937a0), AE_AML_BUFFER_LIMIT
Aug 31 17:56:28 backup1 kernel: ACPI-0239: *** Error: Method execution failed [\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xc30937a0), AE_AML_BUFFER_LIMIT
Aug 31 17:56:28 backup1 kernel: can't fetch resources for \_SB_.PCI0.LPC_.LPT_ - AE_AML_BUFFER_LIMIT
Aug 31 17:56:28 backup1 kernel: Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
Aug 31 17:56:28 backup1 kernel: acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
Aug 31 17:56:28 backup1 kernel: cpu0: on acpi0
Aug 31 17:56:28 backup1 kernel: cpu1: on acpi0
Aug 31 17:56:28 backup1 kernel: acpi_button0: on acpi0
Aug 31 17:56:28 backup1 kernel: pcib0: port 0xcf8-0xcff on acpi0
Aug 31 17:56:28 backup1 kernel: pci0: on pcib0
Aug 31 17:56:28 backup1 kernel: pcib1: at device 1.0 on pci0
Aug 31 17:56:28 backup1 kernel: pci1: on pcib1
Aug 31 17:56:28 backup1 kernel: pci1: at device 0.0 (no driver attached)
Aug 31 17:56:28 backup1 kernel: pcib2: at device 6.0 on pci0
Aug 31 17:56:28 backup1 kernel: pci2: on pcib2
Aug 31 17:56:28 backup1 kernel: bge0: mem 0xe8100000-0xe810ffff irq 19 at device 5.0 on pci2
Aug 31 17:56:28 backup1 kernel: miibus0: on bge0
Aug 31 17:56:28 backup1 kernel: brgphy0: on miibus0
Aug 31 17:56:28 backup1 kernel: brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
Aug 31 17:56:28 backup1 kernel: bge0: Ethernet address: 00:0f:ea:7a:50:08
Aug 31 17:56:28 backup1 kernel: atapci0: port 0x3000-0x300f,0x3010-0x3013,0x3018-0x301f,0x3014-0x3017,0x3020-0x3027 mem 0xe8110000-0xe81103ff irq 18 at device 6.0 on pci2
Aug 31 17:56:28 backup1 kernel: ata2: channel #0 on atapci0
Aug 31 17:56:28 backup1 kernel: ata3: channel #1 on atapci0
Aug 31 17:56:28 backup1 kernel: ata4: channel #2 on atapci0
Aug 31 17:56:28 backup1 kernel: ata5: channel #3 on atapci0
Aug 31 17:56:28 backup1 kernel: isab0: at device 7.0 on pci0
Aug 31 17:56:28 backup1 kernel: isa0: on isab0
Aug 31 17:56:28 backup1 kernel: atapci1: port 0x1000-0x100f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
Aug 31 17:56:28 backup1 kernel: ata0: channel #0 on atapci1
Aug 31 17:56:28 backup1 kernel: ata1: channel #1 on atapci1
Aug 31 17:56:28 backup1 kernel: pci0: at device 7.3 (no driver attached)
Aug 31 17:56:28 backup1 kernel: pcib3: on acpi0
Aug 31 17:56:28 backup1 kernel: pci8: on pcib3
Aug 31 17:56:28 backup1 kernel: pcib4: at device 3.0 on pci8
Aug 31 17:56:28 backup1 kernel: pci9: on pcib4
Aug 31 17:56:28 backup1 kernel: bge1: mem 0xf8100000-0xf810ffff irq 25 at device 1.0 on pci9
Aug 31 17:56:28 backup1 kernel: bge1: Ethernet address: 00:10:18:0d:cc:da
Aug 31 17:56:28 backup1 kernel: pci8: at device 3.1 (no driver attached)
Aug 31 17:56:28 backup1 kernel: pcib5: at device 4.0 on pci8
Aug 31 17:56:28 backup1 kernel: pci14: on pcib5
Aug 31 17:56:28 backup1 kernel: hptmv0: mem 0xf8200000-0xf827ffff irq 30 at device 2.0 on pci14
Aug 31 17:56:28 backup1 kernel: RocketRAID 182x SATA Controller driver Version 1.1
Aug 31 17:56:28 backup1 kernel: RR182x [0,0]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,1]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,2]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,4]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,5]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x: RAID5 write-back enabled
Aug 31 17:56:28 backup1 kernel: pci8: at device 4.1 (no driver attached)
Aug 31 17:56:28 backup1 kernel: atkbdc0: port 0x64,0x60 irq 1 on acpi0
Aug 31 17:56:28 backup1 kernel: atkbd0: irq 1 on atkbdc0
Aug 31 17:56:28 backup1 kernel: fdc0: port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
Aug 31 17:56:28 backup1 kernel: sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
Aug 31 17:56:28 backup1 kernel: sio0: type 16550A
Aug 31 17:56:28 backup1 kernel: sio1: configured irq 3 not in bitmap of probed irqs 0
Aug 31 17:56:28 backup1 kernel: sio1: port may not be enabled
Aug 31 17:56:28 backup1 kernel: sio1: configured irq 3 not in bitmap of probed irqs 0
Aug 31 17:56:28 backup1 kernel: sio1: port may not be enabled
Aug 31 17:56:28 backup1 kernel: orm0: at iomem 0xcd000-0xd2fff,0xcb000-0xccfff,0xc0000-0xcafff on isa0
Aug 31 17:56:28 backup1 kernel: sc0: at flags 0x100 on isa0
Aug 31 17:56:28 backup1 kernel: sc0: VGA <16 virtual consoles, flags=0x300>
Aug 31 17:56:28 backup1 kernel: vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Aug 31 17:56:28 backup1 kernel: sio1: configured irq 3 not in bitmap of probed irqs 0
Aug 31 17:56:28 backup1 kernel: sio1: port may not be enabled
Aug 31 17:56:28 backup1 kernel: Timecounters tick every 5.000 msec
Aug 31 17:56:28 backup1 kernel: da0 at hptmv0 bus 0 target 0 lun 0
Aug 31 17:56:28 backup1 kernel: da0: Fixed Direct Access SCSI-0 device
Aug 31 17:56:28 backup1 kernel: da0: 1526216MB (3125691008 512 byte sectors: 255H 63S/T 194565C)
Aug 31 17:56:28 backup1 kernel: da1 at hptmv0 bus 0 target 1 lun 0
Aug 31 17:56:28 backup1 kernel: da1: Fixed Direct Access SCSI-0 device
Aug 31 17:56:28 backup1 kernel: da1: 381554MB (781422757 512 byte sectors: 255H 63S/T 48641C)
Aug 31 17:56:28 backup1 kernel: SMP: AP CPU #1 Launched!
Aug 31 17:56:28 backup1 kernel: Mounting root from ufs:/dev/da0s1d
Aug 31 17:56:28 backup1 kernel: WARNING: / was not properly dismounted
Aug 31 17:56:28 backup1 kernel: WARNING: R/W mount of / denied. Filesystem is not clean - run fsck
[/dmesg]
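
The controller messages in the dmesg can be checked mechanically for a channel that never reported in. What follows is a minimal sketch in Python, assuming the log lines keep the "RR182x [0,N]: channel started successfully" form seen in the boot log. The function names started_channels and missing_channels are illustrative only, and the idea that channels are numbered contiguously from 0 is an assumption, not something the driver is known to guarantee.

```python
import re

# Matches lines such as "RR182x [0,4]: channel started successfully".
CHANNEL_RE = re.compile(r"RR182x \[(\d+),(\d+)\]: channel started successfully")


def started_channels(lines):
    """Return a sorted list of (controller, channel) pairs that reported a start."""
    found = set()
    for line in lines:
        match = CHANNEL_RE.search(line)
        if match:
            found.add((int(match.group(1)), int(match.group(2))))
    return sorted(found)


def missing_channels(started):
    """Return channels absent from the range 0..max seen on each controller.

    Assumption: channels are numbered contiguously from 0. A gap is only a
    hint that a port is empty or failed to start, not proof of a fault.
    """
    by_controller = {}
    for controller, channel in started:
        by_controller.setdefault(controller, set()).add(channel)

    missing = []
    for controller in sorted(by_controller):
        channels = by_controller[controller]
        for channel in range(max(channels) + 1):
            if channel not in channels:
                missing.append((controller, channel))
    return missing


# Sample input copied from the boot log in this message.
SAMPLE = [
    "Aug 31 17:56:28 backup1 kernel: RR182x [0,0]: channel started successfully",
    "Aug 31 17:56:28 backup1 kernel: RR182x [0,1]: channel started successfully",
    "Aug 31 17:56:28 backup1 kernel: RR182x [0,2]: channel started successfully",
    "Aug 31 17:56:28 backup1 kernel: RR182x [0,4]: channel started successfully",
    "Aug 31 17:56:28 backup1 kernel: RR182x [0,5]: channel started successfully",
    "Aug 31 17:56:28 backup1 kernel: RR182x: RAID5 write-back enabled",
]

if __name__ == "__main__":
    started = started_channels(SAMPLE)
    print("started:", started)
    print("missing:", missing_channels(started))
```

On the boot log in this message the only gap is channel [0,3]. Whether that is the disk that keeps getting dropped is not something the log itself confirms.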