Date: Thu, 9 Jun 2005 09:32:07 -0500 (CDT)
From: Tony Shadwick
To: Steve Richardson
Cc: freebsd-questions@freebsd.org
Subject: Re: FBSD 5.4-STABLE/3Ware Escalade 7506-4LP on dual Opteron issue

I'm not claiming this will fix your issue, but are you running the absolute latest kernel sources?
There is the possibility this issue has been resolved in a newer kernel. cvsup your sources and try doing a build. See what happens.

On Thu, 9 Jun 2005, Steve Richardson wrote:

>
> Hi,
>
> We're building out a brand new dual Opteron box to run our public access unix
> site. We're running FreeBSD 5.4 and a 3Ware Escalade 7506-4LP. We are
> having difficulties with the system, and any help you can offer would be
> greatly appreciated.
>
> For the most part, everything behaves fine. We've got the system built and
> installed. Unfortunately, we're having a periodic, catastrophic failure
> involving the 3Ware card.
>
> Periodically, the system will partly lock up with the following errors:
>
> twe0: unexpected status bit(s) 100000
> twe0: PCI abort, clearing.
>
> I say partly lock up because the kernel does not panic, nor do the console
> keyboard or network interfaces become non-responsive (i.e. you can type
> stuff at the login prompt, and ping the server). However, the disk
> subsystem does appear to cease functioning once this has occurred.
>
> Frankly at this point we are baffled, because the system is stable enough to
> run for days on end under light load, and will even occasionally handle
> periods of medium disk load (e.g. many hours of rsyncing from our live
> server, build world, etc).
>
> We have been using the bonnie++ hard disk benchmarking suite as a means for
> recreating the problem, as follows:
>
>> mkdir testdir
>> bonnie++ -d ./dbench -s 2g -n 100:500000:1000 -x 100
>
> I've included system information below, including dmesg output.
>
>
> regards,
> Steve Richardson
> System Administrator
> GweepNet Cooperative Network
>
>
> System Description:
> Gigabyte GA-7A8DW motherboard
> (2) AMD Opteron 246 2GHz CPUs
> 2GB Samsung PC3200 ECC RAM
> 3Ware Escalade 7506-4LP parallel ATA RAID, installed in 64 bit PCI slot
>
> OS:
> FreeBSD 5.4-STABLE FreeBSD 5.4-STABLE amd64
>
>
> dmesg output:
>
> Copyright (c) 1992-2005 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD 5.4-STABLE #2: Tue Jun 7 00:10:29 EDT 2005
> root@newsidey.gweep.net:/usr/obj/usr/src/sys/SIDEHACK
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: AMD Opteron(tm) Processor 246 (1993.79-MHz K8-class CPU)
> Origin = "AuthenticAMD" Id = 0xf5a Stepping = 10
> Features=0x78bfbff
> AMD Features=0xe0500800
> real memory = 2146893824 (2047 MB)
> avail memory = 2061205504 (1965 MB)
> ACPI APIC Table:
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> cpu0 (BSP): APIC ID: 0
> cpu1 (AP): APIC ID: 1
> MADT: Forcing active-low polarity and level trigger for SCI
> ioapic0 irqs 0-23 on motherboard
> ioapic1 irqs 24-27 on motherboard
> ioapic2 irqs 28-31 on motherboard
> acpi0: on motherboard
> acpi0: Power Button (fixed)
> acpi0: Sleep Button (fixed)
> acpi_bus_number: can't get _ADR
> acpi_bus_number: can't get _ADR
> acpi_bus_number: can't get _ADR
> unknown: I/O range not supported
> unknown: I/O range not supported
> ACPI-1304: *** Error: Method execution failed [\\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xffffff0000a70080), AE_AML_BUFFER_LIMIT
> ACPI-0239: *** Error: Method execution failed [\\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xffffff0000a70080), AE_AML_BUFFER_LIMIT
> can't fetch resources for \\_SB_.PCI0.LPC_.LPT_ - AE_AML_BUFFER_LIMIT
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
> cpu0: on acpi0
> cpu1: on acpi0
> acpi_button0: on acpi0
> pcib0: port 0xcf8-0xcff on acpi0
> pci0: on pcib0
> pcib1: at device 1.0 on pci0
> pci1: on pcib1
> pci1: at device 0.0 (no driver attached)
> pcib2: at device 6.0 on pci0
> pci2: on pcib2
> ohci0: mem 0xd0110000-0xd0110fff irq 19 at device 0.0 on pci2
> usb0: OHCI version 1.0, legacy support
> usb0: SMM does not respond, resetting
> usb0: on ohci0
> usb0: USB revision 1.0
> uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 3 ports with 3 removable, self powered
> ohci1: mem 0xd0111000-0xd0111fff irq 19 at device 0.1 on pci2
> usb1: OHCI version 1.0, legacy support
> usb1: SMM does not respond, resetting
> usb1: on ohci1
> usb1: USB revision 1.0
> uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub1: 3 ports with 3 removable, self powered
> ahc0: port 0x3000-0x30ff mem 0xd0112000-0xd0112fff irq 17 at device 4.0 on pci2
> aic7850: Single Channel A, SCSI Id=7, 3/253 SCBs
> bge0: mem 0xd0100000-0xd010ffff irq 19 at device 5.0 on pci2
> miibus0: on bge0
> brgphy0: on miibus0
> brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
> bge0: Ethernet address: 00:0f:ea:7e:b1:81
> atapci0: port 0x3400-0x340f,0x3410-0x3413,0x3418-0x341f,0x3414-0x3417,0x3420-0x3427 mem 0xd0113000-0xd01133ff irq 18 at device 6.0 on pci2
> ata2: channel #0 on atapci0
> ata3: channel #1 on atapci0
> ata4: channel #2 on atapci0
> ata5: channel #3 on atapci0
> isab0: at device 7.0 on pci0
> isa0: on isab0
> atapci1: port 0x1000-0x100f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
> ata0: channel #0 on atapci1
> ata1: channel #1 on atapci1
> pci0: at device 7.3 (no driver attached)
> pcib3: on acpi0
> pci8: on pcib3
> pcib4: at device 3.0 on pci8
> pci9: on pcib4
> pci8: at device 3.1 (no driver attached)
> pcib5: at device 4.0 on pci8
> pci14: on pcib5
> twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0x4000-0x400f mem 0xf0800000-0xf0ffffff irq 30 at device 2.0 on pci14
> twe0: 4 ports, Firmware FE7X 1.05.00.068, BIOS BE7X 1.08.00.048
> pci8: at device 4.1 (no driver attached)
> atkbdc0: port 0x64,0x60 irq 1 on acpi0
> atkbd0: flags 0x1 irq 1 on atkbdc0
> kbd0 at atkbd0
> fdc0: port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> sio0: type 16550A
> sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
> sio1: type 16550A
> ppc0: cannot reserve I/O port range
> ppc0: cannot reserve I/O port range
> orm0: at iomem 0xd0000-0xd0fff,0xc0000-0xcffff on isa0
> ppc0: cannot reserve I/O port range
> sc0: at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> Timecounters tick every 1.000 msec
> ahc0: Someone reset channel A
> ad0: 152627MB [310101/16/63] at ata0-master UDMA100
> ad2: 286188MB [581463/16/63] at ata1-master UDMA133
> Waiting 15 seconds for SCSI devices to settle
> twed0: on twe0
> twed0: 305253MB (625159424 sectors)
> sa0 at ahc0 bus 0 target 3 lun 0
> sa0: Removable Sequential Access SCSI-2 device
> sa0: 10.000MB/s transfers (10.000MHz, offset 15)
> SMP: AP CPU #1 Launched!
> Mounting root from ufs:/dev/twed0s1a
> WARNING: / was not properly dismounted
> WARNING: /home/crib was not properly dismounted
> WARNING: /home/domus was not properly dismounted
> WARNING: /tmp was not properly dismounted
> WARNING: /u was not properly dismounted
> WARNING: /u/backup/nearline was not properly dismounted
> WARNING: /u/backup/online was not properly dismounted
> WARNING: /u/news was not properly dismounted
> WARNING: /u/news/nntpcached was not properly dismounted
> WARNING: /usr was not properly dismounted
> WARNING: /var was not properly dismounted
> WARNING: /var/tmp was not properly dismounted
> bge0: firmware handshake timed out
> bge0: RX CPU self-diagnostics failed!
> bge0: watchdog timeout -- resetting
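
For reference, the "cvsup your sources and try doing a build" suggestion at the top of this reply might look roughly like the sketch below. It is only a sketch under a few assumptions: the mirror name cvsup.FreeBSD.org is a placeholder (pick one near you), the stock example supfile is assumed to be installed under /usr/share/examples/cvsup, and the kernel config name SIDEHACK is taken from the dmesg output quoted above. The script only prints the commands unless DRY_RUN=0 is set, and it rebuilds the kernel only; whether a full buildworld is also needed depends on how far the sources have moved.

```shell
#!/bin/sh
# Sketch: refresh the -STABLE sources with cvsup, then rebuild the kernel.
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 (and run as root on the FreeBSD box) to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

SUPFILE="/usr/share/examples/cvsup/stable-supfile"  # stock example supfile
CVSUP_HOST="cvsup.FreeBSD.org"                      # assumption: choose a nearby mirror
KERNCONF="SIDEHACK"                                 # kernel config name from the dmesg above

# Print the command in dry-run mode, otherwise execute it in this shell.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop at the first step that fails.
update_and_rebuild() {
    run cvsup -g -L 2 -h "$CVSUP_HOST" "$SUPFILE" &&
    run cd /usr/src &&
    run make buildkernel KERNCONF="$KERNCONF" &&
    run make installkernel KERNCONF="$KERNCONF"
}

update_and_rebuild
```

After installing the new kernel, reboot and rerun the same bonnie++ command to see whether the twe0 errors still show up.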