Date: Thu, 9 Jun 2005 10:30:02 -0400
From: Steve Richardson <prefect@sidehack.sat.gweep.net>
To: freebsd-questions@freebsd.org
Subject: FBSD 5.4-STABLE/3Ware Escalade 7506-4LP on dual Opteron issue
Message-ID: <20050609143002.GA74546@sidehack.sat.gweep.net>

Hi,

We're building out a brand new dual Opteron box to run our public access unix site. We're running FreeBSD 5.4 and a 3Ware Escalade 7506-4LP. We are having difficulties with the system, and any help you can offer would be greatly appreciated.

For the most part, everything behaves fine. We've got the system built and installed. Unfortunately, we're having a periodic, catastrophic failure involving the 3Ware card. Periodically, the system will partly lock up with the following errors:

twe0: unexpected status bit(s) 100000<PCIABRT>
twe0: PCI abort, clearing.

I say partly lock up because the kernel does not panic, nor do the console keyboard or network interfaces become non-responsive (i.e. you can type stuff at the login prompt, and ping the server). However, the disk subsystem does appear to cease functioning once this has occurred.

Frankly at this point we are baffled, because the system is stable enough to run for days on end under light load, and will even occasionally handle periods of medium disk load (e.g. many hours of rsyncing from our live server, build world, etc).

We have been using the bonnie++ hard disk benchmarking suite as a means of recreating the problem, as follows:

> mkdir testdir
> bonnie++ -d ./dbench -s 2g -n 100:500000:1000 -x 100

I've included system information below, including dmesg output.

regards,
Steve Richardson
System Administrator
GweepNet Cooperative Network

System Description:

Gigabyte GA-7A8DW motherboard
(2) AMD Opteron 246 2GHz CPUs
2GB Samsung PC3200 ECC RAM
3Ware Escalade 7506-4LP parallel ATA RAID, installed in 64 bit PCI slot
OS: FreeBSD 5.4-STABLE FreeBSD 5.4-STABLE amd64

dmesg output:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.4-STABLE #2: Tue Jun 7 00:10:29 EDT 2005
root@newsidey.gweep.net:/usr/obj/usr/src/sys/SIDEHACK
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Opteron(tm) Processor 246 (1993.79-MHz K8-class CPU)
Origin = "AuthenticAMD" Id = 0xf5a Stepping = 10
Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow+,3DNow>
real memory = 2146893824 (2047 MB)
avail memory = 2061205504 (1965 MB)
ACPI APIC Table: <PTLTD APIC >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 24-27 on motherboard
ioapic2 <Version 1.1> irqs 28-31 on motherboard
acpi0: <PTLTD XSDT> on motherboard
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
unknown: I/O range not supported
unknown: I/O range not supported
ACPI-1304: *** Error: Method execution failed [\\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xffffff0000a70080), AE_AML_BUFFER_LIMIT
ACPI-0239: *** Error: Method execution failed [\\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xffffff0000a70080), AE_AML_BUFFER_LIMIT
can't fetch resources for \\_SB_.PCI0.LPC_.LPT_ - AE_AML_BUFFER_LIMIT
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci2: <ACPI PCI bus> on pcib2
ohci0: <OHCI (generic) USB controller> mem 0xd0110000-0xd0110fff irq 19 at device 0.0 on pci2
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1: <OHCI (generic) USB controller> mem 0xd0111000-0xd0111fff irq 19 at device 0.1 on pci2
usb1: OHCI version 1.0, legacy support
usb1: SMM does not respond, resetting
usb1: <OHCI (generic) USB controller> on ohci1
usb1: USB revision 1.0
uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
ahc0: <Adaptec 2902/04/10/15/20C/30C SCSI adapter> port 0x3000-0x30ff mem 0xd0112000-0xd0112fff irq 17 at device 4.0 on pci2
aic7850: Single Channel A, SCSI Id=7, 3/253 SCBs
bge0: <Broadcom BCM5705 Gigabit Ethernet, ASIC rev. 0x3003> mem 0xd0100000-0xd010ffff irq 19 at device 5.0 on pci2
miibus0: <MII bus> on bge0
brgphy0: <BCM5705 10/100/1000baseTX PHY> on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:0f:ea:7e:b1:81
atapci0: <SiI 3114 SATA150 controller> port 0x3400-0x340f,0x3410-0x3413,0x3418-0x341f,0x3414-0x3417,0x3420-0x3427 mem 0xd0113000-0xd01133ff irq 18 at device 6.0 on pci2
ata2: channel #0 on atapci0
ata3: channel #1 on atapci0
ata4: channel #2 on atapci0
ata5: channel #3 on atapci0
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <AMD 8111 UDMA133 controller> port 0x1000-0x100f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
ata0: channel #0 on atapci1
ata1: channel #1 on atapci1
pci0: <bridge> at device 7.3 (no driver attached)
pcib3: <ACPI Host-PCI bridge> on acpi0
pci8: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 3.0 on pci8
pci9: <ACPI PCI bus> on pcib4
pci8: <base peripheral, interrupt controller> at device 3.1 (no driver attached)
pcib5: <ACPI PCI-PCI bridge> at device 4.0 on pci8
pci14: <ACPI PCI bus> on pcib5
twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0x4000-0x400f mem 0xf0800000-0xf0ffffff irq 30 at device 2.0 on pci14
twe0: 4 ports, Firmware FE7X 1.05.00.068, BIOS BE7X 1.08.00.048
pci8: <base peripheral, interrupt controller> at device 4.1 (no driver attached)
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: cannot reserve I/O port range
ppc0: cannot reserve I/O port range
orm0: <ISA Option ROMs> at iomem 0xd0000-0xd0fff,0xc0000-0xcffff on isa0
ppc0: cannot reserve I/O port range
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
ahc0: Someone reset channel A
ad0: 152627MB <SAMSUNG SP1614N/TM100-30> [310101/16/63] at ata0-master UDMA100
ad2: 286188MB <Maxtor 6B300R0/BAH41B70> [581463/16/63] at ata1-master UDMA133
Waiting 15 seconds for SCSI devices to settle
twed0: <Unit 0, RAID5, Normal> on twe0
twed0: 305253MB (625159424 sectors)
sa0 at ahc0 bus 0 target 3 lun 0
sa0: <EXABYTE EXB-89008E00012F V39e> Removable Sequential Access SCSI-2 device
sa0: 10.000MB/s transfers (10.000MHz, offset 15)
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/twed0s1a
WARNING: / was not properly dismounted
WARNING: /home/crib was not properly dismounted
WARNING: /home/domus was not properly dismounted
WARNING: /tmp was not properly dismounted
WARNING: /u was not properly dismounted
WARNING: /u/backup/nearline was not properly dismounted
WARNING: /u/backup/online was not properly dismounted
WARNING: /u/news was not properly dismounted
WARNING: /u/news/nntpcached was not properly dismounted
WARNING: /usr was not properly dismounted
WARNING: /var was not properly dismounted
WARNING: /var/tmp was not properly dismounted
bge0: firmware handshake timed out
bge0: RX CPU self-diagnostics failed!
bge0: watchdog timeout -- resetting