Date: Thu, 9 Jun 2005 09:32:07 -0500 (CDT)
From: Tony Shadwick <tshadwick@goinet.com>
To: Steve Richardson <prefect@sidehack.sat.gweep.net>
Cc: freebsd-questions@freebsd.org
Subject: Re: FBSD 5.4-STABLE/3Ware Escalade 7506-4LP on dual Opteron issue
Message-ID: <20050609093125.O71755@mail.goinet.com>
In-Reply-To: <20050609143002.GA74546@sidehack.sat.gweep.net>
References: <20050609143002.GA74546@sidehack.sat.gweep.net>

I'm not claiming this will fix your issue, but are you running the absolute
latest kernel sources? There is a possibility this issue has been resolved in
a newer kernel. cvsup your sources and try doing a build. See what happens.

On Thu, 9 Jun 2005, Steve Richardson wrote:

>
> Hi,
>
> We're building out brand new dual Opteron box to run our public access unix
> site. We're running FreeBSD 5.4 and a 3Ware Escalade 7506-4LP. We are
> having difficulties with the system, and any help you can offer would be
> greatly appreciated.
>
> For the most part, everything behaves fine. We've got the system built and
> installed. Unfortunately, we're having a periodic, catastrophic failure
> involving the 3Ware card.
>
> Periodically, the system will partly lock up with the following errors:
>
> twe0: unexpected status bit(s) 100000<PCIABRT>
> twe0: PCI abort, clearing.
>
> I say partly lock up because the kernel does not panic, nor do the console
> keyboard or network interfaces become non-responsive (i.e. you can type
> stuff at the login prompt, and ping the server). However, the disk
> subsystem does appear to cease functioning once this has occurred.
>
> Frankly at this point we are baffled, because the system is stable enough to
> run for days on end under light load, and will even occasionally handle
> periods of medium disk load (e.g. many hours of rsyncing from our live
> server, build world, etc).
>
> We have been using the bonnie++ hard disk benchmarking suite as a means for
> recreating the problem, as follows:
>
>> mkdir testdir
>> bonnie++ -d ./dbench -s 2g -n 100:500000:1000 -x 100
>
> I've included system information below, including dmesg output.
>
>
> regards,
> Steve Richardson
> System Administrator
> GweepNet Cooperative Network
>
>
>
> System Description:
> Gigabyte GA-7A8DW motherboard
> (2) AMD Opteron 246 2GHz CPUs
> 2GB Samsung PC3200 ECC RAM
> 3Ware Escalade 7506-4LP parallel ATA RAID, installed in 64 bit PCI slot
>
> OS:
> FreeBSD 5.4-STABLE FreeBSD 5.4-STABLE amd64
>
>
> dmesg output:
>
> Copyright (c) 1992-2005 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD 5.4-STABLE #2: Tue Jun 7 00:10:29 EDT 2005
> root@newsidey.gweep.net:/usr/obj/usr/src/sys/SIDEHACK
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: AMD Opteron(tm) Processor 246 (1993.79-MHz K8-class CPU)
> Origin = "AuthenticAMD" Id = 0xf5a Stepping = 10
> Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
> AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow+,3DNow>
> real memory = 2146893824 (2047 MB)
> avail memory = 2061205504 (1965 MB)
> ACPI APIC Table: <PTLTD APIC >
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> cpu0 (BSP): APIC ID: 0
> cpu1 (AP): APIC ID: 1
> MADT: Forcing active-low polarity and level trigger for SCI
> ioapic0 <Version 1.1> irqs 0-23 on motherboard
> ioapic1 <Version 1.1> irqs 24-27 on motherboard
> ioapic2 <Version 1.1> irqs 28-31 on motherboard
> acpi0: <PTLTD XSDT> on motherboard
> acpi0: Power Button (fixed)
> acpi0: Sleep Button (fixed)
> acpi_bus_number: can't get _ADR
> acpi_bus_number: can't get _ADR
> acpi_bus_number: can't get _ADR
> unknown: I/O range not supported
> unknown: I/O range not supported
> ACPI-1304: *** Error: Method execution failed [\\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xffffff0000a70080), AE_AML_BUFFER_LIMIT
> ACPI-0239: *** Error: Method execution failed [\\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xffffff0000a70080), AE_AML_BUFFER_LIMIT
> can't fetch resources for \\_SB_.PCI0.LPC_.LPT_ - AE_AML_BUFFER_LIMIT
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
> cpu0: <ACPI CPU> on acpi0
> cpu1: <ACPI CPU> on acpi0
> acpi_button0: <Power Button> on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
> pci1: <ACPI PCI bus> on pcib1
> pci1: <display, VGA> at device 0.0 (no driver attached)
> pcib2: <ACPI PCI-PCI bridge> at device 6.0 on pci0
> pci2: <ACPI PCI bus> on pcib2
> ohci0: <OHCI (generic) USB controller> mem 0xd0110000-0xd0110fff irq 19 at device 0.0 on pci2
> usb0: OHCI version 1.0, legacy support
> usb0: SMM does not respond, resetting
> usb0: <OHCI (generic) USB controller> on ohci0
> usb0: USB revision 1.0
> uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 3 ports with 3 removable, self powered
> ohci1: <OHCI (generic) USB controller> mem 0xd0111000-0xd0111fff irq 19 at device 0.1 on pci2
> usb1: OHCI version 1.0, legacy support
> usb1: SMM does not respond, resetting
> usb1: <OHCI (generic) USB controller> on ohci1
> usb1: USB revision 1.0
> uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub1: 3 ports with 3 removable, self powered
> ahc0: <Adaptec 2902/04/10/15/20C/30C SCSI adapter> port 0x3000-0x30ff mem 0xd0112000-0xd0112fff irq 17 at device 4.0 on pci2
> aic7850: Single Channel A, SCSI Id=7, 3/253 SCBs
> bge0: <Broadcom BCM5705 Gigabit Ethernet, ASIC rev. 0x3003> mem 0xd0100000-0xd010ffff irq 19 at device 5.0 on pci2
> miibus0: <MII bus> on bge0
> brgphy0: <BCM5705 10/100/1000baseTX PHY> on miibus0
> brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
> bge0: Ethernet address: 00:0f:ea:7e:b1:81
> atapci0: <SiI 3114 SATA150 controller> port 0x3400-0x340f,0x3410-0x3413,0x3418-0x341f,0x3414-0x3417,0x3420-0x3427 mem 0xd0113000-0xd01133ff irq 18 at device 6.0 on pci2
> ata2: channel #0 on atapci0
> ata3: channel #1 on atapci0
> ata4: channel #2 on atapci0
> ata5: channel #3 on atapci0
> isab0: <PCI-ISA bridge> at device 7.0 on pci0
> isa0: <ISA bus> on isab0
> atapci1: <AMD 8111 UDMA133 controller> port 0x1000-0x100f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
> ata0: channel #0 on atapci1
> ata1: channel #1 on atapci1
> pci0: <bridge> at device 7.3 (no driver attached)
> pcib3: <ACPI Host-PCI bridge> on acpi0
> pci8: <ACPI PCI bus> on pcib3
> pcib4: <ACPI PCI-PCI bridge> at device 3.0 on pci8
> pci9: <ACPI PCI bus> on pcib4
> pci8: <base peripheral, interrupt controller> at device 3.1 (no driver attached)
> pcib5: <ACPI PCI-PCI bridge> at device 4.0 on pci8
> pci14: <ACPI PCI bus> on pcib5
> twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0x4000-0x400f mem 0xf0800000-0xf0ffffff irq 30 at device 2.0 on pci14
> twe0: 4 ports, Firmware FE7X 1.05.00.068, BIOS BE7X 1.08.00.048
> pci8: <base peripheral, interrupt controller> at device 4.1 (no driver attached)
> atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
> atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
> kbd0 at atkbd0
> fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> sio0: type 16550A
> sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
> sio1: type 16550A
> ppc0: cannot reserve I/O port range
> ppc0: cannot reserve I/O port range
> orm0: <ISA Option ROMs> at iomem 0xd0000-0xd0fff,0xc0000-0xcffff on isa0
> ppc0: cannot reserve I/O port range
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> Timecounters tick every 1.000 msec
> ahc0: Someone reset channel A
> ad0: 152627MB <SAMSUNG SP1614N/TM100-30> [310101/16/63] at ata0-master UDMA100
> ad2: 286188MB <Maxtor 6B300R0/BAH41B70> [581463/16/63] at ata1-master UDMA133
> Waiting 15 seconds for SCSI devices to settle
> twed0: <Unit 0, RAID5, Normal> on twe0
> twed0: 305253MB (625159424 sectors)
> sa0 at ahc0 bus 0 target 3 lun 0
> sa0: <EXABYTE EXB-89008E00012F V39e> Removable Sequential Access SCSI-2 device
> sa0: 10.000MB/s transfers (10.000MHz, offset 15)
> SMP: AP CPU #1 Launched!
> Mounting root from ufs:/dev/twed0s1a
> WARNING: / was not properly dismounted
> WARNING: /home/crib was not properly dismounted
> WARNING: /home/domus was not properly dismounted
> WARNING: /tmp was not properly dismounted
> WARNING: /u was not properly dismounted
> WARNING: /u/backup/nearline was not properly dismounted
> WARNING: /u/backup/online was not properly dismounted
> WARNING: /u/news was not properly dismounted
> WARNING: /u/news/nntpcached was not properly dismounted
> WARNING: /usr was not properly dismounted
> WARNING: /var was not properly dismounted
> WARNING: /var/tmp was not properly dismounted
> bge0: firmware handshake timed out
> bge0: RX CPU self-diagnostics failed!
> bge0: watchdog timeout -- resetting
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
>
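
For anyone following the cvsup-and-rebuild suggestion at the top of this reply, here is a rough sketch of that cycle as a small sh script. The supfile path, the mirror host, and the DRY_RUN switch are assumptions for illustration and do not come from this thread; the SIDEHACK kernel config name is taken from the dmesg banner quoted above. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: refresh the source tree with cvsup, then rebuild world and kernel.
# Assumed (not from the thread): stock example supfile, mirror host, DRY_RUN switch.
SUPFILE="${SUPFILE:-/usr/share/examples/cvsup/stable-supfile}"
CVSUP_HOST="${CVSUP_HOST:-cvsup.FreeBSD.org}"
KERNCONF="${KERNCONF:-SIDEHACK}"   # kernel config name seen in the dmesg banner
DRY_RUN="${DRY_RUN:-1}"            # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it in this shell.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Each step runs only if the previous one succeeded.
update_and_rebuild() {
    run cvsup -g -L 2 -h "$CVSUP_HOST" "$SUPFILE" &&
    run cd /usr/src &&
    run make buildworld &&
    run make buildkernel KERNCONF="$KERNCONF" &&
    run make installkernel KERNCONF="$KERNCONF"
}

update_and_rebuild
```

Running it with DRY_RUN=0 as root would execute the steps for real. The Handbook's full procedure continues with installworld and mergemaster after a reboot; those steps are left out of this sketch.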