Date: Mon, 13 Jun 2011 16:22:12 +0200 From: Christian Baer <christian.baer@uni-dortmund.de> To: freebsd-stable@freebsd.org Subject: Crashes with Promise controller Message-ID: <it56el$tqa$1@dough.gmane.org>
next in thread | raw e-mail | index | archive | help
Hey everybody! A while back I noticed strange system crashes on my personal file server. At first I thought they were caused by bad connections. But after exchanging all the SATA cables, the crashes didn't go away. Then I started looking around the web. As it seems there is a problem with FreeBSD ata(4) and certain Promise controllers out there - probably as a result of the lack of documentation from the hardware vendor. There have been threads discussing the problem. I haven't however seen any where someone with the actual problem and controller tried out much of anything. Sofar I have *not* been able to reproduce the errors at will, I can only say that heavy I/O helps - which however shouldn't come as much of a surprise. As far as I can tell by now, a crash only occurs when writing to a drive. I have gone through several gigs of reading (as fast as the system will read) and there hasn't been a crash yet. Writing files (even over a slow 100MB network connection) has crashed the system several times now within the last 48 hours. I have to slightly explain the word "crash" here: I don't actually have to hard reset the system myself. My box just does a reboot by itself. No filesystem is unmounted cleanly and because the machine isn't really new and powerful fsck takes pretty long. As you can imagine, the situation is rather ball busting. This is not so much a complaint but rather a show of motivation to do something about it. If one of the developers wants me to try out certain things to find clues, hit me. I don't mind compiling the kernel serveral times a day - or even the whole world if that will do any good. Just remember that this is only a 32 bit AMD, so compiling and results may take some time. :-) Below is a selection[1] of the error messages from /var/log/messages. Not that I use geli(8). All of the drives where this happens are encrypted. I doubt that that has anything to do with my problems though since I found lots of accounts of this problem on the web without any mention of geli. I have also includes the boot-messages[2] from my box. You will notice that there are two identical controller in the system. Before FreeBSD 8 I used 6 (and skipped 7). Before the installation of 6 I had no problems with there controllers. On a side note... Should this problem prove to be non.resolvable, I would appreciate a recommendation about alternative controllers that work and where the vendor isn't quite so reluctant to hand out documentation. Best regards from Germany, Chris [1] Jun 13 15:31:10 jon kernel: ata8: error issuing SET_MULTI command Jun 13 15:31:10 jon kernel: ad16: FAILURE - WRITE_DMA timed out LBA=511 Jun 13 15:31:10 jon kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ad16s1d.eli[WRITE(offset=229376, length=20480)] Jun 13 15:31:10 jon kernel: g_vfs_done():ad16s1d.eli[WRITE(offset=229376, length=20480)]error = 5 Jun 13 15:31:34 jon kernel: ata8: SIGNATURE: ffffffff Jun 13 15:31:34 jon kernel: ata8: timeout waiting to issue command Jun 13 15:31:34 jon kernel: ata8: error issuing SETFEATURES SET TRANSFER MODE command Jun 13 15:31:34 jon kernel: ata8: timeout waiting to issue command Jun 13 15:31:34 jon kernel: ata8: error issuing SETFEATURES ENABLE RCACHE command Jun 13 15:31:34 jon kernel: ata8: timeout waiting to issue command Jun 13 15:31:34 jon kernel: ata8: error issuing SETFEATURES ENABLE WCACHE command Jun 13 15:31:34 jon kernel: ata8: timeout waiting to issue command Jun 13 15:31:34 jon kernel: ata8: error issuing SET_MULTI command Jun 13 15:31:34 jon kernel: ad16: TIMEOUT - READ_DMA48 retrying (0 retries left) LBA=509133183 Jun 13 15:33:17 jon syslogd: kernel boot file is /boot/kernel/kernel Jun 13 15:33:17 jon kernel: Copyright (c) 1992-2011 The FreeBSD Project. Jun 13 15:33:17 jon kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 [2] Jun 13 15:33:17 jon kernel: FreeBSD 8.2-STABLE #1: Sun Jun 12 17:42:17 CEST 2011 Jun 13 15:33:17 jon kernel: root@jon.rz1.convenimus.net:/usr/obj/usr/src/sys/JON i386 Jun 13 15:33:17 jon kernel: Timecounter "i8254" frequency 1193182 Hz quality 0 Jun 13 15:33:17 jon kernel: CPU: AMD Athlon(tm) XP 2400+ (1992.03-MHz 686-class CPU) Jun 13 15:33:17 jon kernel: Origin = "AuthenticAMD" Id = 0x681 Family = 6 Model = 8 Stepping = 1 Jun 13 15:33:17 jon kernel: Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,P AT,PSE36,MMX,FXSR,SSE> Jun 13 15:33:17 jon kernel: AMD Features=0xc0400800<SYSCALL,MMX+,3DNow!+,3DNow!> Jun 13 15:33:17 jon kernel: real memory = 671088640 (640 MB) Jun 13 15:33:17 jon kernel: avail memory = 647278592 (617 MB) Jun 13 15:33:17 jon kernel: ACPI Error: A valid RSDP was not found (20101013/tbxfroot-309) Jun 13 15:33:17 jon kernel: MPTable: <OEM00000 PROD00000000> Jun 13 15:33:17 jon kernel: ioapic0: Assuming intbase of 0 Jun 13 15:33:17 jon kernel: ioapic0 <Version 1.1> irqs 0-23 on motherboard Jun 13 15:33:17 jon kernel: kbd1 at kbdmux0 Jun 13 15:33:17 jon kernel: ACPI Error: A valid RSDP was not found (20101013/tbxfroot-309) Jun 13 15:33:17 jon kernel: ACPI: Table initialisation failed: AE_NOT_FOUND Jun 13 15:33:17 jon kernel: ACPI: Try disabling either ACPI or apic support. Jun 13 15:33:17 jon kernel: pcib0: <MPTable Host-PCI bridge> pcibus 0 on motherboard Jun 13 15:33:17 jon kernel: pci0: <PCI bus> on pcib0 Jun 13 15:33:17 jon kernel: agp0: <VIA 82C8363 (Apollo KT133x/KM133) host to PCI bridge> on hostb0 Jun 13 15:33:17 jon kernel: agp0: aperture size is 4M Jun 13 15:33:17 jon kernel: pcib1: <PCI-PCI bridge> at device 1.0 on pci0 Jun 13 15:33:17 jon kernel: pci1: <PCI bus> on pcib1 Jun 13 15:33:17 jon kernel: isab0: <PCI-ISA bridge> at device 7.0 on pci0 Jun 13 15:33:17 jon kernel: isa0: <ISA bus> on isab0 Jun 13 15:33:17 jon kernel: atapci0: <VIA 82C686B UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xd000-0xd00f at device 7.1 on pci0 Jun 13 15:33:17 jon kernel: atapci0: Correcting VIA config for southbridge data corruption bug Jun 13 15:33:17 jon kernel: ata0: <ATA channel 0> on atapci0 Jun 13 15:33:17 jon kernel: ata0: [ITHREAD] Jun 13 15:33:17 jon kernel: ata1: <ATA channel 1> on atapci0 Jun 13 15:33:17 jon kernel: ata1: [ITHREAD] Jun 13 15:33:17 jon kernel: atapci1: <Promise PDC40718 SATA300 controller> port 0xdc00-0xdc7f,0xe000-0xe0ff mem 0xd3461000-0xd3461fff,0xd3420000-0xd343ffff irq 11 at device 8.0 on pci0 Jun 13 15:33:17 jon kernel: atapci1: [ITHREAD] Jun 13 15:33:17 jon kernel: atapci1: [ITHREAD] Jun 13 15:33:17 jon kernel: ata2: <ATA channel 0> on atapci1 Jun 13 15:33:17 jon kernel: ata2: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata2: [ITHREAD] Jun 13 15:33:17 jon kernel: ata3: <ATA channel 1> on atapci1 Jun 13 15:33:17 jon kernel: ata3: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata3: [ITHREAD] Jun 13 15:33:17 jon kernel: ata4: <ATA channel 2> on atapci1 Jun 13 15:33:17 jon kernel: ata4: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata4: [ITHREAD] Jun 13 15:33:17 jon kernel: ata5: <ATA channel 3> on atapci1 Jun 13 15:33:17 jon kernel: ata5: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata5: [ITHREAD] Jun 13 15:33:17 jon kernel: fxp0: <Intel 82550 Pro/100 Ethernet> port 0xe400-0xe43f mem 0xd3460000-0xd3460fff,0xd3440000-0xd345ffff irq 10 at device 9.0 on pci0 Jun 13 15:33:17 jon kernel: miibus0: <MII bus> on fxp0 Jun 13 15:33:17 jon kernel: inphy0: <i82555 10/100 media interface> PHY 1 on miibus0 Jun 13 15:33:17 jon kernel: inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow Jun 13 15:33:17 jon kernel: fxp0: Ethernet address: 00:02:b3:ad:ad:d7 Jun 13 15:33:17 jon kernel: fxp0: [ITHREAD] Jun 13 15:33:17 jon kernel: atapci2: <Promise PDC40718 SATA300 controller> port 0xe800-0xe87f,0xec00-0xecff mem 0xd3462000-0xd3462fff,0xd3400000-0xd341ffff irq 11 at device 12.0 on pci0 Jun 13 15:33:17 jon kernel: atapci2: [ITHREAD] Jun 13 15:33:17 jon kernel: atapci2: [ITHREAD] Jun 13 15:33:17 jon kernel: ata6: <ATA channel 0> on atapci2 Jun 13 15:33:17 jon kernel: ata6: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata6: [ITHREAD] Jun 13 15:33:17 jon kernel: ata7: <ATA channel 1> on atapci2 Jun 13 15:33:17 jon kernel: ata7: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata7: [ITHREAD] Jun 13 15:33:17 jon kernel: ata8: <ATA channel 2> on atapci2 Jun 13 15:33:17 jon kernel: ata8: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata8: [ITHREAD] Jun 13 15:33:17 jon kernel: ata9: <ATA channel 3> on atapci2 Jun 13 15:33:17 jon kernel: ata9: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata9: [ITHREAD] Jun 13 15:33:17 jon kernel: vgapci0: <VGA-compatible display> mem 0xd1000000-0xd1003fff,0xd2000000-0xd27fffff at device 13.0 on pci0 Jun 13 15:33:17 jon kernel: cpu0 on motherboard Jun 13 15:33:17 jon kernel: pmtimer0 on isa0 Jun 13 15:33:17 jon kernel: atrtc0: <AT realtime clock> at port 0x70-0x71 irq 8 pnpid PNP0b00 on isa0 Jun 13 15:33:17 jon kernel: atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 irq 1 pnpid PNP0303 on isa0 Jun 13 15:33:17 jon kernel: atkbd0: <AT Keyboard> irq 1 on atkbdc0 Jun 13 15:33:17 jon kernel: kbd0 at atkbd0 Jun 13 15:33:17 jon kernel: atkbd0: [GIANT-LOCKED] Jun 13 15:33:17 jon kernel: atkbd0: [ITHREAD] Jun 13 15:33:17 jon kernel: unknown: <PNP0c01> can't assign resources (memory) Jun 13 15:33:17 jon kernel: psm0: <PS/2 Mouse> irq 12 on atkbdc0 Jun 13 15:33:17 jon kernel: psm0: [GIANT-LOCKED] Jun 13 15:33:17 jon kernel: psm0: [ITHREAD] Jun 13 15:33:17 jon kernel: psm0: model Generic PS/2 mouse, device ID 0 Jun 13 15:33:17 jon kernel: fdc0: <Enhanced floppy controller> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 pnpid PNP0700 on isa0 Jun 13 15:33:17 jon kernel: fdc0: [FILTER] Jun 13 15:33:17 jon kernel: fd0: <1440-KB 3.5" drive> on fdc0 drive 0 Jun 13 15:33:17 jon kernel: orm0: <ISA Option ROM> at iomem 0xc0000-0xc7fff pnpid ORM0000 on isa0 Jun 13 15:33:17 jon kernel: sc0: <System console> at flags 0x100 on isa0 Jun 13 15:33:17 jon kernel: sc0: VGA <16 virtual consoles, flags=0x300> Jun 13 15:33:17 jon kernel: vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 Jun 13 15:33:17 jon kernel: ppc0: parallel port not found. Jun 13 15:33:17 jon kernel: unknown: <PNP0c01> can't assign resources (memory) Jun 13 15:33:17 jon kernel: Timecounter "TSC" frequency 1992030730 Hz quality 800 Jun 13 15:33:17 jon kernel: Timecounters tick every 1.000 msec Jun 13 15:33:17 jon kernel: ad0: 14664MB <IBM DJNA-351520 J56OA30K> at ata0-master UDMA66 Jun 13 15:33:17 jon kernel: ad2: 305245MB <WDC WD3200JB-00KFA0 08.05J08> at ata1-master UDMA100 Jun 13 15:33:17 jon kernel: ad3: 238475MB <Seagate ST3250823A 3.03> at ata1-slave UDMA100 Jun 13 15:33:17 jon kernel: ad4: 476940MB <SAMSUNG HD501LJ CR100-11> at ata2-master UDMA100 SATA 3Gb/s Jun 13 15:33:17 jon kernel: ad6: 1907729MB <WDC WD20EARS-07MVWB0 51.0AB51> at ata3-master UDMA100 SATA 3Gb/s Jun 13 15:33:17 jon kernel: ad8: 476940MB <SAMSUNG HD501LJ CR100-11> at ata4-master UDMA100 SATA 3Gb/s Jun 13 15:33:17 jon kernel: ad10: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata5-master UDMA100 SATA 1.5Gb/s Jun 13 15:33:17 jon kernel: ad12: 1430799MB <SAMSUNG HD154UI 1AG01118> at ata6-master UDMA100 SATA 3Gb/s Jun 13 15:33:17 jon kernel: ad14: 476940MB <Seagate ST3500320AS SD04> at ata7-master UDMA100 SATA 1.5Gb/s Jun 13 15:33:17 jon kernel: ad16: 1907729MB <SAMSUNG HD203WI 1AN10002> at ata8-master UDMA100 SATA 3Gb/s Jun 13 15:33:17 jon kernel: ad18: 953869MB <SAMSUNG HD103UI 1AA01113> at ata9-master UDMA100 SATA 3Gb/s Jun 13 15:33:17 jon kernel: GEOM: ad2s1: geometry does not match label (16h,63s != 1h,63s). Jun 13 15:33:17 jon kernel: GEOM: ad3s1: geometry does not match label (16h,63s != 1h,63s). Jun 13 15:33:17 jon kernel: Trying to mount root from ufs:/dev/ad0s1a Jun 13 15:33:17 jon kernel: WARNING: / was not properly dismounted Jun 13 15:33:17 jon kernel: WARNING: /usr was not properly dismounted Jun 13 15:33:17 jon kernel: WARNING: /var was not properly dismounted Jun 13 15:33:17 jon kernel: /var: mount pending error: blocks 8 files 2 Jun 13 15:33:18 jon kernel: fxp0: link state changed to UP Jun 13 15:33:19 jon kernel: fxp0: link state changed to DOWN Jun 13 15:33:21 jon kernel: fxp0: link state changed to UP Jun 13 15:33:22 jon dhclient: New IP Address (fxp0): 192.168.100.9 Jun 13 15:33:22 jon kernel: fxp0: link state changed to DOWN Jun 13 15:33:22 jon dhclient: New Subnet Mask (fxp0): 255.255.255.0 Jun 13 15:33:22 jon dhclient: New Broadcast Address (fxp0): 192.168.100.255 Jun 13 15:33:22 jon dhclient: New Routers (fxp0): 192.168.100.250 Jun 13 15:33:24 jon kernel: fxp0: link state changed to UP Jun 13 15:38:07 jon fsck: /dev/ad0s1e: 285849 files, 1607328 used, 2887803 free (117739 frags, 346258 blocks, 2.6% fragmentation) [after that only messages from fsck]
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?it56el$tqa$1>