From owner-freebsd-stable@FreeBSD.ORG Mon Jun 13 14:22:28 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 980D31065672 for ; Mon, 13 Jun 2011 14:22:28 +0000 (UTC) (envelope-from freebsd-stable@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id CFC7C8FC08 for ; Mon, 13 Jun 2011 14:22:27 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1QW825-0005C5-JL for freebsd-stable@freebsd.org; Mon, 13 Jun 2011 16:22:25 +0200 Received: from dtmd-4db2b5a3.pool.mediaways.net ([77.178.181.163]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 13 Jun 2011 16:22:25 +0200 Received: from christian.baer by dtmd-4db2b5a3.pool.mediaways.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 13 Jun 2011 16:22:25 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-stable@freebsd.org From: Christian Baer Date: Mon, 13 Jun 2011 16:22:12 +0200 Lines: 247 Message-ID: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: dtmd-4db2b5a3.pool.mediaways.net User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.16) Gecko/20101125 Lightning/1.0b1 Thunderbird/3.0.11 Subject: Crashes with Promise controller X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 14:22:28 -0000 Hey everybody! A while back I noticed strange system crashes on my personal file server. At first I thought they were caused by bad connections. But after exchanging all the SATA cables, the crashes didn't go away. Then I started looking around the web. As it seems there is a problem with FreeBSD ata(4) and certain Promise controllers out there - probably as a result of the lack of documentation from the hardware vendor. There have been threads discussing the problem. I haven't however seen any where someone with the actual problem and controller tried out much of anything. Sofar I have *not* been able to reproduce the errors at will, I can only say that heavy I/O helps - which however shouldn't come as much of a surprise. As far as I can tell by now, a crash only occurs when writing to a drive. I have gone through several gigs of reading (as fast as the system will read) and there hasn't been a crash yet. Writing files (even over a slow 100MB network connection) has crashed the system several times now within the last 48 hours. I have to slightly explain the word "crash" here: I don't actually have to hard reset the system myself. My box just does a reboot by itself. No filesystem is unmounted cleanly and because the machine isn't really new and powerful fsck takes pretty long. As you can imagine, the situation is rather ball busting. This is not so much a complaint but rather a show of motivation to do something about it. If one of the developers wants me to try out certain things to find clues, hit me. I don't mind compiling the kernel serveral times a day - or even the whole world if that will do any good. Just remember that this is only a 32 bit AMD, so compiling and results may take some time. :-) Below is a selection[1] of the error messages from /var/log/messages. Not that I use geli(8). All of the drives where this happens are encrypted. I doubt that that has anything to do with my problems though since I found lots of accounts of this problem on the web without any mention of geli. I have also includes the boot-messages[2] from my box. You will notice that there are two identical controller in the system. Before FreeBSD 8 I used 6 (and skipped 7). Before the installation of 6 I had no problems with there controllers. On a side note... Should this problem prove to be non.resolvable, I would appreciate a recommendation about alternative controllers that work and where the vendor isn't quite so reluctant to hand out documentation. Best regards from Germany, Chris [1] Jun 13 15:31:10 jon kernel: ata8: error issuing SET_MULTI command Jun 13 15:31:10 jon kernel: ad16: FAILURE - WRITE_DMA timed out LBA=511 Jun 13 15:31:10 jon kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ad16s1d.eli[WRITE(offset=229376, length=20480)] Jun 13 15:31:10 jon kernel: g_vfs_done():ad16s1d.eli[WRITE(offset=229376, length=20480)]error = 5 Jun 13 15:31:34 jon kernel: ata8: SIGNATURE: ffffffff Jun 13 15:31:34 jon kernel: ata8: timeout waiting to issue command Jun 13 15:31:34 jon kernel: ata8: error issuing SETFEATURES SET TRANSFER MODE command Jun 13 15:31:34 jon kernel: ata8: timeout waiting to issue command Jun 13 15:31:34 jon kernel: ata8: error issuing SETFEATURES ENABLE RCACHE command Jun 13 15:31:34 jon kernel: ata8: timeout waiting to issue command Jun 13 15:31:34 jon kernel: ata8: error issuing SETFEATURES ENABLE WCACHE command Jun 13 15:31:34 jon kernel: ata8: timeout waiting to issue command Jun 13 15:31:34 jon kernel: ata8: error issuing SET_MULTI command Jun 13 15:31:34 jon kernel: ad16: TIMEOUT - READ_DMA48 retrying (0 retries left) LBA=509133183 Jun 13 15:33:17 jon syslogd: kernel boot file is /boot/kernel/kernel Jun 13 15:33:17 jon kernel: Copyright (c) 1992-2011 The FreeBSD Project. Jun 13 15:33:17 jon kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 [2] Jun 13 15:33:17 jon kernel: FreeBSD 8.2-STABLE #1: Sun Jun 12 17:42:17 CEST 2011 Jun 13 15:33:17 jon kernel: root@jon.rz1.convenimus.net:/usr/obj/usr/src/sys/JON i386 Jun 13 15:33:17 jon kernel: Timecounter "i8254" frequency 1193182 Hz quality 0 Jun 13 15:33:17 jon kernel: CPU: AMD Athlon(tm) XP 2400+ (1992.03-MHz 686-class CPU) Jun 13 15:33:17 jon kernel: Origin = "AuthenticAMD" Id = 0x681 Family = 6 Model = 8 Stepping = 1 Jun 13 15:33:17 jon kernel: Features=0x383fbff Jun 13 15:33:17 jon kernel: AMD Features=0xc0400800 Jun 13 15:33:17 jon kernel: real memory = 671088640 (640 MB) Jun 13 15:33:17 jon kernel: avail memory = 647278592 (617 MB) Jun 13 15:33:17 jon kernel: ACPI Error: A valid RSDP was not found (20101013/tbxfroot-309) Jun 13 15:33:17 jon kernel: MPTable: Jun 13 15:33:17 jon kernel: ioapic0: Assuming intbase of 0 Jun 13 15:33:17 jon kernel: ioapic0 irqs 0-23 on motherboard Jun 13 15:33:17 jon kernel: kbd1 at kbdmux0 Jun 13 15:33:17 jon kernel: ACPI Error: A valid RSDP was not found (20101013/tbxfroot-309) Jun 13 15:33:17 jon kernel: ACPI: Table initialisation failed: AE_NOT_FOUND Jun 13 15:33:17 jon kernel: ACPI: Try disabling either ACPI or apic support. Jun 13 15:33:17 jon kernel: pcib0: pcibus 0 on motherboard Jun 13 15:33:17 jon kernel: pci0: on pcib0 Jun 13 15:33:17 jon kernel: agp0: on hostb0 Jun 13 15:33:17 jon kernel: agp0: aperture size is 4M Jun 13 15:33:17 jon kernel: pcib1: at device 1.0 on pci0 Jun 13 15:33:17 jon kernel: pci1: on pcib1 Jun 13 15:33:17 jon kernel: isab0: at device 7.0 on pci0 Jun 13 15:33:17 jon kernel: isa0: on isab0 Jun 13 15:33:17 jon kernel: atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xd000-0xd00f at device 7.1 on pci0 Jun 13 15:33:17 jon kernel: atapci0: Correcting VIA config for southbridge data corruption bug Jun 13 15:33:17 jon kernel: ata0: on atapci0 Jun 13 15:33:17 jon kernel: ata0: [ITHREAD] Jun 13 15:33:17 jon kernel: ata1: on atapci0 Jun 13 15:33:17 jon kernel: ata1: [ITHREAD] Jun 13 15:33:17 jon kernel: atapci1: port 0xdc00-0xdc7f,0xe000-0xe0ff mem 0xd3461000-0xd3461fff,0xd3420000-0xd343ffff irq 11 at device 8.0 on pci0 Jun 13 15:33:17 jon kernel: atapci1: [ITHREAD] Jun 13 15:33:17 jon kernel: atapci1: [ITHREAD] Jun 13 15:33:17 jon kernel: ata2: on atapci1 Jun 13 15:33:17 jon kernel: ata2: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata2: [ITHREAD] Jun 13 15:33:17 jon kernel: ata3: on atapci1 Jun 13 15:33:17 jon kernel: ata3: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata3: [ITHREAD] Jun 13 15:33:17 jon kernel: ata4: on atapci1 Jun 13 15:33:17 jon kernel: ata4: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata4: [ITHREAD] Jun 13 15:33:17 jon kernel: ata5: on atapci1 Jun 13 15:33:17 jon kernel: ata5: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata5: [ITHREAD] Jun 13 15:33:17 jon kernel: fxp0: port 0xe400-0xe43f mem 0xd3460000-0xd3460fff,0xd3440000-0xd345ffff irq 10 at device 9.0 on pci0 Jun 13 15:33:17 jon kernel: miibus0: on fxp0 Jun 13 15:33:17 jon kernel: inphy0: PHY 1 on miibus0 Jun 13 15:33:17 jon kernel: inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow Jun 13 15:33:17 jon kernel: fxp0: Ethernet address: 00:02:b3:ad:ad:d7 Jun 13 15:33:17 jon kernel: fxp0: [ITHREAD] Jun 13 15:33:17 jon kernel: atapci2: port 0xe800-0xe87f,0xec00-0xecff mem 0xd3462000-0xd3462fff,0xd3400000-0xd341ffff irq 11 at device 12.0 on pci0 Jun 13 15:33:17 jon kernel: atapci2: [ITHREAD] Jun 13 15:33:17 jon kernel: atapci2: [ITHREAD] Jun 13 15:33:17 jon kernel: ata6: on atapci2 Jun 13 15:33:17 jon kernel: ata6: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata6: [ITHREAD] Jun 13 15:33:17 jon kernel: ata7: on atapci2 Jun 13 15:33:17 jon kernel: ata7: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata7: [ITHREAD] Jun 13 15:33:17 jon kernel: ata8: on atapci2 Jun 13 15:33:17 jon kernel: ata8: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata8: [ITHREAD] Jun 13 15:33:17 jon kernel: ata9: on atapci2 Jun 13 15:33:17 jon kernel: ata9: SIGNATURE: 00000101 Jun 13 15:33:17 jon kernel: ata9: [ITHREAD] Jun 13 15:33:17 jon kernel: vgapci0: mem 0xd1000000-0xd1003fff,0xd2000000-0xd27fffff at device 13.0 on pci0 Jun 13 15:33:17 jon kernel: cpu0 on motherboard Jun 13 15:33:17 jon kernel: pmtimer0 on isa0 Jun 13 15:33:17 jon kernel: atrtc0: at port 0x70-0x71 irq 8 pnpid PNP0b00 on isa0 Jun 13 15:33:17 jon kernel: atkbdc0: at port 0x60,0x64 irq 1 pnpid PNP0303 on isa0 Jun 13 15:33:17 jon kernel: atkbd0: irq 1 on atkbdc0 Jun 13 15:33:17 jon kernel: kbd0 at atkbd0 Jun 13 15:33:17 jon kernel: atkbd0: [GIANT-LOCKED] Jun 13 15:33:17 jon kernel: atkbd0: [ITHREAD] Jun 13 15:33:17 jon kernel: unknown: can't assign resources (memory) Jun 13 15:33:17 jon kernel: psm0: irq 12 on atkbdc0 Jun 13 15:33:17 jon kernel: psm0: [GIANT-LOCKED] Jun 13 15:33:17 jon kernel: psm0: [ITHREAD] Jun 13 15:33:17 jon kernel: psm0: model Generic PS/2 mouse, device ID 0 Jun 13 15:33:17 jon kernel: fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 pnpid PNP0700 on isa0 Jun 13 15:33:17 jon kernel: fdc0: [FILTER] Jun 13 15:33:17 jon kernel: fd0: <1440-KB 3.5" drive> on fdc0 drive 0 Jun 13 15:33:17 jon kernel: orm0: at iomem 0xc0000-0xc7fff pnpid ORM0000 on isa0 Jun 13 15:33:17 jon kernel: sc0: at flags 0x100 on isa0 Jun 13 15:33:17 jon kernel: sc0: VGA <16 virtual consoles, flags=0x300> Jun 13 15:33:17 jon kernel: vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 Jun 13 15:33:17 jon kernel: ppc0: parallel port not found. Jun 13 15:33:17 jon kernel: unknown: can't assign resources (memory) Jun 13 15:33:17 jon kernel: Timecounter "TSC" frequency 1992030730 Hz quality 800 Jun 13 15:33:17 jon kernel: Timecounters tick every 1.000 msec Jun 13 15:33:17 jon kernel: ad0: 14664MB at ata0-master UDMA66 Jun 13 15:33:17 jon kernel: ad2: 305245MB at ata1-master UDMA100 Jun 13 15:33:17 jon kernel: ad3: 238475MB at ata1-slave UDMA100 Jun 13 15:33:17 jon kernel: ad4: 476940MB at ata2-master UDMA100 SATA 3Gb/s Jun 13 15:33:17 jon kernel: ad6: 1907729MB at ata3-master UDMA100 SATA 3Gb/s Jun 13 15:33:17 jon kernel: ad8: 476940MB at ata4-master UDMA100 SATA 3Gb/s Jun 13 15:33:17 jon kernel: ad10: 305245MB at ata5-master UDMA100 SATA 1.5Gb/s Jun 13 15:33:17 jon kernel: ad12: 1430799MB at ata6-master UDMA100 SATA 3Gb/s Jun 13 15:33:17 jon kernel: ad14: 476940MB at ata7-master UDMA100 SATA 1.5Gb/s Jun 13 15:33:17 jon kernel: ad16: 1907729MB at ata8-master UDMA100 SATA 3Gb/s Jun 13 15:33:17 jon kernel: ad18: 953869MB at ata9-master UDMA100 SATA 3Gb/s Jun 13 15:33:17 jon kernel: GEOM: ad2s1: geometry does not match label (16h,63s != 1h,63s). Jun 13 15:33:17 jon kernel: GEOM: ad3s1: geometry does not match label (16h,63s != 1h,63s). Jun 13 15:33:17 jon kernel: Trying to mount root from ufs:/dev/ad0s1a Jun 13 15:33:17 jon kernel: WARNING: / was not properly dismounted Jun 13 15:33:17 jon kernel: WARNING: /usr was not properly dismounted Jun 13 15:33:17 jon kernel: WARNING: /var was not properly dismounted Jun 13 15:33:17 jon kernel: /var: mount pending error: blocks 8 files 2 Jun 13 15:33:18 jon kernel: fxp0: link state changed to UP Jun 13 15:33:19 jon kernel: fxp0: link state changed to DOWN Jun 13 15:33:21 jon kernel: fxp0: link state changed to UP Jun 13 15:33:22 jon dhclient: New IP Address (fxp0): 192.168.100.9 Jun 13 15:33:22 jon kernel: fxp0: link state changed to DOWN Jun 13 15:33:22 jon dhclient: New Subnet Mask (fxp0): 255.255.255.0 Jun 13 15:33:22 jon dhclient: New Broadcast Address (fxp0): 192.168.100.255 Jun 13 15:33:22 jon dhclient: New Routers (fxp0): 192.168.100.250 Jun 13 15:33:24 jon kernel: fxp0: link state changed to UP Jun 13 15:38:07 jon fsck: /dev/ad0s1e: 285849 files, 1607328 used, 2887803 free (117739 frags, 346258 blocks, 2.6% fragmentation) [after that only messages from fsck]