From: Jerry Bell
Date: Fri, 18 Jun 2010 01:11:17 -0400
To: freebsd-questions@freebsd.org
Subject: Need help with SATA disk timing out in 8.1 Beta
Message-ID: <4C1AFFF5.1070000@nrdx.com>

I am having all sorts of problems with drives in a new server. I have a 450GB SATA drive that holds my root partition; it works great, with no issues. I have a second, 1TB drive that has been all sorts of trouble.
When writing to this disk, I occasionally see errors like this:

Jun 17 07:40:36 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1564898207
Jun 17 07:40:36 www3 kernel: ad8: FAILURE - WRITE_DMA48 status=51 error=10 LBA=1564898207
Jun 17 07:57:12 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1565052351
Jun 17 07:57:12 www3 kernel: ad8: FAILURE - WRITE_DMA48 status=51 error=10 LBA=1565052351
Jun 17 09:45:12 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1565983775
Jun 17 09:45:12 www3 kernel: ad8: FAILURE - WRITE_DMA48 status=51 error=10 LBA=1565983775
Jun 17 09:50:24 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1566082719
Jun 17 09:50:24 www3 kernel: ad8: FAILURE - WRITE_DMA48 status=51 error=10 LBA=1566082719
Jun 17 10:01:25 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1566358623
Jun 17 10:01:25 www3 kernel: ad8: FAILURE - WRITE_DMA48 status=51 error=10 LBA=1566358623
Jun 17 10:02:59 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1566387807
Jun 17 10:02:59 www3 kernel: ad8: FAILURE - WRITE_DMA48 status=51 error=10 LBA=1566387807
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=43231
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=57567
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=773471
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=786271
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=810079
Jun 17 10:19:00 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=76767
Jun 17 10:19:00 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=784479

Last week, I asked the datacenter to provide me with a new 1TB
drive, and they did. It formatted fine, with no errors. I copied files to it, ran bonnie, etc., and saw no signs of any DMA issues, until this morning, when I started having the errors again. If I run a tool like bonnie, I can very easily reproduce the errors.

After some research, I found that these errors are often indicative of SATA cable problems. The datacenter replaced the cable, and the problem continues. The datacenter moved the SATA cable to a new SATA port, and the problem continues. The datacenter then added a BRAND NEW 1TB drive (the system now has 3 drives), and I am unable to format that drive because of these errors:

ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=168172351
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=602334847
ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=602334847
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=427014463
ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=427014463
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=15425407
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=471408895
ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=471408895
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=91422655
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=203161183
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1211817727
ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=1211817727
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=37998847
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=309632575
ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=309632575
ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=24831007
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=59067391
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=497744575
ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=497744575
ad10: FAILURE - WRITE_MUL status=51 error=84 LBA=1128895
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=13920511
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=547029919
ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=547029919

So, the problem has occurred on 3 different drives. Changing SATA ports and cables does not appear to affect the problem. The primary 450GB drive does not have any problems. I have used atacontrol to lower the speed all the way down to UDMA33, with the same result.

I am at the end of my ability to troubleshoot this. Could this be a problem with FreeBSD 8.1 beta and not the drives after all? I have seen a reference to a patch for previous versions that increases the DMA timeout to 10 or 15 seconds, which fixes problems, but I am not certain that it would fix my particular issue.

Here is the dmesg output:

Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.1-PRERELEASE #1: Thu Jun 10 23:52:29 UTC 2010
jerry@www3.stelesys.com:/usr/obj/usr/src/sys/JERRY amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU X3450 @ 2.67GHz (2674.98-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0x106e5 Family = 6 Model = 1e Stepping = 5
Features=0xbfebfbff
Features2=0x98e3fd
AMD Features=0x28100800
AMD Features2=0x1
TSC: P-state invariant
real memory = 6442450944 (6144 MB)
avail memory = 6138769408 (5854 MB)
ACPI APIC Table: <020910 APIC2308>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 SMT threads
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
cpu2 (AP): APIC ID: 2
cpu3 (AP): APIC ID: 3
cpu4 (AP): APIC ID: 4
cpu5 (AP): APIC ID: 5
cpu6 (AP): APIC ID: 6
cpu7 (AP): APIC ID: 7
ioapic0: Changing APIC ID to 8
ioapic0 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <020910 RSDT2308> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, bdf00000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: on acpi0
ACPI Warning: Incorrect checksum in table [OEMB] - 0x86, should be 0x85 (20100331/tbutils-354)
cpu1: on acpi0
cpu2: on acpi0
cpu3: on acpi0
cpu4: on acpi0
cpu5: on acpi0
cpu6: on acpi0
cpu7: on acpi0
acpi_hpet0: iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: irq 16 at device 3.0 on pci0
pci1: on pcib1
vgapci0: mem 0xfa000000-0xfaffffff,0xd0000000-0xdfffffff,0xf9000000-0xf9ffffff irq 16 at device 0.0 on pci1
pci0: at device 8.0 (no driver attached)
pci0: at device 8.1 (no driver attached)
pci0: at device 8.2 (no driver attached)
pci0: at device 8.3 (no driver attached)
pci0: at device 16.0 (no driver attached)
pci0: at device 16.1 (no driver attached)
pci0: at device 22.0 (no driver attached)
ehci0: mem 0xf8ffe000-0xf8ffe3ff irq 16 at device 26.0 on pci0
ehci0: [ITHREAD]
usbus0: EHCI version 1.0
usbus0: on ehci0
pci0: at device 27.0 (no driver attached)
pcib2: irq 17 at device 28.0 on pci0
pci6: on pcib2
atapci0: port 0xec00-0xec07,0xe880-0xe883,0xe800-0xe807,0xe480-0xe483,0xe400-0xe40f irq 16 at device 0.0 on pci6
atapci0: [ITHREAD]
ata2: on atapci0
ata2: [ITHREAD]
pcib3: irq 18 at device 28.2 on pci0
pci5: on pcib3
pcib4: irq 19 at device 28.3 on pci0
pci4: on pcib4
pcib5: irq 17 at device 28.4 on pci0
pci3: on pcib5
re0: port 0xd800-0xd8ff mem 0xf7fff000-0xf7ffffff,0xf7ff8000-0xf7ffbfff irq 16 at device 0.0 on pci3
re0: Using 1 MSI messages
re0: Chip rev. 0x28000000
re0: MAC rev. 0x00000000
miibus0: on re0
rgephy0: PHY 1 on miibus0
rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
re0: Ethernet address: e0:cb:4e:ed:05:a0
re0: [FILTER]
pcib6: irq 16 at device 28.5 on pci0
pci2: on pcib6
ehci1: mem 0xf8ffd000-0xf8ffd3ff irq 23 at device 29.0 on pci0
ehci1: [ITHREAD]
usbus1: EHCI version 1.0
usbus1: on ehci1
pcib7: at device 30.0 on pci0
pci7: on pcib7
isab0: at device 31.0 on pci0
isa0: on isab0
atapci1: port 0xbc00-0xbc07,0xb880-0xb883,0xb800-0xb807,0xb480-0xb483,0xb400-0xb40f,0xb080-0xb08f irq 21 at device 31.2 on pci0
atapci1: [ITHREAD]
ata3: on atapci1
ata3: [ITHREAD]
ata4: on atapci1
ata4: [ITHREAD]
pci0: at device 31.3 (no driver attached)
atapci2: port 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc807,0xc480-0xc483,0xc400-0xc40f,0xc080-0xc08f irq 21 at device 31.5 on pci0
atapci2: [ITHREAD]
ata5: on atapci2
ata5: [ITHREAD]
ata6: on atapci2
ata6: [ITHREAD]
acpi_button0: on acpi0
atrtc0: port 0x70-0x71 irq 8 on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
orm0: at iomem 0xce800-0xcf7ff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
atkbdc0: at port 0x60,0x64 on isa0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
ppc0: cannot reserve I/O port range
est0: on cpu0
p4tcc0: on cpu0
est1: on cpu1
p4tcc1: on cpu1
est2: on cpu2
p4tcc2: on cpu2
est3: on cpu3
p4tcc3: on cpu3
est4: on cpu4
p4tcc4: on cpu4
est5: on cpu5
p4tcc5: on cpu5
est6: on cpu6
p4tcc6: on cpu6
est7: on cpu7
p4tcc7: on cpu7
Timecounters tick every 1.000 msec
IP Filter: v4.1.28 initialized. Default = pass all, Logging = enabled
usbus0: 480Mbps High Speed USB v2.0
usbus1: 480Mbps High Speed USB v2.0
ad7: 476940MB at ata3-slave UDMA100 SATA 3Gb/s
ugen0.1: at usbus0
uhub0: on usbus0
ugen1.1: at usbus1
uhub1: on usbus1
GEOM: ad7s1: geometry does not match label (255h,63s != 16h,63s).
ad9: 953869MB at ata4-slave UDMA100 SATA 3Gb/s
ad10: 953869MB at ata5-master UDMA100 SATA 3Gb/s
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #7 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #4 Launched!
Root mount waiting for: usbus1 usbus0
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
Root mount waiting for: usbus1 usbus0
ugen0.2: at usbus0
uhub2: on usbus0
ugen1.2: at usbus1
uhub3: on usbus1
uhub2: 6 ports with 6 removable, self powered
Root mount waiting for: usbus1
uhub3: 8 ports with 8 removable, self powered
Trying to mount root from ufs:/dev/ad7s1a
re0: link state changed to UP
ugen1.3: at usbus1
ukbd0: on usbus1
kbd2 at ukbd0
ums0: on usbus1
ums0: 3 buttons and [Z] coordinates ID=0
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=168172351
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=602334847
ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=602334847
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=427014463
ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=427014463
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=15425407
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=471408895
ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=471408895
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=91422655
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=203161183
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1211817727
ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=1211817727
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=37998847
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=309632575
ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=309632575
ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=24831007
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=59067391
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=497744575
ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=497744575
ad10: FAILURE - WRITE_MUL status=51 error=84 LBA=1128895
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=13920511
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=547029919
ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=547029919

Please help.

Thank you,
Jerry
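
For anyone reading the kernel messages above: the status= and error= values look like hexadecimal dumps of the ATA status and error registers. Below is a minimal Python sketch that decodes those values and tallies messages per device and command. It assumes the values are hex and that the bits follow the classic ATA register layout. The bit names, the regular expression, and the helper names decode and summarize are illustrative assumptions, not anything taken from the FreeBSD ata driver.

```python
import re
from collections import Counter

# Classic ATA status register bits (assumed layout; some bits are obsolete
# in newer ATA revisions).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}

# Classic ATA error register bits (assumed layout).
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF"}

# Matches lines such as
#   ad10: FAILURE - WRITE_DMA48 status=51 error=10 LBA=602334847
# with or without a leading syslog timestamp.
LINE_RE = re.compile(
    r"(?P<dev>ad\d+): (?P<level>WARNING|FAILURE|TIMEOUT) - (?P<cmd>\w+)"
    r"(?: status=(?P<status>[0-9a-fA-F]+) error=(?P<error>[0-9a-fA-F]+))?"
    r".*?LBA=(?P<lba>\d+)"
)


def decode(value_hex, table):
    """Return the names of the bits set in a hex register value, high bit first."""
    value = int(value_hex, 16)
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


def summarize(lines):
    """Count messages per (device, level, command)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match["dev"], match["level"], match["cmd"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jun 17 07:40:36 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1564898207",
        "Jun 17 07:40:36 www3 kernel: ad8: FAILURE - WRITE_DMA48 status=51 error=10 LBA=1564898207",
        "ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=24831007",
        "ad10: FAILURE - WRITE_MUL status=51 error=84 LBA=1128895",
    ]
    for key, count in sorted(summarize(sample).items()):
        print(key, count)
    print(decode("51", STATUS_BITS))  # ['DRDY', 'DSC', 'ERR']
    print(decode("10", ERROR_BITS))   # ['IDNF']
    print(decode("84", ERROR_BITS))   # ['ICRC', 'ABRT']
```

Under those assumptions, status=51 would mean the drive is ready and reporting an error, error=10 would be an "ID not found" condition, and error=84 would be an interface CRC error together with an aborted command. Feeding /var/log/messages through summarize gives a quick per-drive count to compare before and after a cable or port change.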