Date: Fri, 18 Jun 2010 01:11:17 -0400
From: Jerry Bell <jerry@nrdx.com>
To: freebsd-questions@freebsd.org
Subject: Need help with SATA disk timing out in 8.1 Beta
Message-ID: <4C1AFFF5.1070000@nrdx.com>
I am having all sorts of problems with drives in a new server. I have a 450G SATA drive that holds my root partition; it works great, no issues. I have a second, 1TB drive that has been all sorts of trouble. When writing to this disk, I occasionally see errors like this:

Jun 17 07:40:36 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1564898207
Jun 17 07:40:36 www3 kernel: ad8: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1564898207
Jun 17 07:57:12 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1565052351
Jun 17 07:57:12 www3 kernel: ad8: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1565052351
Jun 17 09:45:12 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1565983775
Jun 17 09:45:12 www3 kernel: ad8: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1565983775
Jun 17 09:50:24 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1566082719
Jun 17 09:50:24 www3 kernel: ad8: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1566082719
Jun 17 10:01:25 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1566358623
Jun 17 10:01:25 www3 kernel: ad8: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1566358623
Jun 17 10:02:59 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1566387807
Jun 17 10:02:59 www3 kernel: ad8: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1566387807
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=43231
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=57567
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=773471
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=786271
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=810079
Jun 17 10:19:00 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=76767
Jun 17 10:19:00 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=784479

Last week, I asked the datacenter to provide me with a new 1TB drive, and they did. It formatted fine, no errors. I copied files to it, ran bonnie, etc., and saw no signs of any DMA issues. Until this morning, when I started having the errors again. If I run a tool like bonnie, I can very easily reproduce the errors.

After some research, I found that these errors are often indicative of SATA cable problems.

The datacenter replaced the cable, and the problem continues.

The datacenter moved the SATA cable to a new SATA port, and the problem continues.

The datacenter added a BRAND NEW 1TB drive (now the system has 3 drives), and I am unable to format the drive because of these errors:

ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=168172351
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=602334847
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=602334847
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=427014463
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=427014463
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=15425407
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=471408895
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=471408895
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=91422655
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=203161183
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1211817727
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1211817727
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=37998847
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=309632575
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=309632575
ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=24831007
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=59067391
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=497744575
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=497744575
ad10: FAILURE - WRITE_MUL status=51<READY,DSC,ERROR> error=84<ICRC,ABORTED> LBA=1128895
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=13920511
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=547029919
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=547029919

So, the problem has occurred on 3 different drives. Changing SATA ports and cables does not appear to affect the problem. The primary 450GB drive does not have any problems.

I have used atacontrol to lower the speed all the way down to UDMA 33, with the same result.

I am at the end of my ability to troubleshoot this. Could this be a problem with FreeBSD 8.1 beta and not the drives after all? I have seen a reference to a patch for previous versions that increases the DMA timeout to 10 or 15 seconds, which fixes problems, but I am not certain that would fix my particular issue.

Here is the dmesg output:

Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.1-PRERELEASE #1: Thu Jun 10 23:52:29 UTC 2010
    jerry@www3.stelesys.com:/usr/obj/usr/src/sys/JERRY amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU X3450 @ 2.67GHz (2674.98-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x106e5 Family = 6 Model = 1e Stepping = 5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x98e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory = 6442450944 (6144 MB)
avail memory = 6138769408 (5854 MB)
ACPI APIC Table: <020910 APIC2308>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 SMT threads
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 2
 cpu3 (AP): APIC ID: 3
 cpu4 (AP): APIC ID: 4
 cpu5 (AP): APIC ID: 5
 cpu6 (AP): APIC ID: 6
 cpu7 (AP): APIC ID: 7
ioapic0: Changing APIC ID to 8
ioapic0 <Version 2.0> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <020910 RSDT2308> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, bdf00000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
ACPI Warning: Incorrect checksum in table [OEMB] - 0x86, should be 0x85 (20100331/tbutils-354)
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
cpu4: <ACPI CPU> on acpi0
cpu5: <ACPI CPU> on acpi0
cpu6: <ACPI CPU> on acpi0
cpu7: <ACPI CPU> on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 3.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> mem 0xfa000000-0xfaffffff,0xd0000000-0xdfffffff,0xf9000000-0xf9ffffff irq 16 at device 0.0 on pci1
pci0: <base peripheral> at device 8.0 (no driver attached)
pci0: <base peripheral> at device 8.1 (no driver attached)
pci0: <base peripheral> at device 8.2 (no driver attached)
pci0: <base peripheral> at device 8.3 (no driver attached)
pci0: <base peripheral> at device 16.0 (no driver attached)
pci0: <base peripheral> at device 16.1 (no driver attached)
pci0: <simple comms> at device 22.0 (no driver attached)
ehci0: <Intel PCH USB 2.0 controller USB-B> mem 0xf8ffe000-0xf8ffe3ff irq 16 at device 26.0 on pci0
ehci0: [ITHREAD]
usbus0: EHCI version 1.0
usbus0: <Intel PCH USB 2.0 controller USB-B> on ehci0
pci0: <multimedia, HDA> at device 27.0 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> irq 17 at device 28.0 on pci0
pci6: <ACPI PCI bus> on pcib2
atapci0: <JMicron JMB368 UDMA133 controller> port 0xec00-0xec07,0xe880-0xe883,0xe800-0xe807,0xe480-0xe483,0xe400-0xe40f irq 16 at device 0.0 on pci6
atapci0: [ITHREAD]
ata2: <ATA channel 0> on atapci0
ata2: [ITHREAD]
pcib3: <ACPI PCI-PCI bridge> irq 18 at device 28.2 on pci0
pci5: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> irq 19 at device 28.3 on pci0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> irq 17 at device 28.4 on pci0
pci3: <ACPI PCI bus> on pcib5
re0: <RealTek 8168/8111 B/C/CP/D/DP/E PCIe Gigabit Ethernet> port 0xd800-0xd8ff mem 0xf7fff000-0xf7ffffff,0xf7ff8000-0xf7ffbfff irq 16 at device 0.0 on pci3
re0: Using 1 MSI messages
re0: Chip rev. 0x28000000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211B media interface> PHY 1 on miibus0
rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
re0: Ethernet address: e0:cb:4e:ed:05:a0
re0: [FILTER]
pcib6: <ACPI PCI-PCI bridge> irq 16 at device 28.5 on pci0
pci2: <ACPI PCI bus> on pcib6
ehci1: <Intel PCH USB 2.0 controller USB-A> mem 0xf8ffd000-0xf8ffd3ff irq 23 at device 29.0 on pci0
ehci1: [ITHREAD]
usbus1: EHCI version 1.0
usbus1: <Intel PCH USB 2.0 controller USB-A> on ehci1
pcib7: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci7: <ACPI PCI bus> on pcib7
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <Intel PCH SATA300 controller> port 0xbc00-0xbc07,0xb880-0xb883,0xb800-0xb807,0xb480-0xb483,0xb400-0xb40f,0xb080-0xb08f irq 21 at device 31.2 on pci0
atapci1: [ITHREAD]
ata3: <ATA channel 0> on atapci1
ata3: [ITHREAD]
ata4: <ATA channel 1> on atapci1
ata4: [ITHREAD]
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
atapci2: <Intel PCH SATA300 controller> port 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc807,0xc480-0xc483,0xc400-0xc40f,0xc080-0xc08f irq 21 at device 31.5 on pci0
atapci2: [ITHREAD]
ata5: <ATA channel 0> on atapci2
ata5: [ITHREAD]
ata6: <ATA channel 1> on atapci2
ata6: [ITHREAD]
acpi_button0: <Power Button> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
orm0: <ISA Option ROM> at iomem 0xce800-0xcf7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
ppc0: cannot reserve I/O port range
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
est2: <Enhanced SpeedStep Frequency Control> on cpu2
p4tcc2: <CPU Frequency Thermal Control> on cpu2
est3: <Enhanced SpeedStep Frequency Control> on cpu3
p4tcc3: <CPU Frequency Thermal Control> on cpu3
est4: <Enhanced SpeedStep Frequency Control> on cpu4
p4tcc4: <CPU Frequency Thermal Control> on cpu4
est5: <Enhanced SpeedStep Frequency Control> on cpu5
p4tcc5: <CPU Frequency Thermal Control> on cpu5
est6: <Enhanced SpeedStep Frequency Control> on cpu6
p4tcc6: <CPU Frequency Thermal Control> on cpu6
est7: <Enhanced SpeedStep Frequency Control> on cpu7
p4tcc7: <CPU Frequency Thermal Control> on cpu7
Timecounters tick every 1.000 msec
IP Filter: v4.1.28 initialized. Default = pass all, Logging = enabled
usbus0: 480Mbps High Speed USB v2.0
usbus1: 480Mbps High Speed USB v2.0
ad7: 476940MB <Seagate ST3500418AS CC38> at ata3-slave UDMA100 SATA 3Gb/s
ugen0.1: <Intel> at usbus0
uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0
ugen1.1: <Intel> at usbus1
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
GEOM: ad7s1: geometry does not match label (255h,63s != 16h,63s).
ad9: 953869MB <WDC WD10EALS-00Z8A0 05.01D05> at ata4-slave UDMA100 SATA 3Gb/s
ad10: 953869MB <WDC WD10EALS-00Z8A0 05.01D05> at ata5-master UDMA100 SATA 3Gb/s
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #7 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #4 Launched!
Root mount waiting for: usbus1 usbus0
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
Root mount waiting for: usbus1 usbus0
ugen0.2: <vendor 0x8087> at usbus0
uhub2: <vendor 0x8087 product 0x0020, class 9/0, rev 2.00/0.00, addr 2> on usbus0
ugen1.2: <vendor 0x8087> at usbus1
uhub3: <vendor 0x8087 product 0x0020, class 9/0, rev 2.00/0.00, addr 2> on usbus1
uhub2: 6 ports with 6 removable, self powered
Root mount waiting for: usbus1
uhub3: 8 ports with 8 removable, self powered
Trying to mount root from ufs:/dev/ad7s1a
re0: link state changed to UP
ugen1.3: <Peppercon AG> at usbus1
ukbd0: <Peppercon AG Multidevice, class 0/0, rev 2.00/0.01, addr 3> on usbus1
kbd2 at ukbd0
ums0: <Peppercon AG Multidevice, class 0/0, rev 2.00/0.01, addr 3> on usbus1
ums0: 3 buttons and [Z] coordinates ID=0
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=168172351
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=602334847
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=602334847
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=427014463
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=427014463
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=15425407
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=471408895
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=471408895
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=91422655
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=203161183
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=1211817727
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1211817727
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=37998847
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=309632575
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=309632575
ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=24831007
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=59067391
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=497744575
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=497744575
ad10: FAILURE - WRITE_MUL status=51<READY,DSC,ERROR> error=84<ICRC,ABORTED> LBA=1128895
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=13920511
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=547029919
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=547029919

Please help.

Thank you,
Jerry
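
For anyone comparing the ad8 and ad10 logs above, here is a rough Python sketch that tallies the ATA messages by device, severity, and command. It is not part of the original report. The regular expression, the `tally` helper, and the `sample` text are illustrative assumptions based only on the message shapes quoted in this post; other ata(4) message formats may not match.

```python
import re
from collections import Counter

# Assumed message shape, taken from the lines quoted in this post, e.g.
# "ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=602334847"
ATA_MSG = re.compile(
    r"(?P<dev>ad\d+): (?P<level>WARNING|FAILURE|TIMEOUT) - "
    r"(?P<cmd>\w+)\b.*?LBA=(?P<lba>\d+)"
)

def tally(log_text):
    """Count (device, level, command) triples found in log_text."""
    counts = Counter()
    for m in ATA_MSG.finditer(log_text):
        counts[(m["dev"], m["level"], m["cmd"])] += 1
    return counts

# A few lines copied from the ad10 output above.
sample = """\
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=168172351
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=602334847
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=602334847
ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=24831007
"""

for key, n in sorted(tally(sample).items()):
    print(*key, n)
```

Feeding it the full /var/log/messages text instead of `sample` would show at a glance whether the FAILURE entries are confined to one command type or one device.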