Date: Thu, 17 Feb 2005 09:56:50 -0600 (CST)
From: Karl Denninger <karl@FS.denninger.net>
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: i386/77643: SATA PCI controllers fail with WRITE_DMA errors under GMIRROR
Message-ID: <200502171556.j1HFuolx028986@FS.denninger.net>
Resent-Message-ID: <200502171600.j1HG0lrU060335@freefall.freebsd.org>

>Number:         77643
>Category:       i386
>Synopsis:       SATA PCI controllers fail with WRITE_DMA errors under GMIRROR
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-i386
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Feb 17 16:00:47 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Karl Denninger
>Release:        FreeBSD 5.3-STABLE i386
>Organization:
Karls Sushi and Packet Smashers
>Environment:
System: FreeBSD FS.denninger.net 5.3-STABLE FreeBSD 5.3-STABLE #1: Wed Feb 2 22:57:48 CST 2005 karl@FS.denninger.net:/usr/obj/usr/src/sys/KSD-SMP i386

Sources from 1/31/2005; CVS commit logs checked for potentially relevant
changes, none found.

>Description:

SATA controllers on a PCI bus fail randomly under GMIRROR when both
channels are in use under actual mixed read/write I/O loads.

The problem does NOT occur under saturation read or saturation write test
loads, where no mix of accesses is done. For example, a "dd" will not
provoke the problem, and a rebuild of a RAID 1 GEOM mirror does not provoke
the problem. Once the rebuild is complete and all disks are part of the
mirror, however, it will fail within a couple of minutes to a couple of
hours under production loads.

Occurs with both BusTek and Adaptec PCI SATA cards.
Occurs with both Maxtor DiamondMax10 and Hitachi Deskstar drives.

The same drive swapped onto the motherboard controller DOES NOT trigger the
problem, irrespective of load.

The motherboard SATA adapter DOES NOT exhibit the problem, irrespective of
load or whether both channels are in use. However, the motherboard
controller is a different brand/make/model.

Specifically:

atapci0: <SiI 3112 SATA150 controller> port 0xcef0-0xceff,0xcedc-0xcedf,0xcee8-0xceef,0xced8-0xcedb,0xcee0-0xcee7 mem 0xfe7dfe00-0xfe7dffff irq 21 at device 0.0 on pci2
atapci2: <Intel ICH5 SATA150 controller> port 0xfea0-0xfeaf,0xfe30-0xfe33,0xfe20-0xfe27,0xfe10-0xfe13,0xfe00-0xfe07 irq 18 at device 31.2 on pci0

atapci2 is the on-motherboard controller; atapci0 is the PCI bus controller.

Both the BusTek and Adaptec controllers that were tested exhibit the
problem, and both are SiI chipset-based.

The system has been updated to and is running the latest (A08) BIOS revision
available.

Drives and controllers both certify clean using manufacturer utilities, and
the disks, when run on the motherboard controller, do not exhibit the problem.

Error and dmesg output follow:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.3-STABLE #1: Wed Feb 2 22:57:48 CST 2005
    karl@FS.denninger.net:/usr/obj/usr/src/sys/KSD-SMP
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz (2394.01-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory = 267862016 (255 MB)
avail memory = 252456960 (240 MB)
ACPI APIC Table: <DELL PE400SC>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0: Changing APIC ID to 2
ioapic0 <Version 2.0> irqs 0-23 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <DELL PE400SC> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <Intel 82875P host to AGP bridge> mem 0xe8000000-0xefffffff at device 0.0 on pci0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
uhci0: <Intel 82801EB (ICH5) USB controller USB-A> port 0xff80-0xff9f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <Intel 82801EB (ICH5) USB controller USB-A> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <Intel 82801EB (ICH5) USB controller USB-B> port 0xff60-0xff7f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <Intel 82801EB (ICH5) USB controller USB-B> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <Intel 82801EB (ICH5) USB controller USB-C> port 0xff40-0xff5f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <Intel 82801EB (ICH5) USB controller USB-C> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: <Intel 82801EB (ICH5) USB controller USB-D> port 0xff20-0xff3f irq 16 at device 29.3 on pci0
uhci3: [GIANT-LOCKED]
usb3: <Intel 82801EB (ICH5) USB controller USB-D> on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
pci0: <serial bus, USB> at device 29.7 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci2: <ACPI PCI bus> on pcib2
atapci0: <SiI 3112 SATA150 controller> port 0xcef0-0xceff,0xcedc-0xcedf,0xcee8-0xceef,0xced8-0xcedb,0xcee0-0xcee7 mem 0xfe7dfe00-0xfe7dffff irq 21 at device 0.0 on pci2
ata2: channel #0 on atapci0
ata3: channel #1 on atapci0
rp0: <RocketPort PCI> port 0xcf00-0xcf3f irq 17 at device 2.0 on pci2
RocketPort0 (Version 3.02) 4 ports.
pcib3: <PCI-PCI bridge> at device 3.0 on pci2
pci3: <PCI bus> on pcib3
fxp0: <Intel 82558 Pro/100 Ethernet> port 0xbf80-0xbf9f mem 0xfe400000-0xfe4fffff,0xf8001000-0xf8001fff irq 19 at device 4.0 on pci3
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:d0:b7:6f:ce:e8
fxp1: <Intel 82558 Pro/100 Ethernet> port 0xbfe0-0xbfff mem 0xfe500000-0xfe5fffff,0xf8000000-0xf8000fff irq 18 at device 5.0 on pci3
miibus1: <MII bus> on fxp1
inphy1: <i82555 10/100 media interface> on miibus1
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:d0:b7:6f:ce:e9
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0xcf40-0xcf7f mem 0xfe7e0000-0xfe7fffff irq 18 at device 12.0 on pci2
em0: Ethernet address: 00:0c:f1:c9:df:c5
em0: Speed:N/A Duplex:N/A
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <Intel ICH5 UDMA100 controller> port 0xffa0-0xffaf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 irq 18 at device 31.1 on pci0
ata0: channel #0 on atapci1
ata1: channel #1 on atapci1
atapci2: <Intel ICH5 SATA150 controller> port 0xfea0-0xfeaf,0xfe30-0xfe33,0xfe20-0xfe27,0xfe10-0xfe13,0xfe00-0xfe07 irq 18 at device 31.2 on pci0
ata4: channel #0 on atapci2
ata5: channel #1 on atapci2
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
pci0: <multimedia, audio> at device 31.5 (no driver attached)
fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
orm0: <ISA Option ROMs> at iomem 0xcc800-0xcffff,0xcb000-0xcc7ff,0xc0000-0xcafff on isa0
pmtimer0 on isa0
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
RTC BIOS diagnostic error 18<memory_size,fixed_disk>
Timecounters tick every 10.000 msec
ipfw2 initialized, divert enabled, rule-based forwarding disabled, default to deny, logging disabled
acd0: CDROM <Lite-On LTN486S 48x Max/YDS6> at ata1-master UDMA33
em0: Link is up 100 Mbps Full Duplex
ad8: 239372MB <Maxtor 6B250S0/BANC1980> [486344/16/63] at ata4-master SATA150
ad10: 238475MB <HDS722525VLSA80/V36OA63A> [484521/16/63] at ata5-master SATA150
GEOM_MIRROR: Device boot created (id=1131801609).
GEOM_MIRROR: Device boot: provider ad8s1 detected.
GEOM_MIRROR: Device boot: provider ad10s1 detected.
GEOM_MIRROR: Force device boot start due to timeout.
GEOM_MIRROR: Device boot: provider ad10s1 activated.
GEOM_MIRROR: Device boot: provider ad8s1 activated.
GEOM_MIRROR: Device boot: provider mirror/boot launched.
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/mirror/boota
em0: Link is up 100 Mbps Full Duplex
ad4: 238475MB <HDS722525VLSA80/V36OA63A> [484521/16/63] at ata2-master SATA150
GEOM_MIRROR: Component ad4s1 (device boot) broken, skipping.
GEOM_MIRROR: Cannot add disk ad4s1 to boot (error=22).
ad6: 239372MB <Maxtor 6B250S0/BANC1B70> [486344/16/63] at ata3-master SATA150
GEOM_MIRROR: Device boot: provider ad6s1 detected.
GEOM_MIRROR: Device boot: rebuilding provider ad6s1.
GEOM_MIRROR: Device boot: provider ad4s1 detected.
GEOM_MIRROR: Device boot: rebuilding provider ad4s1.
GEOM_MIRROR: Device boot: rebuilding provider ad6s1 finished.
GEOM_MIRROR: Device boot: provider ad6s1 activated.
GEOM_MIRROR: Device boot: rebuilding provider ad4s1 finished.
GEOM_MIRROR: Device boot: provider ad4s1 activated.
GEOM_MIRROR: Device boot: provider ad4s1 disconnected.
GEOM_MIRROR: Device boot: provider ad4s1 detected.
GEOM_MIRROR: Device boot: rebuilding provider ad4s1.
ad6: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=245216575
ad6: FAILURE - WRITE_DMA timed out
GEOM_MIRROR: Request failed (error=5). ad6s1[WRITE(offset=125550854144, length=16384)]
GEOM_MIRROR: Device boot: provider ad6s1 disconnected.
GEOM_MIRROR: Device boot: rebuilding provider ad4s1 finished.
GEOM_MIRROR: Device boot: provider ad4s1 activated.
GEOM_MIRROR: Device boot: provider ad6s1 detected.
GEOM_MIRROR: Device boot: rebuilding provider ad6s1.
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=27151007
ad4: FAILURE - WRITE_DMA timed out
GEOM_MIRROR: Request failed (error=5). ad4s1[WRITE(offset=13901283328, length=16384)]
GEOM_MIRROR: Device boot: provider ad4s1 disconnected.
GEOM_MIRROR: Device boot: provider ad4s1 detected.
GEOM_MIRROR: Device boot: rebuilding provider ad4s1.

>How-To-Repeat:

Build a GEOM mirrored system with a secondary controller. Insert two
additional disks into the RAID 1 array so that it has four members. Once the
rebuild completes on the two additional members on the secondary controller,
normal system load will cause one of the two disks to detach with the
above error.

>Fix:

Unknown.

>Release-Note:
>Audit-Trail:
>Unformatted:
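
As a cross-check on the console output above, each ATA timeout can be paired with the GEOM_MIRROR request that failed right after it. The following is only a minimal sketch, assuming 512-byte sectors (not stated in the report); the function name `correlate` and the regular expressions are illustrative and not part of any FreeBSD tool. It converts the GEOM byte offset to a sector number and subtracts it from the LBA reported by the ATA driver.

```python
import re

SECTOR_SIZE = 512  # assumption: 512-byte sectors, not stated in the report

# Failure lines copied verbatim from the console output in this report.
LOG = """\
ad6: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=245216575
ad6: FAILURE - WRITE_DMA timed out
GEOM_MIRROR: Request failed (error=5). ad6s1[WRITE(offset=125550854144, length=16384)]
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=27151007
ad4: FAILURE - WRITE_DMA timed out
GEOM_MIRROR: Request failed (error=5). ad4s1[WRITE(offset=13901283328, length=16384)]
"""

TIMEOUT_RE = re.compile(r"^(ad\d+): TIMEOUT - WRITE_DMA .* LBA=(\d+)$")
FAILED_RE = re.compile(
    r"^GEOM_MIRROR: Request failed \(error=\d+\)\. "
    r"(ad\d+)s\d+\[WRITE\(offset=(\d+), length=(\d+)\)\]$"
)


def correlate(log):
    """Pair each WRITE_DMA timeout with the next failed GEOM write on that disk."""
    pending = {}   # disk name -> LBA from the most recent timeout
    results = []
    for line in log.splitlines():
        m = TIMEOUT_RE.match(line)
        if m:
            pending[m.group(1)] = int(m.group(2))
            continue
        m = FAILED_RE.match(line)
        if m and m.group(1) in pending:
            disk, offset, length = m.group(1), int(m.group(2)), int(m.group(3))
            lba = pending.pop(disk)
            results.append({
                "disk": disk,
                "lba": lba,
                "sectors": length // SECTOR_SIZE,
                # Difference between the disk LBA and the slice-relative sector.
                "slice_start": lba - offset // SECTOR_SIZE,
            })
    return results


if __name__ == "__main__":
    for r in correlate(LOG):
        print(r)
```

Under the 512-byte assumption, both failed requests are 32 sectors long and both differ from the reported LBA by 63 sectors. That would be consistent with the s1 slices starting at sector 63, and with the reported LBA being the first sector of the failed write rather than some unrelated request.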