Date:      Thu, 17 Feb 2005 09:56:50 -0600 (CST)
From:      Karl Denninger <karl@FS.denninger.net>
To:        FreeBSD-gnats-submit@FreeBSD.org
Subject:   i386/77643: SATA PCI controllers fail with WRITE_DMA errors under GMIRROR
Message-ID:  <200502171556.j1HFuolx028986@FS.denninger.net>
Resent-Message-ID: <200502171600.j1HG0lrU060335@freefall.freebsd.org>


>Number:         77643
>Category:       i386
>Synopsis:       SATA PCI controllers fail with WRITE_DMA errors under GMIRROR
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-i386
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Feb 17 16:00:47 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Karl Denninger
>Release:        FreeBSD 5.3-STABLE i386
>Organization:
Karls Sushi and Packet Smashers
>Environment:
System: FreeBSD FS.denninger.net 5.3-STABLE FreeBSD 5.3-STABLE #1: Wed Feb 2 22:57:48 CST 2005 karl@FS.denninger.net:/usr/obj/usr/src/sys/KSD-SMP i386

Sources from 1/31/2005; CVS commit logs checked for potentially relevant
changes, none found.

>Description:

	SATA controllers on a PCI bus fail randomly under GMIRROR when
	both channels are in use under actual mixed read/write I/O loads.
	The problem does NOT occur under saturation read or saturation
	write test loads, where no mix of accesses is done.  For example,
	a "dd" will not provoke the problem, and a rebuild of a RAID 1
	GEOM mirror will not provoke it either; but once the rebuild is
	complete and all disks are part of the mirror, a disk will fail
	within a couple of minutes to a couple of hours under production
	loads.
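
	A minimal sketch of the kind of mixed load meant here, as opposed
	to a pure sequential read or write.  It is an illustration only,
	not the production workload: the function name, scratch-file path,
	file size, operation count and 50/50 read/write split are all
	assumptions.  The 16384-byte block size is taken from the failed
	requests in the log.  Reads may be served from the buffer cache,
	so this does not guarantee that the problem is provoked.

```python
import os
import random
import tempfile


def mixed_io(path, file_size=4 * 1024 * 1024, block=16384, ops=200, seed=0):
    """Interleave random block-sized reads and writes against one file.

    All parameters are illustrative assumptions, not values from the report,
    except the 16384-byte block, which matches the failed GEOM requests.
    """
    rng = random.Random(seed)
    blocks = file_size // block
    counts = {"read": 0, "write": 0}

    # Pre-size the file so every read lands inside the file.
    with open(path, "wb") as f:
        f.truncate(file_size)

    fd = os.open(path, os.O_RDWR)
    try:
        for _ in range(ops):
            os.lseek(fd, rng.randrange(blocks) * block, os.SEEK_SET)
            if rng.random() < 0.5:
                os.read(fd, block)
                counts["read"] += 1
            else:
                os.write(fd, os.urandom(block))
                os.fsync(fd)  # force the write out instead of leaving it cached
                counts["write"] += 1
    finally:
        os.close(fd)
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch_dir:
        print(mixed_io(os.path.join(scratch_dir, "scratch.bin")))
```

	To exercise the mirror, the path would have to point at a file on
	a filesystem backed by /dev/mirror/boot, with a file size well
	beyond installed RAM.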

	Occurs with both BusTek and Adaptec PCI SATA cards.  Occurs with
	both Maxtor DiamondMax10 and Hitachi Deskstar drives.  Same drive
	swapped onto motherboard controller DOES NOT trigger problem,
	irrespective of load.  Motherboard SATA adapter DOES NOT exhibit
	problem, irrespective of load or whether both channels are in use.

	However, motherboard controller is different brand/make/model.

	Specifically:

	atapci0: <SiI 3112 SATA150 controller> port
	0xcef0-0xceff,0xcedc-0xcedf,0xcee8-0xceef,0xced8-0xcedb,0xcee0-0xcee7
	mem 0xfe7dfe00-0xfe7dffff irq 21 at device 0.0 on pci2

	atapci2: <Intel ICH5 SATA150 controller> port
	0xfea0-0xfeaf,0xfe30-0xfe33,0xfe20-0xfe27,0xfe10-0xfe13,0xfe00-0xfe07
	irq 18 at device 31.2 on pci0

	atapci2 is the on-motherboard controller, atapci0 is the PCI bus
	controller.  Both the BusTek and Adaptec controllers that have been
	tested exhibit the problem, and both are SiI chipset-based.

	System has been updated to and is running the latest (A08) BIOS 
	revision available.

	Drives and controllers both certify clean using manufacturer
	utilities, and disks, when run on the motherboard controller, do not
	exhibit the problem.

	Error and DMESG output exhibited below:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.3-STABLE #1: Wed Feb  2 22:57:48 CST 2005
    karl@FS.denninger.net:/usr/obj/usr/src/sys/KSD-SMP
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz (2394.01-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 267862016 (255 MB)
avail memory = 252456960 (240 MB)
ACPI APIC Table: <DELL   PE400SC>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0: Changing APIC ID to 2
ioapic0 <Version 2.0> irqs 0-23 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <DELL PE400SC> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <Intel 82875P host to AGP bridge> mem 0xe8000000-0xefffffff at device 0.0 on pci0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
uhci0: <Intel 82801EB (ICH5) USB controller USB-A> port 0xff80-0xff9f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <Intel 82801EB (ICH5) USB controller USB-A> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <Intel 82801EB (ICH5) USB controller USB-B> port 0xff60-0xff7f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <Intel 82801EB (ICH5) USB controller USB-B> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <Intel 82801EB (ICH5) USB controller USB-C> port 0xff40-0xff5f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <Intel 82801EB (ICH5) USB controller USB-C> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: <Intel 82801EB (ICH5) USB controller USB-D> port 0xff20-0xff3f irq 16 at device 29.3 on pci0
uhci3: [GIANT-LOCKED]
usb3: <Intel 82801EB (ICH5) USB controller USB-D> on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
pci0: <serial bus, USB> at device 29.7 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci2: <ACPI PCI bus> on pcib2
atapci0: <SiI 3112 SATA150 controller> port 0xcef0-0xceff,0xcedc-0xcedf,0xcee8-0xceef,0xced8-0xcedb,0xcee0-0xcee7 mem 0xfe7dfe00-0xfe7dffff irq 21 at device 0.0 on pci2
ata2: channel #0 on atapci0
ata3: channel #1 on atapci0
rp0: <RocketPort PCI> port 0xcf00-0xcf3f irq 17 at device 2.0 on pci2
RocketPort0 (Version 3.02) 4 ports.
pcib3: <PCI-PCI bridge> at device 3.0 on pci2
pci3: <PCI bus> on pcib3
fxp0: <Intel 82558 Pro/100 Ethernet> port 0xbf80-0xbf9f mem 0xfe400000-0xfe4fffff,0xf8001000-0xf8001fff irq 19 at device 4.0 on pci3
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:d0:b7:6f:ce:e8
fxp1: <Intel 82558 Pro/100 Ethernet> port 0xbfe0-0xbfff mem 0xfe500000-0xfe5fffff,0xf8000000-0xf8000fff irq 18 at device 5.0 on pci3
miibus1: <MII bus> on fxp1
inphy1: <i82555 10/100 media interface> on miibus1
inphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:d0:b7:6f:ce:e9
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0xcf40-0xcf7f mem 0xfe7e0000-0xfe7fffff irq 18 at device 12.0 on pci2
em0: Ethernet address: 00:0c:f1:c9:df:c5
em0:  Speed:N/A  Duplex:N/A
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <Intel ICH5 UDMA100 controller> port 0xffa0-0xffaf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 irq 18 at device 31.1 on pci0
ata0: channel #0 on atapci1
ata1: channel #1 on atapci1
atapci2: <Intel ICH5 SATA150 controller> port 0xfea0-0xfeaf,0xfe30-0xfe33,0xfe20-0xfe27,0xfe10-0xfe13,0xfe00-0xfe07 irq 18 at device 31.2 on pci0
ata4: channel #0 on atapci2
ata5: channel #1 on atapci2
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
pci0: <multimedia, audio> at device 31.5 (no driver attached)
fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
orm0: <ISA Option ROMs> at iomem 0xcc800-0xcffff,0xcb000-0xcc7ff,0xc0000-0xcafff on isa0
pmtimer0 on isa0
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
RTC BIOS diagnostic error 18<memory_size,fixed_disk>
Timecounters tick every 10.000 msec
ipfw2 initialized, divert enabled, rule-based forwarding disabled, default to deny, logging disabled
acd0: CDROM <Lite-On LTN486S 48x Max/YDS6> at ata1-master UDMA33
em0: Link is up 100 Mbps Full Duplex
ad8: 239372MB <Maxtor 6B250S0/BANC1980> [486344/16/63] at ata4-master SATA150
ad10: 238475MB <HDS722525VLSA80/V36OA63A> [484521/16/63] at ata5-master SATA150
GEOM_MIRROR: Device boot created (id=1131801609).
GEOM_MIRROR: Device boot: provider ad8s1 detected.
GEOM_MIRROR: Device boot: provider ad10s1 detected.
GEOM_MIRROR: Force device boot start due to timeout.
GEOM_MIRROR: Device boot: provider ad10s1 activated.
GEOM_MIRROR: Device boot: provider ad8s1 activated.
GEOM_MIRROR: Device boot: provider mirror/boot launched.
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/mirror/boota
em0: Link is up 100 Mbps Full Duplex
ad4: 238475MB <HDS722525VLSA80/V36OA63A> [484521/16/63] at ata2-master SATA150
GEOM_MIRROR: Component ad4s1 (device boot) broken, skipping.
GEOM_MIRROR: Cannot add disk ad4s1 to boot (error=22).
ad6: 239372MB <Maxtor 6B250S0/BANC1B70> [486344/16/63] at ata3-master SATA150
GEOM_MIRROR: Device boot: provider ad6s1 detected.
GEOM_MIRROR: Device boot: rebuilding provider ad6s1.
GEOM_MIRROR: Device boot: provider ad4s1 detected.
GEOM_MIRROR: Device boot: rebuilding provider ad4s1.
GEOM_MIRROR: Device boot: rebuilding provider ad6s1 finished.
GEOM_MIRROR: Device boot: provider ad6s1 activated.
GEOM_MIRROR: Device boot: rebuilding provider ad4s1 finished.
GEOM_MIRROR: Device boot: provider ad4s1 activated.
GEOM_MIRROR: Device boot: provider ad4s1 disconnected.
GEOM_MIRROR: Device boot: provider ad4s1 detected.
GEOM_MIRROR: Device boot: rebuilding provider ad4s1.
ad6: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=245216575
ad6: FAILURE - WRITE_DMA timed out
GEOM_MIRROR: Request failed (error=5). ad6s1[WRITE(offset=125550854144, length=16384)]
GEOM_MIRROR: Device boot: provider ad6s1 disconnected.
GEOM_MIRROR: Device boot: rebuilding provider ad4s1 finished.
GEOM_MIRROR: Device boot: provider ad4s1 activated.
GEOM_MIRROR: Device boot: provider ad6s1 detected.
GEOM_MIRROR: Device boot: rebuilding provider ad6s1.
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=27151007
ad4: FAILURE - WRITE_DMA timed out
GEOM_MIRROR: Request failed (error=5). ad4s1[WRITE(offset=13901283328, length=16384)]
GEOM_MIRROR: Device boot: provider ad4s1 disconnected.
GEOM_MIRROR: Device boot: provider ad4s1 detected.
GEOM_MIRROR: Device boot: rebuilding provider ad4s1.
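
The ATA timeout lines report an absolute LBA, while the GEOM_MIRROR lines report a byte offset into the slice. A small sketch that ties the two together is given here. It assumes 512-byte sectors and a slice that starts at sector 63; neither value is stated in the report, and the helper name `correlate` is made up for illustration. Under those assumptions the logged LBA is the first sector of the failed 16384-byte write in both cases above.

```python
import re

SECTOR = 512      # assumption: bytes per sector
SLICE_START = 63  # assumption: first sector of slice 1 on each disk

ATA_RE = re.compile(r"\b(ad\d+): TIMEOUT - WRITE_DMA .*LBA=(\d+)")
GEOM_RE = re.compile(r"\b(ad\d+)s\d+\[WRITE\(offset=(\d+), length=(\d+)\)\]")


def correlate(lines):
    """Return (disk, first_lba, last_lba, logged_lba) per failed GEOM write."""
    logged = {}   # most recent LBA reported by the ATA layer, per disk
    results = []
    for line in lines:
        m = ATA_RE.search(line)
        if m:
            logged[m.group(1)] = int(m.group(2))
            continue
        m = GEOM_RE.search(line)
        if m:
            disk = m.group(1)
            offset, length = int(m.group(2)), int(m.group(3))
            first = SLICE_START + offset // SECTOR
            last = first + length // SECTOR - 1
            results.append((disk, first, last, logged.get(disk)))
    return results


if __name__ == "__main__":
    sample = [
        "ad6: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=245216575",
        "GEOM_MIRROR: Request failed (error=5). "
        "ad6s1[WRITE(offset=125550854144, length=16384)]",
        "ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=27151007",
        "GEOM_MIRROR: Request failed (error=5). "
        "ad4s1[WRITE(offset=13901283328, length=16384)]",
    ]
    for row in correlate(sample):
        print(row)
```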


>How-To-Repeat:
	Build GEOM mirrored system with a secondary controller.
	Insert two additional disks into RAID 1 array so as to have 
	four members.

	When rebuild completes on the two additional members on the
	secondary controller, normal system load will cause one of the 
	two disks to detach with the above error.
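
	One possible way to spot the detach when repeating this is to scan
	the kernel messages for GEOM_MIRROR disconnect lines and pair each
	one with the preceding ATA failure on the same disk.  This is a
	sketch only: the function name and the pairing rule are
	assumptions, and the patterns are written against the message
	formats shown in the output above.

```python
import re

FAILURE_RE = re.compile(r"\b(ad\d+): FAILURE - (\S+) timed out")
DISCONNECT_RE = re.compile(
    r"GEOM_MIRROR: Device (\S+): provider (\S+) disconnected\."
)


def detach_events(lines):
    """Return (mirror, provider, cause) for each provider disconnect.

    cause is the ATA command that last timed out on that disk, or None
    if no failure was logged for the disk since its previous disconnect.
    """
    last_failure = {}
    events = []
    for line in lines:
        m = FAILURE_RE.search(line)
        if m:
            last_failure[m.group(1)] = m.group(2)
            continue
        m = DISCONNECT_RE.search(line)
        if m:
            mirror, provider = m.groups()
            disk = re.match(r"ad\d+", provider)
            cause = last_failure.pop(disk.group(0), None) if disk else None
            events.append((mirror, provider, cause))
    return events


if __name__ == "__main__":
    sample = [
        "GEOM_MIRROR: Device boot: provider ad4s1 disconnected.",
        "ad6: FAILURE - WRITE_DMA timed out",
        "GEOM_MIRROR: Device boot: provider ad6s1 disconnected.",
        "ad4: FAILURE - WRITE_DMA timed out",
        "GEOM_MIRROR: Device boot: provider ad4s1 disconnected.",
    ]
    for event in detach_events(sample):
        print(event)
```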


>Fix:

	Unknown.




>Release-Note:
>Audit-Trail:
>Unformatted:


