Date:      Mon, 28 Feb 2005 22:45:18 GMT
From:      Jason Hitt <jhitt25@swbell.net>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/78216: WRITE_DMA UDMA ICRC errors while copying data to a disk
Message-ID:  <200502282245.j1SMjIJB073949@www.freebsd.org>
Resent-Message-ID: <200502282250.j1SMo7ca004137@freefall.freebsd.org>


>Number:         78216
>Category:       kern
>Synopsis:       WRITE_DMA UDMA ICRC errors while copying data to a disk
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Feb 28 22:50:07 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Jason Hitt
>Release:        FreeBSD 5.3-STABLE i386
>Organization:
>Environment:
FreeBSD calandor 5.3-STABLE FreeBSD 5.3-STABLE #0: Sun Feb 13 22:01:06 CST 2005     root@calandor:/usr/obj/usr/src/sys/FILESERVER_5  i386
>Description:
    My system was configured with 4.10 using vinum with a simple mirroring setup.
    I upgraded to 5.3 and attempted to convert to gmirror.
    I removed /dev/ad2 from my vinum volume and created a gmirror volume on
        it instead (on /dev/ad2s1).
    I then successfully copied all my data from the mounts residing
        on /dev/ad0 to the mounts residing on /dev/ad2 without a single error.
    I rebooted using /dev/ad2 and reset /dev/ad0.
    Upon adding /dev/ad0s1 to the gmirror volume, I immediately began receiving
        errors of the form:
        WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=########
    I then removed /dev/ad0s1 from the gmirror volume, attempted to create
        a new volume on it, and simply copy data from the volume on /dev/ad2s1
        to this new volume.  Again, the exact same results occurred.
    I disabled DMA via hw.ata.ata_dma in /boot/loader.conf, and everything
        immediately began working without error.
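A minimal sketch for tallying warnings of the form quoted above from saved console or /var/log/messages output, which may help when comparing how often each drive reports them. The "adN: " device prefix is an assumption (the example line in this report does not show one), the function name tally_icrc is hypothetical, and the LBA values in the sample are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape: an optional "adN: " device prefix (not shown in the
# report's example line) followed by the warning text quoted in the report.
ICRC_RE = re.compile(
    r"(?:(?P<dev>ad\d+):\s+)?WARNING - (?P<cmd>\w+) UDMA ICRC error"
    r".*?LBA=(?P<lba>\d+)"
)


def tally_icrc(lines):
    """Count ICRC warnings per (device, command) and collect the LBAs seen."""
    counts = Counter()
    lbas = {}
    for line in lines:
        m = ICRC_RE.search(line)
        if m is None:
            continue  # not an ICRC warning line
        key = (m.group("dev") or "unknown", m.group("cmd"))
        counts[key] += 1
        lbas.setdefault(key, []).append(int(m.group("lba")))
    return counts, lbas


if __name__ == "__main__":
    # Hypothetical sample lines; the LBA numbers are illustrative only.
    sample = [
        "ad0: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=1000",
        "ad0: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=3000",
        "GEOM_MIRROR: Device m0s1 created (id=1279703646).",
    ]
    counts, lbas = tally_icrc(sample)
    for (dev, cmd), n in sorted(counts.items()):
        print(dev, cmd, n, lbas[(dev, cmd)])
```

If the errors cluster on one device regardless of LBA, that would point toward the cable, controller channel, or drive rather than a bad region of the disk, though this is only a heuristic.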

    Two interesting points about the above process:
        1) I had no problems whatsoever copying nearly 100 GB of data
           from /dev/ad0 to /dev/ad2, repeatedly: I redid the copy three
           times before I decided my new setup met my needs.
        2) After attempting to add /dev/ad0 to the gmirror volume and seeing
           errors, I rebooted my PC to use a hard disk diagnostic tool.
           When the machine rebooted, the BIOS reported the first drive in
           CHS mode, not LBA mode.  Zeroing out the drive and re-fdisking
           corrected this.  Attempting to copy data to the drive again
           caused the CHS-mode problem to recur (along with the associated
           WRITE_DMA errors).

    The only customizations I have made to the kernel config file were to
    disable drivers I do not use (various network cards, some drive
    controllers... basically just hardware I will never own).

    I have two hard disks, each on its own 80-conductor IDE cable.

    Below is my startup dump.

    FreeBSD 5.3-STABLE #0: Sun Feb 13 22:01:06 CST 2005
    root@calandor:/usr/obj/usr/src/sys/FILESERVER_5
    Timecounter "i8254" frequency 1193182 Hz quality 0
    CPU: AMD Duron(tm) processor (798.64-MHz 686-class CPU)
    Origin = "AuthenticAMD"  Id = 0x631  Stepping = 1
    Features=0x183f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
    AMD Features=0xc0440000<RSVD,AMIE,DSP,3DNow!>
    real memory  = 536805376 (511 MB)
    avail memory = 515620864 (491 MB)
    npx0: [FAST]
    npx0: <math processor> on motherboard
    npx0: INT 16 interface
    acpi0: <VIA694 AWRDACPI> on motherboard
    acpi0: Power Button (fixed)
    Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
    acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
    cpu0: <ACPI CPU (3 Cx states)> on acpi0
    acpi_tz0: <Thermal Zone> on acpi0
    acpi_button0: <Power Button> on acpi0
    acpi_button1: <Sleep Button> on acpi0
    pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
    pci0: <ACPI PCI bus> on pcib0
    agp0: <VIA Generic host to PCI bridge> mem 0xe0000000-0xe1ffffff at device 0.0 on pci0
    pcib1: <PCI-PCI bridge> at device 1.0 on pci0
    pci1: <PCI bus> on pcib1
    pci1: <display, VGA> at device 0.0 (no driver attached)
    uhci0: <VIA 83C572 USB controller> port 0xd000-0xd01f irq 11 at device 16.0 on pci0
    uhci0: [GIANT-LOCKED]
    usb0: <VIA 83C572 USB controller> on uhci0
    usb0: USB revision 1.0
    uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
    uhub0: 2 ports with 2 removable, self powered
    uhci1: <VIA 83C572 USB controller> port 0xd400-0xd41f irq 3 at device 16.1 on pci0
    uhci1: [GIANT-LOCKED]
    usb1: <VIA 83C572 USB controller> on uhci1
    usb1: USB revision 1.0
    uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
    uhub1: 2 ports with 2 removable, self powered
    uhci2: <VIA 83C572 USB controller> port 0xd800-0xd81f irq 10 at device 16.2 on pci0
    uhci2: [GIANT-LOCKED]
    usb2: <VIA 83C572 USB controller> on uhci2
    usb2: USB revision 1.0
    uhub2: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
    uhub2: 2 ports with 2 removable, self powered
    pci0: <serial bus, USB> at device 16.3 (no driver attached)
    isab0: <PCI-ISA bridge> at device 17.0 on pci0
    isa0: <ISA bus> on isab0
    atapci0: <VIA 8235 UDMA133 controller> port 0xdc00-0xdc0f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 17.1 on pci0
    ata0: channel #0 on atapci0
    ata1: channel #1 on atapci0
    pci0: <multimedia, audio> at device 17.5 (no driver attached)
    vr0: <VIA VT6102 Rhine II 10/100BaseTX> port 0xe800-0xe8ff mem 0xe8001000-0xe80010ff irq 11 at device 18.0 on pci0
    miibus0: <MII bus> on vr0
    ukphy0: <Generic IEEE 802.3u media interface> on miibus0
    ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
    vr0: Ethernet address: 00:0d:87:b0:00:55
    fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
    fdc0: [FAST]
    sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
    sio0: type 16550A
    ppc0: <ECP parallel printer port> port 0x778-0x77b,0x378-0x37f irq 7 drq 3 on acpi0
    ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
    ppc0: FIFO with 16/16/16 bytes threshold
    ppbus0: <Parallel port bus> on ppc0
    plip0: <PLIP network interface> on ppbus0
    lpt0: <Printer> on ppbus0
    lpt0: Interrupt-driven port
    ppi0: <Parallel I/O> on ppbus0
    atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
    atkbd0: <AT Keyboard> irq 1 on atkbdc0
    kbd0 at atkbd0
    atkbd0: [GIANT-LOCKED]
    psm0: <PS/2 Mouse> irq 12 on atkbdc0
    psm0: [GIANT-LOCKED]
    psm0: model Generic PS/2 mouse, device ID 0
    orm0: <ISA Option ROM> at iomem 0xc0000-0xc9fff on isa0
    pmtimer0 on isa0
    sc0: <System console> at flags 0x100 on isa0
    sc0: VGA <16 virtual consoles, flags=0x300>
    sio1: configured irq 3 not in bitmap of probed irqs 0
    sio1: port may not be enabled
    vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
    Timecounter "TSC" frequency 798642802 Hz quality 800
    Timecounters tick every 10.000 msec
    ad0: 114473MB <WDC WD1200JB-00DUA3/75.13B75> [232581/16/63] at ata0-master PIO4
    ad2: 114473MB <WDC WD1200JB-00EVA0/15.05R15> [232581/16/63] at ata1-master PIO4
    GEOM_MIRROR: Device m0s1 created (id=1279703646).
    GEOM_MIRROR: Device m0s1: provider ad0s1 detected.
    GEOM_MIRROR: Device m0s1: provider ad2s1 detected.
    GEOM_MIRROR: Device m0s1: provider ad2s1 activated.
    GEOM_MIRROR: Device m0s1: provider ad0s1 activated.
    GEOM_MIRROR: Device m0s1: provider mirror/m0s1 launched.
    Mounting root from ufs:/dev/mirror/m0s1a
    Accounting enabled
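A small sketch for pulling the disk probe lines out of a boot log like the one above, so the negotiated transfer mode (PIO4 here) and the reported geometry can be checked at a glance. The function name parse_disk_line is hypothetical, and the 512-byte sector size used for the capacity cross-check is an assumption rather than something stated in the log.

```python
import re

# Matches probe lines such as:
#   ad0: 114473MB <WDC WD1200JB-00DUA3/75.13B75> [232581/16/63] at ata0-master PIO4
DISK_RE = re.compile(
    r"(?P<dev>ad\d+): (?P<mb>\d+)MB <(?P<model>[^>]+)> "
    r"\[(?P<c>\d+)/(?P<h>\d+)/(?P<s>\d+)\] at (?P<chan>\S+) (?P<mode>\S+)"
)

SECTOR_BYTES = 512  # assumption: classic 512-byte ATA sectors


def parse_disk_line(line):
    """Return a dict describing one ATA disk probe line, or None if no match."""
    m = DISK_RE.search(line)
    if m is None:
        return None
    c, h, s = (int(m.group(k)) for k in ("c", "h", "s"))
    mode = m.group("mode")
    return {
        "dev": m.group("dev"),
        "model": m.group("model"),
        "channel": m.group("chan"),
        "mode": mode,
        # True when the drive attached in a DMA mode rather than PIO.
        "dma": mode.upper().startswith(("UDMA", "WDMA")),
        "reported_mb": int(m.group("mb")),
        # Capacity implied by the C/H/S geometry, in whole MB (2**20 bytes).
        "chs_mb": c * h * s * SECTOR_BYTES // (1024 * 1024),
    }


if __name__ == "__main__":
    log = [
        "ad0: 114473MB <WDC WD1200JB-00DUA3/75.13B75> [232581/16/63] at ata0-master PIO4",
        "ad2: 114473MB <WDC WD1200JB-00EVA0/15.05R15> [232581/16/63] at ata1-master PIO4",
    ]
    for entry in log:
        print(parse_disk_line(entry))
```

Running the same parser against a boot log taken with DMA enabled would show whether the two drives negotiate the same mode, which is one of the few visible differences worth ruling out between ad0 and ad2.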

>How-To-Repeat:
    Unknown whether this is repeatable on an arbitrary system.  It appears
    to be an issue for many people; however, I did not see any reports of
    multiple-drive configurations such as mine.  The fact that my second
    drive had no DMA issues while my first drive did may be revealing.

>Fix:
Workaround: disable DMA access by setting hw.ata.ata_dma="0" in /boot/loader.conf.
I have not yet tested DMA modes other than UDMA100, but PIO4 works flawlessly (albeit quite slowly).
>Release-Note:
>Audit-Trail:
>Unformatted:


