From: Kendall Gifford
To: freebsd-hardware@freebsd.org
Date: Thu, 21 Apr 2005 12:30:30 -0600
Subject: ATA DMA Issues Resurfaced (READ_DMA TIMEOUT/FAILURE)
Message-ID: <86ba954f05042111304e36b01c@mail.gmail.com>

Howdy. I'm not sure whether hardware or stable is the best list for
this, but here is my problem. Any info, recommendations, or help will
be greatly appreciated.

I've got a server running 5-STABLE (updated/built Jan. 22, 2005).
It has been running this kernel, a 5.3-RELEASE kernel, and other 5.x
branch versions for the last ten or so months now. Previous to this, it
was running 4.9-RELEASE.

About ten months ago, when I switched from the 4.x branch to the 5.x
branch, I immediately began experiencing WRITE_DMA ICRC errors during
disk activity at seemingly random times. At that time I posted the
following message to this list and to questions:

http://groups-beta.google.com/group/mailing.freebsd.questions/browse_thread/thread/17fe5871d823f380/a16568320427152e?rnum=2#a16568320427152e

The gist of that message, and of my current experience, is that my
hardware (drives, cables, motherboard controllers, etc.) is definitely
fine, and that I've noticed others posting various, possibly related
issues both before and since I posted the above message.

I basically ended up working around the problem by running atacontrol
in a /usr/local/etc/rc.d/ script that set my drives to PIO4 mode. I
then mostly forgot about the problem, as everything has since worked
fine, that is, until just recently.

About a week ago (around April 14, 2005), after updating some ports and
configurations, I decided to reboot (quite extraneous, I know, but it is
reassuring to verify that all scripts/configs are set up the way I
want). Just as my system began starting local services, and just after
it ran my custom /usr/local/etc/rc.d atacontrol script, I got the
following error messages:

Master = PIO4 Slave = UDMA33
Master = PIO4 Slave = BIOSPIO
ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=146793208
ad0: FAILURE - READ_DMA timed out
GEOM_VINUM: subdisk raid.p0.s0 is down
GEOM_VINUM: plex raid.p0 is down
Starting mysql.
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xc
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc04ba88f
stack pointer           = 0x10:0xd321dc6c
frame pointer           = 0x10:0xd321dc98
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4 (g_down)
trap number             = 12
panic: page fault
Uptime: 28s

This is the first time in ten months I've had issues switching to PIO4
mode during local service startup. I really am not quite sure what
happened.

Anyhow, I've since rebooted into single-user mode, brought my
gvinum-mirror plex back up, and done the usual things to bring my system
up manually. On one attempt, though, I foolishly forgot to run
atacontrol on my drives before trying to bring my gvinum plex back up.
As it was restoring in the background, I remembered and unthinkingly ran
atacontrol, and again succeeded in bringing my system down in much the
same manner as shown above (only this time with WRITE_DMA errors instead
of READ_DMA errors).

Based on this experience, my two guesses as to the cause of my booting
problem are:

1. Disk activity from starting the system causes problems before my
   disks can be put fully into PIO4 mode (so timing is critical).
2. The state of things at the moment atacontrol is executed causes
   problems.

As you can see, I have no idea what the real problem is, and I wonder
whether any more info on these ATA/DMA problems is available. I also
wonder if I'd be better off moving to 4.11 until the root cause of these
problems is found.

Any help or information, anyone?
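
For readers unfamiliar with the workaround, here is a minimal sketch of
the kind of /usr/local/etc/rc.d/ script described above. It is not the
actual script from this system. The set_mode helper and the DRY_RUN
variable are hypothetical names, the per-channel modes are inferred from
the "Master = ... Slave = ..." output quoted earlier, and the
"atacontrol mode <channel> <master-mode> <slave-mode>" form is assumed
for 5.x and should be checked against atacontrol(8) on the target
release.

```shell
#!/bin/sh
# Sketch only: force the ATA disks into PIO4 when local services start.
# Hypothetical names: set_mode, DRY_RUN. Modes per channel are guesses
# based on the atacontrol output quoted in the post.

# DRY_RUN=1 (the default in this sketch) only prints the commands.
# Set DRY_RUN=0 to actually invoke atacontrol.
DRY_RUN="${DRY_RUN:-1}"

set_mode() {
    # $1 = ATA channel number, $2 = master mode, $3 = slave mode
    if [ "$DRY_RUN" = "1" ]; then
        echo "atacontrol mode $1 $2 $3"
    else
        /sbin/atacontrol mode "$1" "$2" "$3"
    fi
}

case "${1:-start}" in
start)
    set_mode 0 PIO4 UDMA33    # ata0: ad0 (master), acd0 (slave)
    set_mode 1 PIO4 BIOSPIO   # ata1: ad2 (master), no slave
    ;;
stop)
    # Nothing to undo at shutdown.
    ;;
esac
```

Because the script runs with the other local startup scripts, the disks
stay in their boot-time DMA mode until that point, which is the window
the first guess above is concerned with.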
System Info:

machine		i386
cpu		I686_CPU
device		npx
device		isa
device		pci
device		agp
options		VESA
ident		KERNEL
maxusers	100
options		SCHED_4BSD
options		COMPAT_43
options		COMPAT_FREEBSD4
options		SYSVSHM
options		SYSVSEM
options		SYSVMSG
options		KTRACE
options		INVARIANT_SUPPORT
options		INET
device		ether
device		loop
device		bpf
device		tun
options		IPFIREWALL
options		IPFIREWALL_VERBOSE
options		IPFIREWALL_VERBOSE_LIMIT=1000
options		IPDIVERT
options		FFS
options		NFSCLIENT
options		NFSSERVER
options		CD9660
options		FDESCFS
options		MSDOSFS
options		NTFS
options		NULLFS
options		PROCFS
options		PSEUDOFS
options		UDF
options		SOFTUPDATES
options		UFS_EXTATTR
options		UFS_EXTATTR_AUTOSTART
options		UFS_ACL
options		GEOM_BSD
options		GEOM_CONCAT
options		GEOM_GPT
options		GEOM_LABEL
options		GEOM_MBR
options		GEOM_MIRROR
options		GEOM_VOL
options		QUOTA
device		md
device		random
device		pty
device		snp
options		_KPOSIX_PRIORITY_SCHEDULING
device		atkbdc
device		atkbd
device		psm
device		vga
device		splash
device		sc
options		MAXCONS=16
options		SC_HISTORY_SIZE=2000
options		SC_TWOBUTTON_MOUSE
options		SC_KERNEL_CONS_ATTR=(FG_RED|BG_BLACK)
options		SC_KERNEL_CONS_REV_ATTR=(FG_BLACK|BG_RED)
device		ata
device		atadisk
device		ataraid
device		atapicd
device		atapifd
device		atapist
options		ATA_STATIC_ID
device		fdc
device		sio
device		ppc
device		ppbus
device		lpt
device		ppi
device		pmtimer
device		mem
device		apic
device		io
device		miibus
device		vr
device		uhci
device		ohci
device		usb
device		ucom
device		ugen
device		uhid
device		ukbd
device		ulpt
device		ums
device		uscanner

hint.atkbdc.0.at="isa"
hint.atkbdc.0.port="0x060"
hint.atkbd.0.at="atkbdc"
hint.atkbd.0.irq="1"
hint.atkbd.0.flags="0x1"
hint.psm.0.at="atkbdc"
hint.psm.0.irq="12"
hint.vga.0.at="isa"
hint.sc.0.at="isa"
hint.sc.0.flags="0x100"
hint.fdc.0.at="isa"
hint.fdc.0.port="0x3f0"
hint.fdc.0.irq="6"
hint.fdc.0.drq="2"
hint.fd.0.at="fdc0"
hint.fd.0.drive="0"
hint.fd.1.at="fdc0"
hint.fd.1.drive="1"
hint.sio.0.at="isa"
hint.sio.0.port="0x3f8"
hint.sio.0.flags="0x10"
hint.sio.0.irq="4"
hint.sio.1.at="isa"
hint.sio.1.port="0x2f8"
hint.sio.1.irq="3"
hint.ppc.0.at="isa"
hint.ppc.0.irq="7"

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.3-STABLE #0: Sat Jan 22 19:54:10 MST 2005
    root@name.domain.tld:/usr/obj/usr/src/sys/KERNEL
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Duron(tm) processor (1297.79-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x671  Stepping = 1
  Features=0x383f9ff
  AMD Features=0xc0400000
real memory  = 536870912 (512 MB)
avail memory = 519913472 (495 MB)
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
pcib0: pcibus 0 on motherboard
pir0: on motherboard
pci0: on pcib0
agp0: mem 0xe0000000-0xe7ffffff at device 0.0 on pci0
pcib1: at device 1.0 on pci0
pci1: on pcib1
pci0: at device 8.0 (no driver attached)
uhci0: port 0xd000-0xd01f irq 11 at device 16.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0xd400-0xd41f irq 3 at device 16.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: port 0xd800-0xd81f irq 10 at device 16.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: on uhci2
usb2: USB revision 1.0
uhub2: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
pci0: at device 16.3 (no driver attached)
isab0: at device 17.0 on pci0
isa0: on isab0
atapci0: port 0xdc00-0xdc0f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 17.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
pci0: at device 17.5 (no driver attached)
vr0: port 0xe800-0xe8ff mem 0xed001000-0xed0010ff irq 11 at device 18.0 on pci0
miibus0: on vr0
ukphy0: on miibus0
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vr0: Ethernet address: 00:0d:87:00:bf:1d
cpu0 on motherboard
orm0: at iomem 0xc8000-0xcffff,0xc0000-0xc7fff on isa0
pmtimer0 on isa0
atkbdc0: at port 0x64,0x60 on isa0
atkbd0: flags 0x1 irq 1 on atkbdc0
atkbd0: [GIANT-LOCKED]
psm0: irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse, device ID 3
fdc0: at port 0x3f0-0x3f5 irq 6 drq 2 on isa0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
ppc0: at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: on ppc0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
unknown: can't assign resources (port)
unknown: can't assign resources (memory)
unknown: can't assign resources (irq)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
Timecounter "TSC" frequency 1297789521 Hz quality 800
Timecounters tick every 10.000 msec
ipfw2 initialized, divert enabled, rule-based forwarding disabled, default to deny, logging limited to 1000 packets/entry by default
ad0: 117246MB [238216/16/63] at ata0-master UDMA133
acd0: CDRW at ata0-slave UDMA33
ad2: 117246MB [238216/16/63] at ata1-master UDMA133
Mounting root from ufs:/dev/ad0s1a
WARNING: / was not properly dismounted
GEOM_VINUM: subdisk raid.p1.s0 is up
GEOM_VINUM: subdisk raid.p0.s0 is stale
GEOM_VINUM: plex sync raid.p1 -> raid.p0 started
GEOM_VINUM: sd raid.p0.s0 is initializing
GEOM_VINUM: plex raid.p0 is degraded
GEOM_VINUM: plex raid.p0 is up
GEOM_VINUM: plex sync raid.p1 -> raid.p0 finished

-- 
Kendall Gifford