To: freebsd-current@freebsd.org
From: Kirk Strauser
Date: Thu, 11 Dec 2003 22:48:33 -0600
Message-ID: <87fzfqy732.fsf@strauser.com>
Subject: ATA + DMA still giving repeatable freezes

I built world after cvsup'ing -CURRENT this morning and am still having the same ATA READ_DMA hangs that started in early October on my system. I can repeat the hangs at will; the machine serves as an Amanda server, and launching a backup for itself plus 3 client machines is guaranteed to trigger it:

ad0: TIMEOUT - READ_DMA retrying (2 retries left)
ata0: resetting devices ..
ad0: FAILURE - already active DMA on this device
ad0: setting up DMA failed

When this happens, the system is effectively dead until I reset it.
I can run for days on end by booting with DMA disabled, but that's not really my ideal long-term solution as it slows the system to a crawl.

The drive in question is a Western Digital WD1200JB-00DUA3 (Caviar 120GB special edition) attached to an Asus P3V4X (Via chipset) motherboard. The combination has worked perfectly from the server's 4.8-STABLE days, through 5.0, and up until the last two months when I started experiencing this immediately after an upgrade.

Kernel config is essentially "GENERIC" with the older CPU types and WITNESS* and INVARIANT* options commented out, and with the SYS-V IPC settings recommended by PostgreSQL added. Build flags are very conservative: "CFLAGS= -O -pipe".

sysutils/smartctl reports:

SMART overall-health self-assessment test result: PASSED

Basically, I'm about 99% sure that this hardware is OK. It worked right up to a big ATAng commit, then stopped working right immediately afterward.

Does anybody have any suggestions of how I can run my machine in UDMA33/66 mode for more than a couple of hours without freezing?

Below is the dmesg. I didn't want to stick it in the middle of my post:

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.2-CURRENT #1: Thu Dec 11 14:13:32 CST 2003
root@kanga.honeypot.net:/usr/obj/usr/src/sys/KANGA
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0a7d000.
Preloaded elf module "/boot/kernel/linprocfs.ko" at 0xc0a7d1f4.
Preloaded elf module "/boot/kernel/linux.ko" at 0xc0a7d2a4.
Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0a7d350.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel Pentium III (936.74-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x683 Stepping = 3
Features=0x383f9ff
real memory = 805289984 (767 MB)
avail memory = 772505600 (736 MB)
Pentium Pro MTRR support enabled
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
acpi0: on motherboard
pcibios: BIOS version 2.10
Using $PIR table, 8 entries at 0xc00f0e60
acpi0: Power Button (fixed)
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0
acpi_cpu0: on acpi0
acpi_button0: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib0: slot 4 INTD is routed to irq 9
pcib0: slot 9 INTA is routed to irq 9
pcib0: slot 10 INTA is routed to irq 9
pcib0: slot 11 INTA is routed to irq 10
pcib0: slot 12 INTA is routed to irq 11
agp0: mem 0xe4000000-0xe7ffffff at device 0.0 on pci0
pcib1: at device 1.0 on pci0
pci1: on pcib1
isab0: at device 4.0 on pci0
isa0: on isab0
atapci0: port 0xd800-0xd80f at device 4.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata0: [MPSAFE]
ata1: at 0x170 irq 15 on atapci0
ata1: [MPSAFE]
uhci0: port 0xd400-0xd41f irq 9 at device 4.2 on pci0
usb0: on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
ulpt0: HewLett Packard HP LaserJet 1200, rev 1.10/1.00, addr 2, iclass 7/1
ulpt0: using bi-directional mode
ukbd0: Belkin Components USB-PS2 Adapter, rev 1.10/1.20, addr 3, iclass 3/1
kbd0 at ukbd0
ums0: Belkin Components USB-PS2 Adapter, rev 1.10/1.20, addr 3, iclass 3/1
ums0: 5 buttons and Z dir.
viapropm0: SMBus I/O base at 0xe800
viapropm0: port 0xe800-0xe80f at device 4.3 on pci0
viapropm0: SMBus revision code 0x0
smbus0: on viapropm0
smb0: on smbus0
fxp0: port 0xd000-0xd03f mem 0xd6800000-0xd68fffff,0xd7000000-0xd7000fff irq 9 at device 9.0 on pci0
fxp0: Ethernet address 00:d0:b7:0e:3a:4a
miibus0: on fxp0
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: port 0xb800-0xb83f mem 0xd5800000-0xd58fffff,0xd6000000-0xd6000fff irq 9 at device 10.0 on pci0
fxp1: Ethernet address 00:d0:b7:9e:bb:dd
miibus1: on fxp1
inphy1: on miibus1
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
sym0: <875> port 0xb400-0xb4ff mem 0xd4800000-0xd4800fff,0xd5000000-0xd50000ff irq 10 at device 11.0 on pci0
sym0: Tekram NVRAM, ID 7, Fast-20, SE, parity checking
pci0: at device 12.0 (no driver attached)
fdc0: port 0x3f7,0x3f2-0x3f5 irq 6 drq 2 on acpi0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
ppc0 port 0x778-0x77b,0x378-0x37f irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: on ppc0
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
sio0 port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A
sio1 port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
atkbdc0: port 0x64,0x60 irq 1 on acpi0
orm0: