Date: Wed, 20 Sep 2006 09:06:51 -0700 (PDT)
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: i386/103435: Kernel appears somewhat deadlocked during heavy ATA I/O (post-August 4th)
Message-ID: <20060920160651.C79AC1FA035@icarus.home.lan>
Resent-Message-ID: <200609201610.k8KGAUkJ056277@freefall.freebsd.org>
>Number:         103435
>Category:       i386
>Synopsis:       Kernel appears somewhat deadlocked during heavy ATA I/O (post-August 4th)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-i386
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Sep 20 16:10:30 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Jeremy Chadwick
>Release:        FreeBSD 6.2-PRERELEASE i386
>Organization:
Parodius Networking
>Environment:
System: FreeBSD icarus.home.lan 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #0: Mon Sep 18 03:38:31 PDT 2006
    root@icarus.home.lan:/usr/obj/usr/src/sys/ICARUS  i386

>Description:
Sometime between August 4th and September 12th, a change was made to the
FreeBSD code that is breaking things badly. In particular:

  ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=41171803
  ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=51392291
  ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=31011999
  em0: watchdog timeout -- resetting
  em0: link state changed to DOWN
  em0: link state changed to UP
  ad12: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
  ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=84946719

The em0 watchdog timeouts often (but not always!) occur at the same time as
the ATA timeouts:

  Sep 20 08:47:42 icarus kernel: em0: watchdog timeout -- resetting
  Sep 20 08:47:42 icarus kernel: em0: link state changed to DOWN
  Sep 20 08:47:51 icarus kernel: em0: link state changed to UP
  Sep 20 08:47:51 icarus kernel: ad12: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
  Sep 20 08:47:51 icarus kernel: ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=84946719

The hardware in this box hasn't changed. However, because of the ATA errors
I was seeing, I replaced the disk on ad12 with a completely different (and
brand new) disk. It does the same thing shown above.
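The report says the em0 watchdog resets often, but not always, coincide with the ATA timeouts. You can make "at the same time" measurable. Scan the syslog and, for each watchdog reset, check whether an adN TIMEOUT follows within a small window. The shell function below is a minimal sketch under some assumptions:

- correlate_timeouts and its 10-second default window are hypothetical, chosen only for illustration.
- Timestamps are assumed to use the standard syslog "Mon DD HH:MM:SS" form.
- The log is assumed not to cross midnight.

```shell
# Hypothetical helper (not a FreeBSD tool): for each "em0: watchdog timeout"
# line read on stdin, report whether an "adN: TIMEOUT - ..." line follows
# within WINDOW seconds. Assumes syslog timestamps of the form
# "Mon DD HH:MM:SS" and that all lines fall on the same day (no midnight wrap).
correlate_timeouts() {
    window=${1:-10}   # look-ahead window in seconds (illustrative default)
    awk -v window="$window" '
        # Convert an HH:MM:SS string to seconds since midnight.
        function secs(t,    p) {
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        # Field 3 is the HH:MM:SS part of a syslog timestamp.
        /em0: watchdog timeout/ { nwd++; wd[nwd] = secs($3); wdt[nwd] = $3; next }
        /ad[0-9]+: TIMEOUT - /  { nata++; ata[nata] = secs($3) }
        END {
            for (i = 1; i <= nwd; i++) {
                hit = 0
                # Look for any ATA timeout at or after this reset, inside the window.
                for (j = 1; j <= nata; j++) {
                    d = ata[j] - wd[i]
                    if (d >= 0 && d <= window + 0) { hit = 1; break }
                }
                printf "em0 watchdog at %s: %s within %ds\n", wdt[i],
                    (hit ? "ATA timeout" : "no ATA timeout"), window
            }
        }'
}
```

For example, run `correlate_timeouts 10 < /var/log/messages`. In the 08:47 excerpt above, the watchdog reset at 08:47:42 is followed nine seconds later by the WRITE_DMA timeout at 08:47:51. A 10-second window therefore counts it as correlated, and a 5-second window does not.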
I also have a disk on ad14 (SATA port #1), which can induce the same thing.

The controller is ICH5-based. No RAID is in use (everything is plain JBOD).
The motherboard is an Intel D865GLC, running the previous-to-latest BIOS (the
latest version only fixes some VGA adapter issues). Hyperthreading is enabled
in the BIOS, but the kernel itself is NOT using SMP (it DOES have the apic
device enabled, though).

As far as IRQs go, the ICH5 SATA controller (atapci2) and em0 appear to share
irq 18. This is bizarre, as I would expect the APIC to assign separate IRQs to
these devices:

  atapci2: <Intel ICH5 SATA150 controller> port 0xe800-0xe807,0xe400-0xe403,0xe000-0xe007,0xdc00-0xdc03,0xd800-0xd80f irq 18 at device 31.2 on pci0
  em0: <Intel(R) PRO/1000 Network Connection Version - 6.1.4> port 0xac00-0xac1f mem 0xff800000-0xff81ffff irq 18 at device 1.0 on pci1

I've also built a kernel as of the 18th (see the uname output above), and it
has the same problem.

>How-To-Repeat:
I can reproduce this problem easily. During heavy disk activity, the system
"stalls" as if the kernel is spending too much time doing something
(deadlocked). The best way I've found to trigger it is to pick a FreeBSD port
that has a lot of dependencies and run 'make clean' over and over, several at
a time:

  cd /usr/ports/databases/phpmyadmin
  make clean & make clean & make clean &
  {watch above problem occur}

Pressing Control-C to interrupt applications doesn't work while this is going
on.

>Fix:
I haven't tried a different motherboard. There is a chance the board is going
bad, since hardware goes bad all the time in this day and age. However, I
didn't have this problem until I built the September 12th kernel. I also have
not tried booting without ACPI.

>Release-Note:
>Audit-Trail:
>Unformatted: