From owner-freebsd-i386@FreeBSD.ORG Wed Sep 20 16:10:54 2006
Resent-Date: Wed, 20 Sep 2006 16:10:30 GMT
Resent-Message-Id: <200609201610.k8KGAUkJ056277@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-i386@FreeBSD.org
Message-Id: <20060920160651.C79AC1FA035@icarus.home.lan>
Date: Wed Sep 20 2006 09:06:51 -0700 (PDT)
From: Jeremy Chadwick
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: i386/103435: Kernel appears somewhat deadlocked during heavy ATA I/O (post-August 4th)
X-Send-Pr-Version: 3.113
Reply-To: Jeremy Chadwick
List-Id: I386-specific issues for FreeBSD

>Number:         103435
>Category:       i386
>Synopsis:       Kernel appears somewhat deadlocked during heavy ATA I/O (post-August 4th)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-i386
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Sep 20 16:10:30 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Jeremy Chadwick
>Release:        FreeBSD 6.2-PRERELEASE i386
>Organization:   Parodius Networking
>Environment:
System: FreeBSD icarus.home.lan 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #0: Mon Sep 18 03:38:31 PDT 2006 root@icarus.home.lan:/usr/obj/usr/src/sys/ICARUS i386
>Description:
Sometime between August 4th and September 12th, someone changed something in the FreeBSD code which is breaking things badly. Particularly the following:

ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=41171803
ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=51392291
ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=31011999
em0: watchdog timeout -- resetting
em0: link state changed to DOWN
em0: link state changed to UP
ad12: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=84946719

The em0 timeouts happen at the same time (but not always!)
as the ATA timeouts:

Sep 20 08:47:42 icarus kernel: em0: watchdog timeout -- resetting
Sep 20 08:47:42 icarus kernel: em0: link state changed to DOWN
Sep 20 08:47:51 icarus kernel: em0: link state changed to UP
Sep 20 08:47:51 icarus kernel: ad12: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Sep 20 08:47:51 icarus kernel: ad12: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=84946719

The hardware in this box hasn't changed -- but because of the ATA errors I was seeing, I decided to swap the disk on ad12 for a completely different (and brand new) disk. It does the same thing you see above. I also have a disk on ad14 (SATA port #1), which can induce the same thing.

The controller is ICH5-based. There is no RAID being used (all pure JBOD). The motherboard is an Intel D865GLC, running the previous-to-latest BIOS (the latest version only fixes some VGA adapter issues). Hyperthreading is enabled in the BIOS, but the kernel itself is NOT using SMP (it DOES have the apic device enabled).

As far as IRQs go, it looks as if the ICH5 and the em0 are sharing an IRQ. This is bizarre, as I would expect the APIC to pick separate IRQs for these devices:

atapci2: port 0xe800-0xe807,0xe400-0xe403,0xe000-0xe007,0xdc00-0xdc03,0xd800-0xd80f irq 18 at device 31.2 on pci0
em0: port 0xac00-0xac1f mem 0xff800000-0xff81ffff irq 18 at device 1.0 on pci1

I've also built a kernel as of the 18th (you can see the above uname output), and it has the same problem.
>How-To-Repeat:
I can reproduce this problem easily: during heavy disk activity, the system will "stall" as if the kernel is spending too much time doing something (deadlocked).
The best way I've found to do this is to pick a FreeBSD port that relies on a lot of dependencies and do a 'make clean' over and over:

cd /usr/ports/databases/phpmyadmin
make clean & ; make clean & ; make clean &
{watch above problem occur}

Pressing Control-C to interrupt applications doesn't work while this is going on.
>Fix:
I haven't tried a different motherboard (I won't deny there's a chance the MB is going bad -- hardware goes bad all the time in this day and age), but I didn't have this problem until I built the September 12th kernel. I also have not tried booting without ACPI.
>Release-Note:
>Audit-Trail:
>Unformatted: