From owner-freebsd-i386@FreeBSD.ORG Mon Mar 7 03:30:06 2005
Return-Path:
Delivered-To: freebsd-i386@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id A9E0A16A4CE
    for ; Mon, 7 Mar 2005 03:30:06 +0000 (GMT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 84D7843D3F
    for ; Mon, 7 Mar 2005 03:30:06 +0000 (GMT)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
    by freefall.freebsd.org (8.13.3/8.13.3) with ESMTP id j273U6j5044338
    for ; Mon, 7 Mar 2005 03:30:06 GMT
    (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.13.3/8.13.1/Submit) id j273U6ri044337;
    Mon, 7 Mar 2005 03:30:06 GMT
    (envelope-from gnats)
Date: Mon, 7 Mar 2005 03:30:06 GMT
Message-Id: <200503070330.j273U6ri044337@freefall.freebsd.org>
To: freebsd-i386@FreeBSD.org
From: "Karl"
Subject: Re: i386/77643: SATA PCI controllers fail with WRITE_DMA errors under GMIRROR
X-BeenThere: freebsd-i386@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: Karl
List-Id: I386-specific issues for FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 07 Mar 2005 03:30:06 -0000

The following reply was made to PR i386/77643; it has been noted by GNATS.

From: "Karl"
To: ,
Cc:
Subject: Re: i386/77643: SATA PCI controllers fail with WRITE_DMA errors under GMIRROR
Date: Sun, 6 Mar 2005 21:26:59 -0600

In an attempt to mitigate this, I saw the following commit in the CVS logs:

  mdodd       2005-03-02 04:01:37 UTC

    FreeBSD src repository

    Modified files:
      sys/dev/ata          ata-queue.c
    Log:
    When resubmitting a timed out request, reset donecount.

    Submitted by:   Nate Lawson

    Revision  Changes    Path
    1.42      +1 -0      src/sys/dev/ata/ata-queue.c

Is this change supposed to be "safe" against a 5.4-PRERELEASE kernel from today (CVSupped about 1700 CST)?
If it is supposed to be, it's NOT!

It DOES fix the failure to requeue timed out requests, but it also provokes radical destabilization of the interrupt system in the kernel (e.g. receive serial interrupts "disappear", etc.), leading eventually to a panic.

BTW, it does appear to fix the requeue problem with disks: with this change in, a disk that takes a timeout (but is actually working) does not disconnect from a GEOM mirror; the requeue is successful.

However, for obvious reasons the kernel instability that results from the retried write is not acceptable :)

I don't know if this is germane to what is about to show up in 5.4-RELEASE, but if it is, this urgently needs to be looked at.

Needless to say, I've backed this attempt at a workaround out.
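
For anyone reading this PR without the source at hand, here is a rough illustration of the idea in the commit log quoted above ("When resubmitting a timed out request, reset donecount"). This is only a toy model and is NOT the actual ata-queue.c code: the struct, its fields other than the name donecount, and every function name below are hypothetical and exist purely for illustration.

```c
/* Toy model only: NOT the FreeBSD ata(4) code. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>

struct toy_request {
    size_t bytecount;   /* total bytes the request should transfer */
    size_t donecount;   /* bytes completed so far in the current attempt */
    int    retries;     /* remaining retry budget */
};

/* Record partial progress made before the (simulated) timeout fires. */
static void toy_transfer_partial(struct toy_request *req, size_t done)
{
    assert(done <= req->bytecount);
    req->donecount = done;
}

/*
 * Resubmit a timed-out request. Returns 1 if it was requeued, 0 if the
 * retry budget is exhausted. Resetting donecount means the retry starts
 * from byte 0 instead of inheriting stale progress from the failed try.
 */
static int toy_requeue_after_timeout(struct toy_request *req)
{
    if (req->retries <= 0)
        return 0;
    req->retries--;
    req->donecount = 0;
    return 1;
}

/* Bytes still to transfer in the current attempt. */
static size_t toy_remaining(const struct toy_request *req)
{
    return req->bytecount - req->donecount;
}
```

In this sketch, a request that had moved 1024 of 4096 bytes before timing out reports all 4096 bytes as remaining once it is requeued. It says nothing about the interrupt instability described above, which the toy model does not attempt to reproduce.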