Date: Mon, 8 May 2006 13:18:30 -0500
From: "Rick C. Petty"
To: freebsd-bugs@freebsd.org
Message-ID: <20060508181830.GA54101@megan.kiwi-computer.com>
Subject: same troubles as PR kern/91410

We have two machines, each with two Promise TX4 SATA300 controllers
(PDC40718 chip), and both are using gvinum mirrors and raid5. One
machine is running RELENG_6_1 as of 2006-May-07 with stock GENERIC; the
other has RELENG_6_0_0_RELEASE with stock GENERIC.

Under moderate to heavy (and sometimes even under very little) disk
load, the kernel spews out a bunch of "SETFEATURES SET TRANSFER MODE
semaphore timeout" messages, often followed closely by DMA
timeouts/failures (either READ_DMA or WRITE_DMA, depending on the
operation). The drive is always detached, which sometimes causes the box
to hang or panic.

Here's what we've tried so far:

I read PR kern/91410, which was ultimately closed due to lack of timely
feedback. Søren suggested
using atacontrol to do the striping (which doesn't help us, since we use
raid5) and also disabling PREEMPTION in the kernel. I also read
elsewhere the suggestion to boot with ACPI disabled.

On the 6.0-RELEASE box, I added the disable-ACPI hint to loader.conf and
rebuilt GENERIC without PREEMPTION, and that seemed to just work. I
loaded the machine with disk activity and it's been solid all weekend.
I'll try with ACPI enabled later today.

On the 6.1 box, we've tried a number of different setups, but the only
thing that seems to work is booting in safe mode with PREEMPTION
disabled in the GENERIC kernel. Today we're going to try enabling write
caching and ACPI one at a time to see if it's just a DMA problem. We'll
also try installing 6.0-R without PREEMPTION on that box.

Aside from the Promise controllers, these two boxes have different
hardware (motherboard, CPU). I'm hoping the test with 6.0-R will narrow
the problem down to whatever changed between 6.0 and 6.1.

In the meantime, I wanted to see if anyone could provide any other
suggestions or feedback. I also wanted to add my comments to
kern/91410: booting without PREEMPTION seems to have fixed my problems,
but then I'm stuck using 6.0.

Any ideas would be greatly appreciated!

-- 
Rick C. Petty