From: Brian Kraemer
To: freebsd-questions@freebsd.org
Date: Sun, 26 Feb 2006 18:08:11 -0800 (PST)
Message-ID: <20060226174858.U69435@kosh.etchings.com>
Subject: RAID issues

I'm not subscribed to freebsd-questions, so please Cc me on any responses.

Hello,

I'm in the process of building a new server based on Supermicro's SuperServer 5014C-T platform. It uses the Intel ICH6 SATA controllers and supports RAID0 and RAID1 via Intel MatrixRAID. I have the BIOS set up for RAID1 and usually FreeBSD detects this and everything is fine.

My problem is that on occasion, on a reboot, one of the drives (usually the second one, ata3 on atapci1) is not detected at all by FreeBSD. The BIOS continues to detect both drives but FreeBSD does not. When this happens, FreeBSD notes that the RAID is in a degraded state.
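
For anyone wanting to catch this condition right after boot, here is a minimal sketch in Python. It assumes the array member lines in dmesg look like the ones from a healthy boot quoted in this message (`ar0: disk1 READY (mirror) using ad6 at ata3-master`). The function names are hypothetical, and treating a missing member line as "degraded" is an assumption, since the exact text FreeBSD prints for a degraded array is not shown here.

```python
import re

# Matches member lines such as:
#   ar0: disk1 READY (mirror) using ad6 at ata3-master
MEMBER_RE = re.compile(r"^(ar\d+): disk(\d+) (\S+) \((\w+)\) using (\w+) at (\S+)$")

# Member lines copied from the healthy boot in this message.
GOOD_BOOT = """
ar0: 381553MB status: READY
ar0: disk0 READY (master) using ad4 at ata2-master
ar0: disk1 READY (mirror) using ad6 at ata3-master
"""


def raid_members(dmesg_text):
    """Return {array: [(index, state, role, device, channel), ...]}."""
    arrays = {}
    for line in dmesg_text.splitlines():
        match = MEMBER_RE.match(line.strip())
        if match:
            array, index, state, role, device, channel = match.groups()
            arrays.setdefault(array, []).append(
                (int(index), state, role, device, channel)
            )
    return arrays


def is_degraded(members, expected_disks=2):
    """Assumption: a mirror is healthy only if every expected disk is READY."""
    ready = [m for m in members if m[1] == "READY"]
    return len(ready) < expected_disks


if __name__ == "__main__":
    for name, members in raid_members(GOOD_BOOT).items():
        print(name, "DEGRADED" if is_degraded(members) else "OK")
```

In practice the text would come from the saved boot messages rather than a literal string; `expected_disks=2` matches the two-drive RAID1 described here.
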
I can use atacontrol to detach and reattach ata3 which usually finds the drive but the damage has been done.

What I mean by damage is this: On the next reboot, I have massive filesystem errors, even after a full fsck. These errors are usually related to soft-updates. These errors are so bad that the kernel will panic as soon as a file is accessed in the bad partition. The only workaround I have found so far is to boot into single user mode and run newfs on the partition(s) that are causing the kernel panic. Obviously this is a less than ideal solution.

My question is this. Does this sound like bad hardware, or a software problem? I thought at first that it might be a bad hard drive but I ran some diagnostic software on them and they both came up clean. Has anyone else experienced this? Perhaps turning off soft-updates is the answer?

Here's some dmesg output (when things are working properly):

atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.1 on pci0
ata0: on atapci0
ata1: on atapci0
atapci1: port 0xe900-0xe907,0xea00-0xea03,0xeb00-0xeb07,0xec00-0xec03,0xed00-0xed0f mem 0xd03c3000-0xd03c33ff irq 19 at device 31.2 on pci0
ata2: on atapci1
ata3: on atapci1
acd0: CDROM at ata0-master UDMA33
ad4: 381554MB at ata2-master SATA150
ad6: 381554MB at ata3-master SATA150
ar0: 381553MB status: READY
ar0: disk0 READY (master) using ad4 at ata2-master
ar0: disk1 READY (mirror) using ad6 at ata3-master

I still have the latest vmcore dump if that will help.

-Brian