From owner-freebsd-stable@FreeBSD.ORG  Fri Apr 30 10:52:32 2004
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id AE32216A4CE
	for <stable@freebsd.org>; Fri, 30 Apr 2004 10:52:32 -0700 (PDT)
Received: from boromir.vpop.net (dns1.vpop.net [207.178.248.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8C53D43D5A
	for <stable@freebsd.org>; Fri, 30 Apr 2004 10:52:32 -0700 (PDT)
	(envelope-from mreimer@vpop.net)
Received: from vpop.net (bilbo.vpop.net [65.103.33.41])
	by boromir.vpop.net (Postfix) with ESMTP id 5C5743A7FD4
	for <stable@freebsd.org>; Fri, 30 Apr 2004 10:52:30 -0700 (PDT)
Message-ID: <40929275.90203@vpop.net>
Date: Fri, 30 Apr 2004 12:52:53 -0500
From: Matthew Reimer <mreimer@vpop.net>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US;
	rv:1.6b) Gecko/20040102 Thunderbird/0.4
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: stable@freebsd.org
References: <lists.freebsd.stable.20040430102518.V67392@carver.gumbysoft.com>
In-Reply-To: <lists.freebsd.stable.20040430102518.V67392@carver.gumbysoft.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: filesystem corruption with 1TB filesystem, 4.9-STABLE, twe
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
	<freebsd-stable.freebsd.org>
X-List-Received-Date: Fri, 30 Apr 2004 17:52:32 -0000

Is your card plugged into a riser card? We had similar problems (random 
corruption) with a 7506-8 card. The workaround was to set the speed for 
that PCI slot to 33MHz (rather than Auto or 66MHz). I think this tech 
note describes our problem:

http://www.3ware.com/kb/article.aspx?id=10848

(Read the PDF file attached to the tech note.)

Now the box is as solid as a rock.

Matt

Doug White wrote:
> On Sun, 18 Apr 2004, Ollie Cook wrote:
> 
> 
>>I am experiencing filesystem corruption while using a 1TB (appx.) partition
>>under 4.9-STABLE (sources from Mar 17) and an 8-port 3ware ATA RAID card (twe
>>device driver). The RAID set comprises 5x250GB ATA disks.
> 
> 
> [...]
> 
> The type of corruption you're seeing would be consistent with one of the
> disks not accepting writes or some other sort of array corruption. I
> realize it'll take forever, but can you run an array verify?  I wonder if
> the BIOS isn't picking up a disk failure since it isn't throwing errors,
> but isn't doing any useful work either.
> 
> 
> 
>>The kernel logs such messages as:
>>
>>Apr 17 16:25:37 heman /kernel: free inode /clara/170175645 had 137391860 blocks
>>Apr 17 17:18:29 heman /kernel: free inode /clara/169969279 had 1803039330 blocks
>>Apr 17 18:06:38 heman /kernel: free inode /clara/171086221 had 544501359 blocks
>>
>>The operations it was performing at the time involved copying a lot of small
>>(email messages) files from a busy NFS mount to the RAID5 array. A number of
>>processes were all copying different files and the throughput was around 3MB/s
>>to disk.
>>
>>As far as I can tell from sys/ufs/ffs/ffs_alloc.c this error indicates that a
>>kernel data structure contains unexpected data, but I'm not confident enough to
>>be able to tell what might be causing that.
>>
>>After such messages, if I cleanly unmount the filesystem and run fsck, errors
>>are detected. Such errors are:
>>
>>  directory corrupted
>>  directory contains empty blocks
>>  unallocated inode
>>  wrong link counts
>>
>>There are many more distinct error messages, but those are the ones I recall.
>>After a number of passes through fsck, the filesystem is eventually marked
>>clean but quite a number of files wind up in lost+found.
>>
>>Has anyone seen behaviour similar to this with twe RAID sets or large
>>partitions in the past? I've not been able to find reports of similar symptoms
>>using Google.
>>
>>Can anyone offer advice on how I might further debug this problem?
>>
>>Yours,
>>
>>Ollie
>>
>>Apr 16 11:34:12 heman /kernel: twe0: <3ware Storage Controller> port 0xc800-0xc80f mem 0xfe000000-0xfe7fffff,0xfe8ffc00-0xfe8ffc0f irq 10 at device 4.0 on pci3
>>Apr 16 11:34:12 heman /kernel: twe0: 8 ports, Firmware FE7X 1.05.00.065, BIOS BE7X 1.08.00.048
>>Apr 16 11:34:12 heman /kernel: twed0: <Unit 0, JBOD, Normal> on twe0
>>Apr 16 11:34:12 heman /kernel: twed0: 4126MB (8452080 sectors)
>>Apr 16 11:34:12 heman /kernel: twed1: <Unit 1, RAID5, Normal> on twe0
>>Apr 16 11:34:12 heman /kernel: twed1: 953896MB (1953580032 sectors)
>>Apr 16 11:34:12 heman /kernel: twe0: command interrupt
>>
>>
> 
>
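
A rough cross-check of the figures in the quoted kernel log, in case it
helps with debugging. This is only a sketch under assumptions that are not
confirmed anywhere in this thread: that sectors are 512 bytes, that the
"had N blocks" count is a 32-bit field in 512-byte units, and that the
machine is little-endian. The helper names are made up for illustration.
It checks the reported twed sizes against their sector counts, shows how
large each "free" inode claims to be relative to the whole RAID5 unit, and
prints the raw bytes of each block count so you can see whether they look
like file data rather than a real count.

```python
import struct

SECTOR_BYTES = 512  # assumption: 512-byte sectors and 512-byte block units

# Values copied from the quoted kernel log.
TWED0_SECTORS = 8452080
TWED1_SECTORS = 1953580032
LOGGED_BLOCKS = {            # inode number -> "had N blocks"
    170175645: 137391860,
    169969279: 1803039330,
    171086221: 544501359,
}


def sectors_to_mb(sectors, sector_bytes=SECTOR_BYTES):
    """Convert a sector count to whole megabytes (2**20 bytes), rounding down."""
    return sectors * sector_bytes // (1024 * 1024)


def fraction_of_device(blocks, device_sectors=TWED1_SECTORS):
    """Share of the device a single inode would occupy with this block count."""
    return blocks / device_sectors


def raw_bytes(blocks):
    """The block count as 4 little-endian bytes (assumed byte order)."""
    return struct.pack("<I", blocks)


if __name__ == "__main__":
    print("twed0 MB:", sectors_to_mb(TWED0_SECTORS))
    print("twed1 MB:", sectors_to_mb(TWED1_SECTORS))
    for inode, blocks in LOGGED_BLOCKS.items():
        share = fraction_of_device(blocks)
        print(inode, blocks, f"{share:.1%} of twed1", raw_bytes(blocks))
```

If the assumptions hold, a supposedly free inode claiming a large share of
a 1TB unit cannot be a genuine block count for a small mail file, which
fits the idea that something other than inode data ended up in that
structure. Block counts whose bytes decode to printable text would point
the same way, though a few samples prove nothing on their own.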