From owner-freebsd-stable@FreeBSD.ORG  Sun Apr 18 14:18:54 2004
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5A56F16A4CE
	for <freebsd-stable@freebsd.org>;
	Sun, 18 Apr 2004 14:18:54 -0700 (PDT)
Received: from mutare.noc.clara.net (mutare.noc.clara.net [195.8.70.95])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2ABC443D49
	for <freebsd-stable@freebsd.org>;
	Sun, 18 Apr 2004 14:18:54 -0700 (PDT)
	(envelope-from ollie@mutare.noc.clara.net)
Received: from ollie by mutare.noc.clara.net with local (Exim 4.30)
	id 1BFJgj-000HqR-Da
	for freebsd-stable@freebsd.org; Sun, 18 Apr 2004 22:18:53 +0100
Date: Sun, 18 Apr 2004 22:18:53 +0100
From: Ollie Cook <ollie@uk.clara.net>
To: freebsd-stable@freebsd.org
Message-ID: <20040418211852.GA67452@mutare.noc.clara.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.1i
X-Operating-System: FreeBSD 4.9-STABLE i386
X-NCC-RegID: uk.claranet
Sender: Ollie Cook <ollie@mutare.noc.clara.net>
X-Envelope-To: freebsd-stable@freebsd.org
X-Clara-Scan: content scanned according to recipient preferences
Subject: filesystem corruption with 1TB filesystem, 4.9-STABLE, twe
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
	<freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 18 Apr 2004 21:18:54 -0000

Hi,

I am experiencing filesystem corruption on a partition of approximately 1TB
under 4.9-STABLE (sources from Mar 17) with an 8-port 3ware ATA RAID card (twe
device driver). The RAID set comprises 5x250GB ATA disks.

The kernel logs messages such as:

Apr 17 16:25:37 heman /kernel: free inode /clara/170175645 had 137391860 blocks
Apr 17 17:18:29 heman /kernel: free inode /clara/169969279 had 1803039330 blocks
Apr 17 18:06:38 heman /kernel: free inode /clara/171086221 had 544501359 blocks

At the time, the machine was copying a large number of small files (email
messages) from a busy NFS mount to the RAID5 array. Several processes were
each copying different files, and the throughput to disk was around 3MB/s.

As far as I can tell from sys/ufs/ffs/ffs_alloc.c, this error indicates that a
kernel data structure contains unexpected data, but I'm not confident enough to
tell what might be causing that.
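
As a rough way to see how unexpected those values are, the sketch below parses
the three log lines quoted above and reports what each block count would imply.
It assumes the count is a 32-bit field in 512-byte units and that the host is
little-endian (i386); neither assumption is confirmed here, and the names
SECTOR_BYTES, PATTERN and inspect are purely illustrative. It also prints the
four bytes of each count as text, since a count that reads as printable
characters might be worth comparing against the data being copied.

```python
import re
import struct

# Assumption: the block count is expressed in 512-byte units.
SECTOR_BYTES = 512

# The three messages quoted earlier in this mail.
LOG_LINES = [
    "Apr 17 16:25:37 heman /kernel: free inode /clara/170175645 had 137391860 blocks",
    "Apr 17 17:18:29 heman /kernel: free inode /clara/169969279 had 1803039330 blocks",
    "Apr 17 18:06:38 heman /kernel: free inode /clara/171086221 had 544501359 blocks",
]

# mount point, inode number, block count
PATTERN = re.compile(r"free inode (\S+)/(\d+) had (\d+) blocks")


def inspect(line):
    """Return a dict describing one 'free inode' message, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    mount = match.group(1)
    inode = int(match.group(2))
    blocks = int(match.group(3))
    # Assumption: 32-bit little-endian field; mask so pack() cannot overflow.
    raw = struct.pack("<I", blocks & 0xFFFFFFFF)
    # Show printable ASCII as-is and everything else as a dot.
    as_text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return {
        "mount": mount,
        "inode": inode,
        "blocks": blocks,
        "implied_gib": blocks * SECTOR_BYTES / 2**30,
        "as_text": as_text,
    }


for line in LOG_LINES:
    info = inspect(line)
    print("inode %(inode)d: %(blocks)d blocks, about %(implied_gib).1f GiB, "
          "bytes as text %(as_text)r" % info)
```

Under those assumptions each count implies tens or hundreds of gigabytes for
what should be a newly allocated inode, which is consistent with the field
holding garbage rather than a real size.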

After such messages appear, if I cleanly unmount the filesystem and run fsck,
it detects errors, including:

  directory corrupted
  directory contains empty blocks
  unallocated inode
  wrong link counts

There are many more distinct error messages, but those are the ones I recall.
After a number of passes through fsck, the filesystem is eventually marked
clean but quite a number of files wind up in lost+found.

Has anyone seen behaviour similar to this with twe RAID sets or large
partitions in the past? I've not been able to find reports of similar symptoms
using Google.

Can anyone offer advice on how I might further debug this problem?

Yours,

Ollie

Apr 16 11:34:12 heman /kernel: twe0: <3ware Storage Controller> port 0xc800-0xc80f mem 0xfe000000-0xfe7fffff,0xfe8ffc00-0xfe8ffc0f irq 10 at device 4.0 on pci3
Apr 16 11:34:12 heman /kernel: twe0: 8 ports, Firmware FE7X 1.05.00.065, BIOS BE7X 1.08.00.048
Apr 16 11:34:12 heman /kernel: twed0: <Unit 0, JBOD, Normal> on twe0
Apr 16 11:34:12 heman /kernel: twed0: 4126MB (8452080 sectors)
Apr 16 11:34:12 heman /kernel: twed1: <Unit 1, RAID5, Normal> on twe0
Apr 16 11:34:12 heman /kernel: twed1: 953896MB (1953580032 sectors)
Apr 16 11:34:12 heman /kernel: twe0: command interrupt

-- 
Oliver Cook    Systems Administrator, Claranet UK
ollie@uk.clara.net               +44 20 7903 3065