Date:      Fri, 30 Apr 2004 10:27:53 -0700 (PDT)
From:      Doug White <dwhite@gumbysoft.com>
To:        Ollie Cook <ollie@uk.clara.net>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: filesystem corruption with 1TB filesystem, 4.9-STABLE, twe
Message-ID:  <20040430102518.V67392@carver.gumbysoft.com>
In-Reply-To: <20040418211852.GA67452@mutare.noc.clara.net>
References:  <20040418211852.GA67452@mutare.noc.clara.net>

On Sun, 18 Apr 2004, Ollie Cook wrote:

> I am experiencing filesystem corruption while using a 1TB (approx.) partition
> under 4.9-STABLE (sources from Mar 17) and an 8-port 3ware ATA RAID card (twe
> device driver). The RAID set comprises 5x250GB ATA disks.

[...]

The corruption you're seeing would be consistent with one of the disks
silently not accepting writes, or with some other form of array
corruption. I realize it'll take forever, but can you run an array verify?
I wonder whether the BIOS is failing to pick up a disk failure: the disk
isn't throwing errors, but it isn't doing any useful work either.


>
> The kernel logs such messages as:
>
> Apr 17 16:25:37 heman /kernel: free inode /clara/170175645 had 137391860 blocks
> Apr 17 17:18:29 heman /kernel: free inode /clara/169969279 had 1803039330 blocks
> Apr 17 18:06:38 heman /kernel: free inode /clara/171086221 had 544501359 blocks
>
> The operations it was performing at the time involved copying a lot of small
> (email messages) files from a busy NFS mount to the RAID5 array. A number of
> processes were all copying different files and the throughput was around 3MB/s
> to disk.
>
> As far as I can tell from sys/ufs/ffs/ffs_alloc.c this error indicates that a
> kernel data structure contains unexpected data, but I'm not confident enough to
> be able to tell what might be causing that.
>
> After such messages, if I cleanly unmount the filesystem and run fsck, errors
> are detected. Such errors are:
>
>   directory corrupted
>   directory contains empty blocks
>   unallocated inode
>   wrong link counts
>
> There are many more distinct error messages, but those are the ones I recall.
> After a number of passes through fsck, the filesystem is eventually marked
> clean but quite a number of files wind up in lost+found.
>
> Has anyone seen behaviour similar to this with twe RAID sets or large
> partitions in the past? I've not been able to find reports of similar symptoms
> using Google.
>
> Can anyone offer advice on how I might further debug this problem?
>
> Yours,
>
> Ollie
>
> Apr 16 11:34:12 heman /kernel: twe0: <3ware Storage Controller> port 0xc800-0xc80f mem 0xfe000000-0xfe7fffff,0xfe8ffc00-0xfe8ffc0f irq 10 at device 4.0 on pci3
> Apr 16 11:34:12 heman /kernel: twe0: 8 ports, Firmware FE7X 1.05.00.065, BIOS BE7X 1.08.00.048
> Apr 16 11:34:12 heman /kernel: twed0: <Unit 0, JBOD, Normal> on twe0
> Apr 16 11:34:12 heman /kernel: twed0: 4126MB (8452080 sectors)
> Apr 16 11:34:12 heman /kernel: twed1: <Unit 1, RAID5, Normal> on twe0
> Apr 16 11:34:12 heman /kernel: twed1: 953896MB (1953580032 sectors)
> Apr 16 11:34:12 heman /kernel: twe0: command interrupt
>
>
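
To get a feel for how far off the block counts in those "free inode"
messages are, here is a rough Python sketch that pulls them out of the log
and compares them with the size of twed1 from the dmesg output. It is only
an illustration, not a diagnostic tool. The function names are made up for
this example, and it assumes the counts are in 512-byte units.

```python
import re

# Matches the kernel message format quoted above:
#   free inode <mount point>/<inode number> had <count> blocks
FREE_INODE_RE = re.compile(r"free inode (\S+)/(\d+) had (\d+) blocks")

SECTOR_BYTES = 512            # assumption: counts are in 512-byte units
ARRAY_SECTORS = 1953580032    # twed1 size taken from the dmesg output above


def parse_free_inode(line):
    """Return (mount_point, inode, blocks), or None if the line doesn't match."""
    m = FREE_INODE_RE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def blocks_to_gib(blocks):
    """Convert a count of 512-byte blocks to GiB."""
    return blocks * SECTOR_BYTES / 2**30


def summarize(lines):
    """Yield (inode, blocks, GiB, fraction of array) for each matching line."""
    for line in lines:
        parsed = parse_free_inode(line)
        if parsed is None:
            continue
        _, inode, blocks = parsed
        yield inode, blocks, blocks_to_gib(blocks), blocks / ARRAY_SECTORS


if __name__ == "__main__":
    log = [
        "Apr 17 16:25:37 heman /kernel: free inode /clara/170175645 had 137391860 blocks",
        "Apr 17 17:18:29 heman /kernel: free inode /clara/169969279 had 1803039330 blocks",
        "Apr 17 18:06:38 heman /kernel: free inode /clara/171086221 had 544501359 blocks",
    ]
    for inode, blocks, gib, frac in summarize(log):
        print(f"inode {inode}: {blocks} blocks, about {gib:.1f} GiB "
              f"({frac:.0%} of the array)")
```

If the counts came out as plausible sizes for mail files I'd look
elsewhere. Counts that work out to a large fraction of the whole array for
a single inode look more like garbage in the on-disk data than a real
allocation.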

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org


