Date: Fri, 30 Apr 2004 10:27:53 -0700 (PDT)
From: Doug White <dwhite@gumbysoft.com>
To: Ollie Cook <ollie@uk.clara.net>
Cc: freebsd-stable@freebsd.org
Subject: Re: filesystem corruption with 1TB filesystem, 4.9-STABLE, twe
Message-ID: <20040430102518.V67392@carver.gumbysoft.com>
In-Reply-To: <20040418211852.GA67452@mutare.noc.clara.net>
References: <20040418211852.GA67452@mutare.noc.clara.net>

On Sun, 18 Apr 2004, Ollie Cook wrote:

> I am experiencing filesystem corruption while using a 1TB (appx.) partition
> under 4.9-STABLE (sources from Mar 17) and an 8-port 3ware ATA RAID card (twe
> device driver). The RAID set comprises 5x250GB ATA disks.

[...]

The type of corruption you're seeing would be consistent with one of the
disks not accepting writes or some other sort of array corruption. I
realize it'll take forever, but can you run an array verify? I wonder if
the BIOS isn't picking up a disk failure since it isn't throwing errors,
but isn't doing any useful work either.

>
> The kernel logs such messages as:
>
> Apr 17 16:25:37 heman /kernel: free inode /clara/170175645 had 137391860 blocks
> Apr 17 17:18:29 heman /kernel: free inode /clara/169969279 had 1803039330 blocks
> Apr 17 18:06:38 heman /kernel: free inode /clara/171086221 had 544501359 blocks
>
> The operations it was performing at the time involved copying a lot of small
> (email messages) files from a busy NFS mount to the RAID5 array. A number of
> processes were all copying different files and the throughput was around 3MB/s
> to disk.
>
> As far as I can tell from sys/ufs/ffs/ffs_alloc.c this error indicates that a
> kernel data structure contains unexpected data, but I'm not confident enough to
> be able to tell what might be causing that.
>
> After such messages, if I cleanly unmount the filesystem and run fsck, errors
> are detected. Such errors are:
>
> directory corrupted
> directory contains empty blocks
> unallocated inode
> wrong link counts
>
> There are many more distinct error messages, but those are the ones I recall.
> After a number of passes through fsck, the filesystem is eventually marked
> clean but quite a number of files wind up in lost+found.
>
> Has anyone seen behaviour similar to this with twe RAID sets or large
> partitions in the past? I've not been able to find reports of similar symptoms
> using Google.
>
> Can anyone offer advice on how I might further debug this problem?
>
> Yours,
>
> Ollie
>
> Apr 16 11:34:12 heman /kernel: twe0: <3ware Storage Controller> port 0xc800-0xc80f mem 0xfe000000-0xfe7fffff,0xfe8ffc00-0xfe8ffc0f irq 10 at device 4.0 on pci3
> Apr 16 11:34:12 heman /kernel: twe0: 8 ports, Firmware FE7X 1.05.00.065, BIOS BE7X 1.08.00.048
> Apr 16 11:34:12 heman /kernel: twed0: <Unit 0, JBOD, Normal> on twe0
> Apr 16 11:34:12 heman /kernel: twed0: 4126MB (8452080 sectors)
> Apr 16 11:34:12 heman /kernel: twed1: <Unit 1, RAID5, Normal> on twe0
> Apr 16 11:34:12 heman /kernel: twed1: 953896MB (1953580032 sectors)
> Apr 16 11:34:12 heman /kernel: twe0: command interrupt
>

--
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org

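The unit sizes in the quoted twe probe lines can be sanity-checked with a little arithmetic. The sketch below is only an illustration and rests on assumptions not stated in the thread: 512-byte sectors, "MB" meaning 2^20 bytes rounded down, "250GB" meaning 250 * 10^9 bytes, and RAID5 giving the capacity of all disks but one. The helper names are made up for this example.

```python
# Sanity-check the sizes reported in the twe probe messages.
# Assumptions (not confirmed by the thread): 512-byte sectors,
# "MB" = 2**20 bytes rounded down, "250GB" = 250 * 10**9 bytes.

SECTOR_BYTES = 512  # assumed sector size


def sectors_to_mb(sectors, sector_bytes=SECTOR_BYTES):
    """Convert a sector count to whole megabytes (2**20 bytes), rounding down."""
    return sectors * sector_bytes // (1024 * 1024)


def raid5_usable_bytes(disks, disk_bytes):
    """Usable RAID5 capacity: one disk's worth of space goes to parity."""
    return (disks - 1) * disk_bytes


if __name__ == "__main__":
    twed0_sectors = 8452080      # from the twed0 probe line
    twed1_sectors = 1953580032   # from the twed1 probe line

    print("twed0:", sectors_to_mb(twed0_sectors), "MB")
    print("twed1:", sectors_to_mb(twed1_sectors), "MB")

    nominal = raid5_usable_bytes(5, 250 * 10**9)
    reported = twed1_sectors * SECTOR_BYTES
    print("nominal RAID5 bytes: ", nominal)
    print("reported RAID5 bytes:", reported)
    print("difference:          ", reported - nominal)
```

Under those assumptions both computed figures agree with the probe output (4126MB and 953896MB), and the RAID5 unit comes out slightly above the nominal 4 x 250GB. That only shows the controller is exporting a unit of the expected size; it says nothing about whether every member disk is actually accepting writes, which is what an array verify would check.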
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20040430102518.V67392>