Date: Fri, 30 Apr 2004 12:52:53 -0500
From: Matthew Reimer <mreimer@vpop.net>
To: stable@freebsd.org
Subject: Re: filesystem corruption with 1TB filesystem, 4.9-STABLE, twe
Message-ID: <40929275.90203@vpop.net>
In-Reply-To: <lists.freebsd.stable.20040430102518.V67392@carver.gumbysoft.com>
References: <lists.freebsd.stable.20040430102518.V67392@carver.gumbysoft.com>

Is your card plugged into a riser card? We had similar problems (random
corruption) with a 7506-8 card. The workaround was to set the speed for
that PCI slot to 33MHz (rather than Auto or 66MHz). I think this tech
note describes our problem:

http://www.3ware.com/kb/article.aspx?id=10848

(Read the PDF file attached to the tech note.)

Now the box is as solid as a rock.

Matt

Doug White wrote:
> On Sun, 18 Apr 2004, Ollie Cook wrote:
>
>>I am experiencing filesystem corruption while using a 1TB (appx.) partition
>>under 4.9-STABLE (sources from Mar 17) and an 8-port 3ware ATA RAID card (twe
>>device driver). The RAID set comprises 5x250GB ATA disks.
>
> [...]
>
> The type of corruption you're seeing would be consistent with one of the
> disks not accepting writes or some other sort of array corruption. I
> realize it'll take forever, but can you run an array verify? I wonder if
> the BIOS isn't picking up a disk failure since it isn't throwing errors,
> but isn't doing any useful work either.
>
>>The kernel logs such messages as:
>>
>>Apr 17 16:25:37 heman /kernel: free inode /clara/170175645 had 137391860 blocks
>>Apr 17 17:18:29 heman /kernel: free inode /clara/169969279 had 1803039330 blocks
>>Apr 17 18:06:38 heman /kernel: free inode /clara/171086221 had 544501359 blocks
>>
>>The operations it was performing at the time involved copying a lot of small
>>(email messages) files from a busy NFS mount to the RAID5 array. A number of
>>processes were all copying different files and the throughput was around 3MB/s
>>to disk.
>>
>>As far as I can tell from sys/ufs/ffs/ffs_alloc.c this error indicates that a
>>kernel data structure contains unexpected data, but I'm not confident enough to
>>be able to tell what might be causing that.
>>
>>After such messages, if I cleanly unmount the filesystem and run fsck, errors
>>are detected. Such errors are:
>>
>>  directory corrupted
>>  directory contains empty blocks
>>  unallocated inode
>>  wrong link counts
>>
>>There are many more distinct error messages, but those are the ones I recall.
>>After a number of passes through fsck, the filesystem is eventually marked
>>clean but quite a number of files wind up in lost+found.
>>
>>Has anyone seen behaviour similar to this with twe RAID sets or large
>>partitions in the past? I've not been able to find reports of similar symptoms
>>using Google.
>>
>>Can anyone offer advice on how I might further debug this problem?
>>
>>Yours,
>>
>>Ollie
>>
>>Apr 16 11:34:12 heman /kernel: twe0: <3ware Storage Controller> port 0xc800-0xc80f mem 0xfe000000-0xfe7fffff,0xfe8ffc00-0xfe8ffc0f irq 10 at device 4.0 on pci3
>>Apr 16 11:34:12 heman /kernel: twe0: 8 ports, Firmware FE7X 1.05.00.065, BIOS BE7X 1.08.00.048
>>Apr 16 11:34:12 heman /kernel: twed0: <Unit 0, JBOD, Normal> on twe0
>>Apr 16 11:34:12 heman /kernel: twed0: 4126MB (8452080 sectors)
>>Apr 16 11:34:12 heman /kernel: twed1: <Unit 1, RAID5, Normal> on twe0
>>Apr 16 11:34:12 heman /kernel: twed1: 953896MB (1953580032 sectors)
>>Apr 16 11:34:12 heman /kernel: twe0: command interrupt
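
For anyone wanting to track how often the "free inode ... had N blocks" messages quoted above recur while testing a change such as the PCI slot speed, a small log scanner may help. This is only a sketch: the regular expression is inferred from the three sample lines in this thread, and the function name parse_free_inode_lines is made up for illustration. It is not part of FreeBSD or the twe driver.

```python
import re

# Pattern inferred from the three kernel log lines quoted in this thread
# (an assumption; other kernel versions may word the message differently).
FREE_INODE_RE = re.compile(
    r"free inode (?P<mount>/\S+?)/(?P<inode>\d+) had (?P<blocks>\d+) blocks"
)


def parse_free_inode_lines(lines):
    """Return a list of (mount, inode, blocks) tuples, one per matching line."""
    results = []
    for line in lines:
        match = FREE_INODE_RE.search(line)
        if match:
            results.append(
                (
                    match.group("mount"),
                    int(match.group("inode")),
                    int(match.group("blocks")),
                )
            )
    return results


# Sample input copied from the quoted message.
SAMPLE = [
    "Apr 17 16:25:37 heman /kernel: free inode /clara/170175645 had 137391860 blocks",
    "Apr 17 17:18:29 heman /kernel: free inode /clara/169969279 had 1803039330 blocks",
    "Apr 17 18:06:38 heman /kernel: free inode /clara/171086221 had 544501359 blocks",
    "Apr 16 11:34:12 heman /kernel: twe0: command interrupt",
]

if __name__ == "__main__":
    for mount, inode, blocks in parse_free_inode_lines(SAMPLE):
        print(f"{mount}: inode {inode} reported {blocks} blocks")
```

Feeding it the contents of /var/log/messages before and after a hardware change gives a quick count of new occurrences; lines that do not match the pattern are simply skipped.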
