Message-ID: <40929275.90203@vpop.net>
Date: Fri, 30 Apr 2004 12:52:53 -0500
From: Matthew Reimer
To: stable@freebsd.org
Subject: Re: filesystem corruption with 1TB filesystem, 4.9-STABLE, twe

Is your card plugged into a riser card? We had similar problems (random
corruption) with a 7506-8 card. The workaround was to set the speed for
that PCI slot to 33MHz (rather than Auto or 66MHz). I think this tech
note describes our problem:

http://www.3ware.com/kb/article.aspx?id=10848

(Read the PDF file attached to the tech note.)

Now the box is as solid as a rock.

Matt

Doug White wrote:
> On Sun, 18 Apr 2004, Ollie Cook wrote:
>
>>I am experiencing filesystem corruption while using a 1TB (appx.) partition
>>under 4.9-STABLE (sources from Mar 17) and an 8-port 3ware ATA RAID card (twe
>>device driver). The RAID set comprises 5x250GB ATA disks.
>
> [...]
>
> The type of corruption you're seeing would be consistent with one of the
> disks not accepting writes or some other sort of array corruption. I
> realize it'll take forever, but can you run an array verify? I wonder if
> the BIOS isn't picking up a disk failure since it isn't throwing errors,
> but isn't doing any useful work either.
>
>>The kernel logs such messages as:
>>
>>Apr 17 16:25:37 heman /kernel: free inode /clara/170175645 had 137391860 blocks
>>Apr 17 17:18:29 heman /kernel: free inode /clara/169969279 had 1803039330 blocks
>>Apr 17 18:06:38 heman /kernel: free inode /clara/171086221 had 544501359 blocks
>>
>>The operations it was performing at the time involved copying a lot of small
>>(email messages) files from a busy NFS mount to the RAID5 array. A number of
>>processes were all copying different files and the throughput was around 3MB/s
>>to disk.
>>
>>As far as I can tell from sys/ufs/ffs/ffs_alloc.c this error indicates that a
>>kernel data structure contains unexpected data, but I'm not confident enough to
>>be able to tell what might be causing that.
>>
>>After such messages, if I cleanly unmount the filesystem and run fsck, errors
>>are detected. Such errors are:
>>
>>    directory corrupted
>>    directory contains empty blocks
>>    unallocated inode
>>    wrong link counts
>>
>>There are many more distinct error messages, but those are the ones I recall.
>>After a number of passes through fsck, the filesystem is eventually marked
>>clean but quite a number of files wind up in lost+found.
>>
>>Has anyone seen behaviour similar to this with twe RAID sets or large
>>partitions in the past? I've not been able to find reports of similar symptoms
>>using Google.
>>
>>Can anyone offer advice on how I might further debug this problem?
>>
>>Yours,
>>
>>Ollie
>>
>>Apr 16 11:34:12 heman /kernel: twe0: <3ware Storage Controller> port 0xc800-0xc80f mem 0xfe000000-0xfe7fffff,0xfe8ffc00-0xfe8ffc0f irq 10 at device 4.0 on pci3
>>Apr 16 11:34:12 heman /kernel: twe0: 8 ports, Firmware FE7X 1.05.00.065, BIOS BE7X 1.08.00.048
>>Apr 16 11:34:12 heman /kernel: twed0: on twe0
>>Apr 16 11:34:12 heman /kernel: twed0: 4126MB (8452080 sectors)
>>Apr 16 11:34:12 heman /kernel: twed1: on twe0
>>Apr 16 11:34:12 heman /kernel: twed1: 953896MB (1953580032 sectors)
>>Apr 16 11:34:12 heman /kernel: twe0: command interrupt
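
As a quick sanity check on the sizes in the quoted boot messages, the
arithmetic can be sketched as below. This is only an illustration, not
anything taken from the twe driver: it assumes 512-byte sectors, that
"MB" in the log means 2^20 bytes rounded down, and that the array is a
plain RAID5 with one disk's worth of parity. The function names are
made up for this example.

```python
# Hypothetical helpers for checking the twed sizes quoted above.
# Assumptions (not stated in the thread): 512-byte sectors, and "MB"
# in the kernel message meaning 2**20 bytes, rounded down.

SECTOR_SIZE = 512  # bytes per sector (assumed)


def sectors_to_mb(sectors, sector_size=SECTOR_SIZE):
    """Convert a sector count to whole megabytes (2**20 bytes)."""
    return (sectors * sector_size) // 2**20


def raid5_usable_gb(disks, disk_gb):
    """Usable capacity of a RAID5 set: one disk's worth goes to parity."""
    return (disks - 1) * disk_gb


if __name__ == "__main__":
    # Sector counts copied from the quoted boot messages.
    print("twed0:", sectors_to_mb(8452080), "MB")
    print("twed1:", sectors_to_mb(1953580032), "MB")
    # 5 x 250GB disks in RAID5, as described in the original report.
    print("RAID5 usable:", raid5_usable_gb(5, 250), "GB")
```

Under those assumptions the computed values agree with the 4126MB and
953896MB figures in the log, and 5x250GB in RAID5 gives roughly the 1TB
volume described, so the controller appears to be reporting the
expected geometry.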