Date: Wed, 16 Jul 2003 13:16:26 -0700
From: Matt Staroscik
To: freebsd-questions@freebsd.org
Subject: Re: Adaptec 2400A RAID controller corrupting data (4.8)
In-Reply-To: <20030716054801.7A07737B404@hub.freebsd.org>
Message-Id: <5.2.0.9.2.20030716124813.035e9e68@192.168.1.1>

I am going to break this saga into 2 posts, one with the ugly details for
those who are interested, and one short post with the essential questions
and observations.

>I have that card with 6 60 gig drives and set the box up (freebsd 4.7?)
>and it would run for a day or so and just crash. I also recall having
>similar panics when moving large amounts of data. I've given up on
>using the box for any real work so it's just sitting doing nothing
>waiting... hoping for a solution... a glimmer of hope. ;-)
>
>If you get it working please post.

Here is an update.
While I have made progress, I am not 100% hopeful of finding a solution
that is stable in the long term.

To make a long story short, I seem to have made the system much more
stable by turning off soft updates. I was able to do a make buildworld,
and then delete the contents of /usr/obj. Previously, one of those
actions was sure to trigger a panic.

Before I tried disabling soft updates I also did all of the following,
some of which I readily admit is voodoo:

- Replaced the cables
- Jumpered the drives to Master instead of Cable Select
- Changed the RAID card's PCI slot
- Wiggled everything

I continued my test by cvsupping my source and doing another make
buildworld. However, this time it bombed out while working on groff. I
checked the file in question in an editor and it didn't look munged, so
I am not sure whether this is an error in the cvs tree, an innocent file
transfer error, or a sign of deeper issues with my disk subsystem. I am
going to thrash the machine with more builds but avoid CVS for now.

Unfortunately, turning off soft updates isn't a great solution, if
indeed it IS a solution, which I am still testing. It definitely makes
things slower. My buildworld went from about 23 minutes to 34 minutes
this way. Removing the contents of /usr/obj took about 1 minute, whereas
with soft updates it took only a few seconds (though it panicked
afterwards).

Update: I created a custom kernel config (adding only device pcm and
removing nothing) and successfully built it. I then installed it,
rebooted, and tried to make installworld. Bomb city! getty dumped core
before I even logged in, and it got worse from there. Then I tried
deleting /usr/obj and I got the kernel panic again. :)

Observation: My last 2 panics (ffs_blkfree) reported these block
numbers: 54608, 54592. Those are awfully close. Could my trouble stem
from a defect on a disk?

Things I have yet to try:

- Removing the Maxtor 160s from the RAID and trying them individually on
  the motherboard controller
- Applying a hammer to the system

Cheers,
Matt
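
P.S. For anyone who wants the arithmetic behind the figures above, here
is a rough Python sketch. It only restates numbers already quoted in
this post; the variable names are made up for illustration, and it says
nothing about what unit the panic's block numbers are in or where they
fall on the disks.

```python
# Figures quoted in the post; variable names are illustrative only.
panic_blocks = [54608, 54592]   # block numbers from the two ffs_blkfree panics
build_min_softdep_on = 23       # buildworld minutes with soft updates enabled
build_min_softdep_off = 34      # buildworld minutes with soft updates disabled

# Distance between the two reported block numbers.
block_gap = abs(panic_blocks[0] - panic_blocks[1])

# Relative slowdown of buildworld once soft updates are turned off.
slowdown_pct = (
    (build_min_softdep_off - build_min_softdep_on) / build_min_softdep_on * 100
)

print(f"panic blocks are {block_gap} apart")
print(f"buildworld is about {slowdown_pct:.0f}% slower without soft updates")
```

So the two panics were 16 blocks apart, and buildworld takes roughly
half again as long with soft updates off.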