Date:      Wed, 16 Jul 2003 13:16:26 -0700
From:      Matt Staroscik <matt@wrongcrowd.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: Adaptec 2400A RAID controller corrupting data (4.8)
Message-ID:  <5.2.0.9.2.20030716124813.035e9e68@192.168.1.1>
In-Reply-To: <20030716054801.7A07737B404@hub.freebsd.org>

I am going to break this saga into two posts: one with the ugly details for
those who are interested, and a short one with the essential questions
and observations.

>I have that card with 6 60 gig drives and set the box up (freebsd 4.7?)
>and it would run for a day or so and just crash.  I also recall having
>similar panics when moving large amounts of data.  I've given up on
>using the box for any real work so it's just sitting doing nothing
>waiting... hoping for a solution... a glimmer of hope.  ;-)
>
>If you get it working please post.

Here is an update. While I have made progress, I am not 100% hopeful of
finding a solution that is stable in the long term.

To make a long story short, I seem to have made the system much more stable
by turning off soft updates. I was able to do a make buildworld and then
delete the contents of /usr/obj; previously, either of those actions was
sure to trigger a panic. Before I tried disabling soft updates, I also did
all of the following, some of which I readily admit is voodoo:

- Replaced the cables
- Jumpered the drives as Master instead of Cable Select
- Moved the RAID card to a different PCI slot
- Wiggled everything

I continued my test by cvsupping my source and doing another make
buildworld. This time, however, it bombed out while working on groff. I
checked the file it failed on in an editor and it didn't look munged, so I
am not sure whether this is an error in the CVS tree, an innocent file
transfer error, or a sign of deeper issues with my disk subsystem. I am
going to thrash the machine with more builds but avoid CVS for now.

Unfortunately, turning off soft updates isn't a great solution, if indeed 
it IS a solution, which I am still testing. It definitely makes things 
slower. My buildworld went from about 23 minutes to 34 minutes this way. 
Removing the contents of /usr/obj took about 1 minute, whereas with soft 
updates it took only a few seconds (though it panicked afterwards).

Update: I created a custom kernel config (adding only device pcm and
removing nothing) and successfully built it. I then installed it, rebooted,
and tried to run make installworld. Bomb city! getty dumped core before I
even logged in, and it got worse from there.

Then I tried deleting /usr/obj and I got the kernel panic again. :)

Observation: My last two panics (in ffs_blkfree) reported block numbers
54608 and 54592, only 16 blocks apart. Those are awfully close. Could my
trouble stem from a defect on one of the disks?

Things I have yet to try:

- Removing the Maxtor 160s from the RAID and trying them individually on 
the motherboard controller.
- Applying a hammer to the system

Cheers,
Matt


