Date:      Mon, 24 Oct 2005 11:13:42 -0500
From:      Dan Rue <drue@therub.org>
To:        Vinod Kashyap <vkashyap@amcc.com>
Cc:        freebsd-stable@FreeBSD.org
Subject:   Re: twa kernel panic under heavy IO
Message-ID:  <20051024161342.GI38097@therub.org>
In-Reply-To: <2B3B2AA816369A4E87D7BE63EC9D2F26C621CC@SDCEXCHANGE01.ad.amcc.com>
References:  <2B3B2AA816369A4E87D7BE63EC9D2F26C621CC@SDCEXCHANGE01.ad.amcc.com>

On Thu, Oct 06, 2005 at 01:41:38PM -0700, Vinod Kashyap wrote:
> > -----Original Message-----
> > From: owner-freebsd-stable@freebsd.org 
> > [mailto:owner-freebsd-stable@freebsd.org] On Behalf Of Jung-uk Kim
> > Sent: Thursday, October 06, 2005 1:30 PM
> > To: freebsd-stable@FreeBSD.org
> > Cc: Dan Rue
> > Subject: Re: twa kernel panic under heavy IO
> > 
> > On Thursday 06 October 2005 04:07 pm, Dan Rue wrote:
> > > Greetings,
> > >
> > > I am running a 3ware 9500 SATA raid card in a 12x300GB raid 50 
> > > configuration.
> > >
> > > Here is dmesg identifying the controller:
> > > 3ware device driver for 9000 series storage controllers, version:
> > > 2.50.02.012 twa0: <3ware 9000 series Storage Controller> port 
> > > 0xb800-0xb8ff mem 0xfb800000-0xfbffffff,0xfc5ffc00-0xfc5ffcff irq
> > > 24 at device 2.0 on pci2 twa0: 12 ports, Firmware FE9X 2.06.00.009, 
> > > BIOS BE9X 2.03.01.051
> > >
> > > I was getting occasional kernel panics in 5.4 doing high I/O type 
> > > things (typically an rsync operation).  I was told that twa was 
> > > updated in 5-STABLE, so yesterday I upgraded.  I've 
> 
> Going by the dmesg, you have a 9.1.5.2 driver and 9.2 firmware.  The
> driver in 5 -STABLE is from the 9.2 release.  So, you might not have
> the driver upgrade done properly.  Try using the driver and firmware
> from the same release.  If you still see problems, please contact
> 3ware support.

Sorry about that, the driver and firmware were not actually mismatched -
I had pasted my dmesg from a previous email when I was running a
different version of FreeBSD.

---

After going back and forth with 3ware web support, this issue has been
closed, but not resolved.  I tried my 3ware 9500 on FreeBSD 5.3, 5.4,
and 5-STABLE.  With each of these OS versions and its stock driver (I
never changed the driver version manually), I got hard lockups and
reboots (though, interestingly, no kernel panics).

3ware had me check and troubleshoot a number of possibilities, until
they finally decided it was a hardware problem and issued me a
replacement card.  However, in the meantime, I upgraded to FreeBSD
6.0RC1 and the machine is now working flawlessly.  I returned the
replacement card unused.  

I can only conclude that there is a significant (timing?) bug in the
twa driver in FreeBSD 5.3/5.4/5-STABLE (as opposed to an isolated
hardware problem with my setup).

For those interested, I have posted the full conversation with 3ware on
my website:
http://therub.org/9500.txt (sorry for the poor formatting)

At one point, I received the following error message just before the
machine locked up:

>Oct 12 11:36:13 leopard kernel: initiate_write_filepage: already started

I grepped for that error message in the FreeBSD kernel source, and found
it in sys/ufs/ffs/ffs_softdep.c on line 3580.  What makes it really
interesting is the comment just above the printf:

if (pagedep->pd_state & IOSTARTED) {
        /*
         * This can only happen if there is a driver that does not
         * understand chaining. Here biodone will reissue the call
         * to strategy for the incomplete buffers.
         */
        printf("initiate_write_filepage: already started\n");
        return;
}
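
To make the check above concrete, here is a minimal user-space sketch of
the same guard.  It is a toy model, not kernel code: the struct, the
function names and the flag value are all hypothetical stand-ins, and
the assumption that the flag is set when the write starts and cleared
when it completes is mine.  It only shows that the message appears when
a write is initiated a second time before the first one has finished.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel flag; the value here is arbitrary. */
#define IOSTARTED 0x0001

/* Toy version of the dependency structure: only the state word is modeled. */
struct toy_pagedep {
	unsigned int pd_state;
};

/*
 * Toy version of the guard quoted above.
 * Returns 1 if this call started the write, 0 if one was already in flight.
 */
int
toy_initiate_write(struct toy_pagedep *pagedep)
{
	if (pagedep->pd_state & IOSTARTED) {
		/* Same message the kernel printed before the lockup. */
		printf("initiate_write_filepage: already started\n");
		return (0);
	}
	/* Assumption: the flag is set once the write has been started. */
	pagedep->pd_state |= IOSTARTED;
	return (1);
}

/* Assumption: completion of the write clears the flag again. */
void
toy_write_done(struct toy_pagedep *pagedep)
{
	pagedep->pd_state &= ~IOSTARTED;
}
```

In this model, calling toy_initiate_write() twice in a row on the same
structure prints the message on the second call; calling
toy_write_done() in between does not.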

I know this is a 3ware issue.  I am posting the outcome here in the
hope that it helps someone else who hits this bug, and that posting it
publicly gets the attention of the 3ware FreeBSD driver
team/individual.

Dan


