Date: Mon, 24 Oct 2005 11:07:28 -0700
From: "Vinod Kashyap" <vkashyap@amcc.com>
To: "Dan Rue" <drue@therub.org>
Cc: freebsd-stable@FreeBSD.org
Subject: RE: twa kernel panic under heavy IO
Message-ID: <2B3B2AA816369A4E87D7BE63EC9D2F26D89125@SDCEXCHANGE01.ad.amcc.com>

> -----Original Message-----
> From: Dan Rue [mailto:drue@therub.org]
> Sent: Monday, October 24, 2005 9:14 AM
> To: Vinod Kashyap
> Cc: freebsd-stable@FreeBSD.org
> Subject: Re: twa kernel panic under heavy IO
>
> On Thu, Oct 06, 2005 at 01:41:38PM -0700, Vinod Kashyap wrote:
> > > -----Original Message-----
> > > From: owner-freebsd-stable@freebsd.org
> > > [mailto:owner-freebsd-stable@freebsd.org] On Behalf Of Jung-uk Kim
> > > Sent: Thursday, October 06, 2005 1:30 PM
> > > To: freebsd-stable@FreeBSD.org
> > > Cc: Dan Rue
> > > Subject: Re: twa kernel panic under heavy IO
> > >
> > > On Thursday 06 October 2005 04:07 pm, Dan Rue wrote:
> > > > Greetings,
> > > >
> > > > I am running a 3ware 9500 SATA raid card in a 12x300GB raid 50
> > > > configuration.
> > > >
> > > > Here is dmesg identifying the controller:
> > > > 3ware device driver for 9000 series storage controllers, version: 2.50.02.012
> > > > twa0: <3ware 9000 series Storage Controller> port 0xb800-0xb8ff mem 0xfb800000-0xfbffffff,0xfc5ffc00-0xfc5ffcff irq 24 at device 2.0 on pci2
> > > > twa0: 12 ports, Firmware FE9X 2.06.00.009, BIOS BE9X 2.03.01.051
> > > >
> > > > I was getting occasional kernel panics in 5.4 doing high I/O type
> > > > things (typically an rsync operation). I was told that twa was
> > > > updated in 5-STABLE, so yesterday I upgraded. I've
> >
> > Going by the dmesg, you have a 9.1.5.2 driver and 9.2 firmware. The
> > driver in 5 -STABLE is from the 9.2 release. So, you might not have
> > the driver upgrade done properly. Try using the driver and firmware
> > from the same release. If you still see problems, please contact
> > 3ware support.
>
> Sorry about that, the driver and firmware were not actually
> mismatched - I had pasted my dmesg from a previous email when
> I was running a different version of FreeBSD.
>
> ---
>
> After going around with 3ware web support, this issue has
> been concluded, but not resolved. I tried my 3ware 9500 on
> FreeBSD 5.3, 5.4, and 5-STABLE. With all of these versions
> of OS and driver (i never changed the driver version
> manually), I received hard lock ups and reboots (though,
> interestingly, no kernel panics).
>
> 3ware had me check and troubleshoot a number of
> possibilities, until they finally decided it was a hardware
> problem and issued me a replacement card. However, in the
> meantime, I upgraded to FreeBSD 6.0RC1 and the machine is
> now working flawlessly. I returned the replacement card unused.
>
> I can only conclude that this means that there is a large
> (timing?) bug in the twa driver in freebsd 5.3/5.4/5-stable
> (as opposed to an isolated hardware problem with my setup).
>
> I have pasted the full conversation with 3ware on my website
> for those interested here:
> http://therub.org/9500.txt (sorry for the poor formatting)
>
> At one point, I received the following error message just
> before the machine locked up:
>
> >Oct 12 11:36:13 leopard kernel: initiate_write_filepage: already started
>
> I grepped for that error message in the freebsd kernel
> source, and found it in sys/ufs/ffs/ffs_softdep.c on line
> 3580. What makes it really interesting is the comment above
> where the error is thrown:
>
>         if (pagedep->pd_state & IOSTARTED) {
>                 /*
>                  * This can only happen if there is a driver that does not
>                  * understand chaining. Here biodone will reissue the call
>                  * to strategy for the incomplete buffers.
>                  */
>                 printf("initiate_write_filepage: already started\n");
>                 return;
>         }
>
> I know this is a 3ware issue.
> I am posting this resolution
> response here in hopes that it may help someone else that
> hits this bug - and with the hope that publically it will get
> the attention of the 3ware freebsd driver team/individual.
>

The error messages you are seeing are consistent with bad hardware.
The hardware is becoming unavailable for the driver to talk to it.
This other message "initiate_write_filepage..." is different but did
you see the machine hang after this message got printed? I don't
think it's related to the hang.

> Dan
>
--------------------------------------------------------
CONFIDENTIALITY NOTICE: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and contains
information that is confidential and proprietary to Applied Micro
Circuits Corporation or its subsidiaries. It is to be used solely for
the purpose of furthering the parties' business relationship. All
unauthorized review, use, disclosure or distribution is prohibited. If
you are not the intended recipient, please contact the sender by reply
e-mail and destroy all copies of the original message.