Date:      Wed, 02 Jul 2008 21:07:41 -0400
From:      "Brian A. Seklecki (Mobile)" <bseklecki@collaborativefusion.com>
To:        Scott Long <scottl@samsco.org>
Cc:        Sean McAfee <smcafee@collaborativefusion.com>, Jason Thomson <jason.thomson@mintel.com>, scottl@freebsd.org, Benjie Chen <benjie@addgene.org>, "freebsd-hardware@freebsd.org" <freebsd-hardware@freebsd.org>
Subject:   Re: PERC5 (LSI MegaSAS) Patrol Read crashes
Message-ID:  <1215047261.9810.16.camel@localhost.localdomain>
In-Reply-To: <486909B1.3020309@samsco.org>
References:  <20071114122210.42E8613C4BB@mx1.freebsd.org> <1195160114.4042.154.camel@new-host> <1214840198.18670.43.camel@soundwave.ws.pitbpa0.priv.collaborativefusion.com> <486909B1.3020309@samsco.org>


> >> It's a software bug (driver).  It can probably be easily fixed.  I
> >> think there's a PR on it somewhere (will check).
> 
> The problem is a firmware bug in the Megaraid SAS controller.  It seems
> that while the controller can handle 512 or more concurrent commands,

That's great news.  We will try that patch in our local source tree.

One thing to note, though, is that both our R1 and R2/3 systems have the
same controller with the same firmware version, but we never saw the
problem in the R2/3.

Indeed, the Dell product number revision (from ipmitool fru) is different
for the parts.

Although the firmware updates are the same for R1 and R2/3, maybe the
updater probes the underlying hardware revision and applies different
code?

Or perhaps it is something to do with the performance or kernel behavior
of the older Hyperthreading Xeons (and motherboard) in the R1 that
just causes it to occur more often.

~BAS

> it can only handle 128 concurrent commands to each array.  Patrol
> reads aren't the primary cause, they just help the problem appear; when
> a patrol read cycle runs, it tends to slow down i/o enough that commands
> to the array get backed up, and you tend to reach the 128 limit.
> 
> I don't know if there is a firmware fix from Dell/LSI, or if there will
> ever be a fix.  FreeBSD drivers tend to stress hardware a lot more
> than Linux and Windows do, and since the latter two are used as the
> QA yardstick, anything that doesn't affect them doesn't usually get
> fixed.  An easy work-around for the driver is to change the following
> line in /sys/dev/mfi/mfi.c::mfi_alloc_commands()
> 
> ncmds = sc->mfi_max_fw_cmds;
> 
> to
> 
> ncmds = 128;
> 
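For anyone carrying this in a local tree, a slightly more defensive variant of the one-line change is to clamp rather than hard-code the value. The following is only a standalone sketch, not the driver code: `mock_mfi_softc`, `mock_pick_ncmds`, and `MOCK_PER_ARRAY_LIMIT` are hypothetical names made up for illustration, and only the field name `mfi_max_fw_cmds` and the figure 128 come from the message above.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the driver's softc. Only the one field
 * named in the thread is modeled; the real structure is much larger.
 */
struct mock_mfi_softc {
	uint32_t mfi_max_fw_cmds;	/* command count advertised by firmware */
};

/* Assumed ceiling, taken from the 128-per-array figure quoted above. */
#define MOCK_PER_ARRAY_LIMIT	128u

/*
 * Pick how many commands to allocate: the firmware's advertised count,
 * but never more than the per-array ceiling.
 */
static uint32_t
mock_pick_ncmds(const struct mock_mfi_softc *sc)
{
	uint32_t ncmds = sc->mfi_max_fw_cmds;

	if (ncmds > MOCK_PER_ARRAY_LIMIT)
		ncmds = MOCK_PER_ARRAY_LIMIT;
	return (ncmds);
}
```

Unlike a plain `ncmds = 128;`, the clamp would not over-allocate on a controller that advertises fewer than 128 commands. Whether such controllers exist for this driver is not something the thread establishes.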
> A more complete solution requires me writing an i/o scheduler in the
> driver, something that would take quite a bit of effort.
> 
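To make the idea of per-array limiting concrete, here is a very rough sketch of the bookkeeping such a scheduler might need: one outstanding-command counter per array, checked before a command is issued. It is not the scheduler described above and not driver code. Every name in it (`mock_throttle`, `mock_throttle_start`, `mock_throttle_done`, `MOCK_MAX_ARRAYS`) is hypothetical, the array count of 8 is an arbitrary illustrative value, and locking is left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_MAX_ARRAYS		8	/* arbitrary illustrative value */
#define MOCK_PER_ARRAY_LIMIT	128u	/* per-array figure from the thread */

/* One in-flight command counter per array (logical volume). */
struct mock_throttle {
	uint32_t outstanding[MOCK_MAX_ARRAYS];
};

/*
 * Try to reserve a command slot for the given array. Returns false
 * when the array is already at its limit (or the id is out of range),
 * in which case the caller would hold the request and retry later.
 */
static bool
mock_throttle_start(struct mock_throttle *t, unsigned array_id)
{
	if (array_id >= MOCK_MAX_ARRAYS)
		return (false);
	if (t->outstanding[array_id] >= MOCK_PER_ARRAY_LIMIT)
		return (false);
	t->outstanding[array_id]++;
	return (true);
}

/* Release a slot when a command for the given array completes. */
static void
mock_throttle_done(struct mock_throttle *t, unsigned array_id)
{
	if (array_id < MOCK_MAX_ARRAYS && t->outstanding[array_id] > 0)
		t->outstanding[array_id]--;
}
```

With this shape, one saturated array would not stop commands to the others, which the global `ncmds = 128;` cap cannot offer. A real implementation would also need a deferred queue and a lock around the counters.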
> With all this said, I still stand behind LSI controllers.  This bug,
> while unfortunate, is relatively minor and easy to work around, and
> it's the only significant bug that has turned up in over two and a half
> years with this hardware.
> 
> Scott
> 



