Date: Mon, 30 Jun 2008 10:28:33 -0600
From: Scott Long <scottl@samsco.org>
To: bseklecki@collaborativefusion.com
Cc: Sean McAfee <smcafee@collaborativefusion.com>, scottl@freebsd.org, Jason Thomson <jason.thomson@mintel.com>, "freebsd-hardware@freebsd.org" <freebsd-hardware@freebsd.org>, Benjie Chen <benjie@addgene.org>
Subject: Re: PERC5 (LSI MegaSAS) Patrol Read crashes
Message-ID: <486909B1.3020309@samsco.org>
In-Reply-To: <1214840198.18670.43.camel@soundwave.ws.pitbpa0.priv.collaborativefusion.com>
References: <20071114122210.42E8613C4BB@mx1.freebsd.org> <1195160114.4042.154.camel@new-host> <1214840198.18670.43.camel@soundwave.ws.pitbpa0.priv.collaborativefusion.com>
Brian A. Seklecki wrote:
> On Thu, 2007-11-15 at 15:55 -0500, Brian A Seklecki (Mobile) wrote:
>> Normally I'd be praising Dell, but I think a little vendor bashing is
>> due here.
>
> All:
>
> Just to follow up, we've been running these 1st-generation 2950s in our
> lab with RHEL 5.2 x86_64 for ~3 weeks w/o any disk or I/O problems.
>
> It must have been some random bug with the FreeBSD mfi(4) that only
> affected that revision of the PERC5, or, since the motherboard/CPU
> family/chipset is entirely different in R2 and R3, something with
> FreeBSD and how it was handling the controller (ACPI?)
>
> We never had any stability problems with R2 and R3 on RELENG_6_3 on the
> 2950 or 1950.
>
> From now on we'll wait for R2 before we go anywhere near new Dell
> gear.
>
> What do you think the chances of them dumping LSI for Acera and Broadcom
> for Intel? :)
>
> ~BAS
>
>> It's a software bug (driver). It can probably be easily fixed. I
>> think there's a PR on it somewhere (will check).

The problem is a firmware bug in the MegaRAID SAS controller. It seems
that while the controller can handle 512 or more concurrent commands, it
can only handle 128 concurrent commands to each array. Patrol reads
aren't the primary cause; they just help the problem appear. When a
patrol read cycle runs, it tends to slow down i/o enough that commands
to the array get backed up, and you tend to reach the 128 limit.

I don't know if there is a firmware fix from Dell/LSI, or if there will
ever be a fix. FreeBSD drivers tend to stress hardware a lot more than
Linux and Windows do, and since the latter two are used as the QA
yardstick, anything that doesn't affect them doesn't usually get fixed.

An easy work-around for the driver is to change the following line in
/sys/dev/mfi/mfi.c::mfi_alloc_commands()

    ncmds = sc->mfi_max_fw_cmds;

to

    ncmds = 128;

A more complete solution would require me to write an i/o scheduler in
the driver, something that would take quite a bit of effort.
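A minimal sketch of the two ideas above, for illustration only. The first
function shows the work-around as a clamp: it returns the smaller of the
firmware's advertised limit and 128 rather than hard-coding 128, which is
a variation assumed here and not the change quoted from mfi.c. The second
part is a toy per-array gate of the kind a driver-side i/o scheduler would
need. The struct and function names (mfi_softc_sketch, array_gate_sketch,
and so on) are hypothetical and do not exist in the mfi(4) driver; only the
field name mfi_max_fw_cmds and the 128 figure come from this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-array limit as reported in this thread, not from vendor documentation. */
#define MFI_PER_ARRAY_CMD_LIMIT 128u

/* Hypothetical stand-in for the one softc field the work-around reads. */
struct mfi_softc_sketch {
	uint32_t mfi_max_fw_cmds;	/* commands the firmware says it accepts */
};

/*
 * Size the command pool at the smaller of the firmware's limit and the
 * per-array limit, so a single array can never be sent more than 128.
 */
static uint32_t
mfi_sketch_ncmds(const struct mfi_softc_sketch *sc)
{
	if (sc->mfi_max_fw_cmds < MFI_PER_ARRAY_CMD_LIMIT)
		return (sc->mfi_max_fw_cmds);
	return (MFI_PER_ARRAY_CMD_LIMIT);
}

/* Hypothetical per-array accounting for commands currently in flight. */
struct array_gate_sketch {
	uint32_t inflight;
};

/* Returns true if one more command may be issued to this array now. */
static bool
array_gate_try_issue(struct array_gate_sketch *g)
{
	if (g->inflight >= MFI_PER_ARRAY_CMD_LIMIT)
		return (false);	/* caller must queue the command and retry */
	g->inflight++;
	return (true);
}

/* Called when a command to this array completes. */
static void
array_gate_complete(struct array_gate_sketch *g)
{
	if (g->inflight > 0)
		g->inflight--;
}
```

The clamp trades throughput on multi-array systems for safety, since it
caps the whole controller at 128 commands. The gate keeps the full pool
and limits each array separately, but a real driver would also need
locking and a deferred queue, which this sketch leaves out.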
With all this said, I still stand behind LSI controllers. This bug,
while unfortunate, is relatively minor and easy to work around, and it's
the only significant bug that has turned up in over two and a half years
with this hardware.

Scott