From: Scott Long
Date: Mon, 30 Jun 2008 10:28:33 -0600
To: bseklecki@collaborativefusion.com
Cc: Sean McAfee, scottl@freebsd.org, Jason Thomson, freebsd-hardware@freebsd.org, Benjie Chen
Subject: Re: PERC5 (LSI MegaSAS) Patrol Read crashes
Message-ID: <486909B1.3020309@samsco.org>
In-Reply-To: <1214840198.18670.43.camel@soundwave.ws.pitbpa0.priv.collaborativefusion.com>

Brian A. Seklecki wrote:
> On Thu, 2007-11-15 at 15:55 -0500, Brian A Seklecki (Mobile) wrote:
>> Normally I'd be praising Dell, but I think a little vendor bashing is
>> due here.
>
> All:
>
> Just to follow up, we've been running these 1st-generation 2950s in our
> lab with RHEL5.2 x86_64 for ~3 weeks w/o any disk or I/O problems.
>
> It must have been some random bug with the FreeBSD mfi(4) that only
> affected that revision of the PERC5, or, since the motherboard/CPU
> family/chipset is entirely different in R2 and R3, something with
> FreeBSD and how it was handling the controller (ACPI?)
>
> We never had any stability problems with R2 and R3 on RELENG_6_3 on the
> 2950 or 1950.
>
> From now on we'll wait for R2 before we go anywhere near new Dell
> gear.
>
> What do you think the chances of them dumping LSI for Acera and Broadcom
> for Intel? :)
>
> ~BAS
>
>> It's a software bug (driver). It can probably be easily fixed. I
>> think there's a PR on it somewhere (will check).

The problem is a firmware bug in the MegaRAID SAS controller. It seems
that while the controller can handle 512 or more concurrent commands, it
can only handle 128 concurrent commands to each array. Patrol reads
aren't the primary cause; they just help the problem appear. When a
patrol read cycle runs, it tends to slow down i/o enough that commands
to the array get backed up, and you tend to reach the 128 limit.

I don't know if there is a firmware fix from Dell/LSI, or if there will
ever be a fix. FreeBSD drivers tend to stress hardware a lot more than
Linux and Windows do, and since the latter two are used as the QA
yardstick, anything that doesn't affect them doesn't usually get fixed.

An easy work-around for the driver is to change the following line in
/sys/dev/mfi/mfi.c::mfi_alloc_commands()

    ncmds = sc->mfi_max_fw_cmds;

to

    ncmds = 128;

A more complete solution would require me to write an i/o scheduler in
the driver, something that would take quite a bit of effort.
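To make the two options above concrete, here is a minimal standalone sketch in C. It is not the mfi(4) code and not a tested patch. The 128 figure comes from the description above; the names PER_ARRAY_CMD_LIMIT, clamp_ncmds(), struct array_throttle, array_try_issue() and array_complete() are all hypothetical, made up for illustration. The first helper is a variant of the one-line work-around that never asks for more commands than the firmware reports. The second part shows the kind of per-array accounting a scheduler would presumably need, with the actual queueing left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: per-array ceiling, taken from the description above. */
#define PER_ARRAY_CMD_LIMIT 128

/*
 * Hypothetical helper: cap the command pool at the per-array limit,
 * but never exceed what the firmware says it supports.
 */
static int
clamp_ncmds(int max_fw_cmds)
{
	return (max_fw_cmds < PER_ARRAY_CMD_LIMIT ?
	    max_fw_cmds : PER_ARRAY_CMD_LIMIT);
}

/* Hypothetical per-array state; no such struct exists in the driver. */
struct array_throttle {
	int inflight;	/* commands issued to this array, not yet completed */
};

/*
 * Count the command and return true if the array has room.
 * On false the caller would have to hold the command back and retry
 * after a completion; that queueing is the part not sketched here.
 */
static bool
array_try_issue(struct array_throttle *a)
{
	if (a->inflight >= PER_ARRAY_CMD_LIMIT)
		return (false);
	a->inflight++;
	return (true);
}

/* Call once for every command that completes on this array. */
static void
array_complete(struct array_throttle *a)
{
	assert(a->inflight > 0);
	a->inflight--;
}
```

Capping the whole pool is the blunt approach: with several arrays behind one controller it throttles all of them together. Per-array counters would let the controller keep its full queue depth while each array stays at or below 128.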
With all this said, I still stand behind LSI controllers. This bug,
while unfortunate, is relatively minor and easy to work around, and it's
the only significant bug that has turned up in over two and a half years
with this hardware.

Scott