From owner-freebsd-scsi@FreeBSD.ORG Wed Nov 6 22:04:14 2013
Sender: Mark Johnston
Date: Wed, 6 Nov 2013 18:03:57 -0500
From: Mark Johnston
To: Charles Owens
Subject: Re: adding BBU relearn support to mfiutil
Message-ID: <20131106230356.GA86666@charmander.sandvine.com>
References: <20130304033836.GA33631@oddish>
 <1365196956.17311.13.camel@localhost> <20130406000809.GA96223@raichu>
 <527A7603.7090303@greatbaysoftware.com>
In-Reply-To: <527A7603.7090303@greatbaysoftware.com>
Cc: freebsd-scsi@freebsd.org, Steve McCoy
List-Id: SCSI subsystem

On Wed, Nov 06, 2013 at 12:01:55PM -0500, Charles Owens wrote:
> Hi, we've been playing with this patch in the context of 8.4-RELEASE-p4
> (we extracted r250483 and r250497 from stable/8 and applied to
> releng/8.4).  I'm seeing some results that make me question whether or
> not caching is really working correctly after a BBU relearn operation
> has completed -- or maybe whether or not the new BBU patch is talking to
> the LSI controller properly.
>
> Our test system had a BBU in the failed state (relearn needed).  We used
> the "start learn" command and it seemed to go well, but strangely, even
> though the process seems to have completed, and now several days later,
> the status is still LEARN_CYCLE_REQUESTED (as seen with "mfiutil show
> battery").  This may be entirely normal -- maybe it says that because
> the autolearn feature is now enabled?

I suspect that the status is bogus and that the battery is in fact dead.
There seem to be a few firmware bugs in the BBU status reporting, at
least with iBBU07. In your output below, I see:

    Design Capacity: 1215 mAh
    Full Charge Capacity: 65262 mAh
    Current Capacity: 61543 mAh

which clearly isn't right, since the full charge capacity is far above
the design capacity. I've seen this problem before as well: over time,
the full charge capacity decreases, and eventually it seems to wrap
around to 65535. MegaCli (LSI's binary RAID management tool) reports
exactly the same thing, so it's a problem with the controller firmware.
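
To illustrate the wraparound described above, here is a minimal Python
sketch. It is not part of the patch or of mfiutil. It assumes the
firmware stores capacities in an unsigned 16-bit field (the wrap to
65535 suggests this, but nothing in this thread confirms it). The helper
names and the 2x threshold are made up for the example.

```python
import re

# Excerpt of "mfiutil show battery" output quoted in this thread.
SAMPLE = """\
    Design Capacity: 1215 mAh
    Full Charge Capacity: 65262 mAh
    Current Capacity: 61543 mAh
"""


def parse_capacities(text):
    """Collect the three '... Capacity: N mAh' fields into a dict."""
    caps = {}
    for line in text.splitlines():
        m = re.match(r"\s*(Design|Full Charge|Current) Capacity:\s*(\d+) mAh", line)
        if m:
            caps[m.group(1)] = int(m.group(2))
    return caps


def as_signed16(value):
    """Reinterpret an unsigned 16-bit value as signed (assumed field width)."""
    return value - 0x10000 if value >= 0x8000 else value


def capacity_looks_wrapped(caps, factor=2):
    """Flag a full charge capacity implausibly far above the design capacity.

    The factor of 2 is an arbitrary, hypothetical threshold.
    """
    return caps["Full Charge"] > caps["Design"] * factor


if __name__ == "__main__":
    caps = parse_capacities(SAMPLE)
    print("suspicious:", capacity_looks_wrapped(caps))
    print("full charge as signed 16-bit:", as_signed16(caps["Full Charge"]))
```

Read as signed 16-bit values, the reported 65262 mAh and 61543 mAh both
come out negative, which would be consistent with a counter that kept
decreasing past zero.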
If you look at MegaCli output you get things like "Absolute charge:
6000%". So I suspect that the status is incorrect as well; when I've run
into this problem, I still see "status: normal".

>
> The output of the "cache" status command is also a bit strange.  Here is
> the raw output of these status commands:
>
> # mfiutil cache mfid0
> mfi0 volume mfid0 cache settings:
>     I/O caching: disabled
>     write caching: write-back
>     write cache with bad BBU: disabled
>     read ahead: adaptive
>     drive write cache: enabled
> Cache disabled due to dead battery or ongoing battery relearn
>
> # ./mfiutil show battery
> mfi0: Battery State:
>     Manufacture Date: 3/18/2010
>     Serial Number: 77
>     Manufacturer: LS1111001A
>     Model: 3598501
>     Chemistry: LION
>     Design Capacity: 1215 mAh
>     Full Charge Capacity: 65262 mAh
>     Current Capacity: 61543 mAh
>     Charge Cycles: 120
>     Current Charge: 94%
>     Design Voltage: 3700 mV
>     Current Voltage: 4081 mV
>     Temperature: 23 C
>     Autolearn period: 30 days
>     Next learn time: Tue Nov 26 20:06:40 2013
>     Learn delay interval: 0 hours
>     Autolearn mode: enabled
>     Status: LEARN_CYCLE_REQUESTED
>
> Why does cache status now say "Cache disabled due to dead battery or
> ongoing battery relearn"?  Shouldn't this no longer be the case since
> I've run the "learn" operation?  Does this indicate that the I/O caching
> is really disabled?

I believe so. You can try changing the write caching policy to
write-back with bad BBU and see if that re-enables the cache. If it
does, that's more evidence that the BBU is dead and needs to be
replaced.

>
> I'd appreciate any and all assistance.
> Here's a bit of other info that
> might be of interest:
>
> # mfiutil show adapter
> mfi0 Adapter:
>     Product Name: Integrated Intel(R) RAID Controller SROMBSASMP2
>     Serial Number:
>     Firmware: 11.0.1-0036
>     RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
>     Battery Backup: present
>     NVRAM: 32K
>     Onboard Memory: 512M
>     Minimum Stripe: 8k
>     Maximum Stripe: 1M
>
> # mfiutil show drives
> mfi0 Physical Drives:
>     1 ( 136G) ONLINE SAS E1:S0
>     2 ( 136G) ONLINE SAS E1:S1
>     3 ( 136G) ONLINE SAS E1:S4
>     4 ( 136G) ONLINE SAS E1:S2
>     5 ( 136G) HOT SPARE SAS E1:S3
>
> The storage volume is 4 drives, RAID10.  System has 16GB RAM, dual Xeon
> E5530 CPUs, on an Intel S5520UR motherboard.

It might be useful to check the output of "mfiutil show events -c info".

>
> Thanks!
>
> Charles Owens
> Great Bay Software
>
> On Fri Apr 5 20:08:09 2013, Mark Johnston wrote:
> >
> > On Fri, Apr 05, 2013 at 02:22:36PM -0700, Sean Bruno wrote:
> >>
> >> On Sun, 2013-03-03 at 22:38 -0500, Mark Johnston wrote:
> >>>
> >>> Hi Everyone,
> >>>
> >>> I recently needed to add a couple of features to mfiutil related to BBU
> >>> relearning. I've pasted a patch below which
> >>>
> >>> 1. adds extra fields to the output of "mfiutil show battery" showing BBU
> >>>    properties. This is essentially the output of
> >>>
> >>>    # MegaCli -AdpBbuInfo -GetBbuProperties -aLL
> >>>
> >>>    and consists of info about battery learning: the learn period, the
> >>>    time at which the controller will start the next relearn, and the BBU
> >>>    mode (which indicates whether the battery supports transparent
> >>>    relearning).
> >>>
> >>> 2. adds a couple of subcommands under "mfiutil bbu" which let users set
> >>>    the BBU properties which can be set by MegaCli.
> >>>
> >>> 3. adds a command "mfiutil start learn" which immediately kicks off a
> >>>    battery relearn.
> >>>
> >>> These changes grew out of concern about the fact that the controller
> >>> write cache is set to write-through mode during a relearn period (which
> >>> usually lasts for several hours). This ended up causing some mysterious
> >>> and intermittent performance issues, so I needed a way of getting more
> >>> info about what was going on (using MegaCli isn't really an option for
> >>> several reasons). Some BBUs support transparent relearning, which
> >>> basically means that the controller write cache doesn't get turned off
> >>> during a relearn. However, LSI's default config doesn't enable it, and
> >>> now mfiutil can be used to do that (through "mfiutil bbu bbu-mode").
> >>>
> >>> I was hoping someone would be able to review the patch. If anyone's able
> >>> and willing to test it, I'd very much appreciate feedback from that.
> >>>
> >>> Thanks!
> >>> -Mark
> >>
> >> Just to document for the record.  Finally got around to testing this
> >> today with Mark providing updates.  Looks good overall with a couple of
> >> nits that he is handling at the moment (man page and variable name
> >> collision).
> >
> > The updated patch is here:
> > http://people.freebsd.org/~markj/patches/20130405-mfi-bbu.diff
> >
> > I'll commit it in a few days if there aren't any problems.
> >
> > Thanks,
> > -Mark
> > _______________________________________________
> > freebsd-scsi@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"