Date: Wed, 6 Nov 2013 18:03:57 -0500
From: Mark Johnston <markj@freebsd.org>
To: Charles Owens <cowens@greatbaysoftware.com>
Cc: freebsd-scsi@freebsd.org, Steve McCoy <smccoy@greatbaysoftware.com>
Subject: Re: adding BBU relearn support to mfiutil
Message-ID: <20131106230356.GA86666@charmander.sandvine.com>
In-Reply-To: <527A7603.7090303@greatbaysoftware.com>
References: <20130304033836.GA33631@oddish> <1365196956.17311.13.camel@localhost> <20130406000809.GA96223@raichu> <527A7603.7090303@greatbaysoftware.com>

On Wed, Nov 06, 2013 at 12:01:55PM -0500, Charles Owens wrote:
> Hi, we've been playing with this patch in the context of 8.4-RELEASE-p4
> (we extracted r250483 and r250497 from stable/8 and applied to
> releng/8.4).  I'm seeing some results that make me question whether or
> not caching is really working correctly after a BBU relearn operation
> has completed -- or maybe whether or not the new BBU patch is talking to
> the LSI controller properly.
>
> Our test system had a BBU in the failed state (relearn needed).  We used
> the "start learn" command and it seemed to go well, but strangely, now
> that the process seems to have completed, and several days later, status
> is still LEARN_CYCLE_REQUESTED (as seen with "mfiutil show battery").
> This may be entirely normal -- maybe it says that because the autolearn
> feature is now enabled?

I suspect that the status is bogus and that the battery is in fact dead.
There seem to be a few firmware bugs in the BBU status reporting, at
least with iBBU07. In your output below, I see:

    Design Capacity: 1215 mAh
    Full Charge Capacity: 65262 mAh
    Current Capacity: 61543 mAh

which clearly isn't right. I've seen this problem before as well: over
time, the full charge capacity decreases, and eventually it seems to
wrap around to 65535. MegaCli (LSI's binary RAID management tool)
reports exactly the same thing, so it's a problem with the controller
firmware. If you look at MegaCli output you get things like "Absolute
charge: 6000%". So I suspect that the status is incorrect as well; when
I've run into this problem, I still see "status: normal".

>
> The output of the "cache" status command is also a bit strange.
> Here is the raw output of these status commands:
>
> # mfiutil cache mfid0
> mfi0 volume mfid0 cache settings:
>     I/O caching: disabled
>     write caching: write-back
>     write cache with bad BBU: disabled
>     read ahead: adaptive
>     drive write cache: enabled
> Cache disabled due to dead battery or ongoing battery relearn
>
>
> # ./mfiutil show battery
> mfi0: Battery State:
>     Manufacture Date: 3/18/2010
>     Serial Number: 77
>     Manufacturer: LS1111001A
>     Model: 3598501
>     Chemistry: LION
>     Design Capacity: 1215 mAh
>     Full Charge Capacity: 65262 mAh
>     Current Capacity: 61543 mAh
>     Charge Cycles: 120
>     Current Charge: 94%
>     Design Voltage: 3700 mV
>     Current Voltage: 4081 mV
>     Temperature: 23 C
>     Autolearn period: 30 days
>     Next learn time: Tue Nov 26 20:06:40 2013
>     Learn delay interval: 0 hours
>     Autolearn mode: enabled
>     Status: LEARN_CYCLE_REQUESTED
>
>
> Why does cache status now say "Cache disabled due to dead battery or
> ongoing battery relearn"?  Shouldn't this no longer be the case since
> I've run the "learn" operation?  Does this indicate that the I/O caching
> is really disabled?

I believe so. You can try changing the write caching policy to
write-back with bad BBU and see if that re-enables the cache. If it
does, that's more evidence that the BBU is dead and needs to be
replaced.

>
> I'd appreciate any and all assistance.
> Here's a bit of other info that
> might be of interest:
>
> # mfiutil show adapter
> mfi0 Adapter:
>     Product Name: Integrated Intel(R) RAID Controller SROMBSASMP2
>     Serial Number:
>     Firmware: 11.0.1-0036
>     RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
>     Battery Backup: present
>     NVRAM: 32K
>     Onboard Memory: 512M
>     Minimum Stripe: 8k
>     Maximum Stripe: 1M
>
> # mfiutil show drives
> mfi0 Physical Drives:
>  1 (  136G) ONLINE <SEAGATE ST9146852SS 0005 serial=6TB005JE> SAS E1:S0
>  2 (  136G) ONLINE <SEAGATE ST9146852SS 0005 serial=6TB005JV> SAS E1:S1
>  3 (  136G) ONLINE <SEAGATE ST9146852SS 0005 serial=6TB005KD> SAS E1:S4
>  4 (  136G) ONLINE <SEAGATE ST9146852SS 0005 serial=6TB005BQ> SAS E1:S2
>  5 (  136G) HOT SPARE <SEAGATE ST9146852SS 0005 serial=6TB005FJ> SAS E1:S3
>
> The storage volume is 4 drives, RAID10.  System has 16GB RAM, dual Xeon
> E5530 CPUs, on an Intel S5520UR motherboard.

It might be useful to check the output of "mfiutil show events -c info".

>
> Thanks!
>
> Charles Owens
> Great Bay Software
>
>
> On Fri Apr 5 20:08:09 2013, Mark Johnston wrote:
> >
> > On Fri, Apr 05, 2013 at 02:22:36PM -0700, Sean Bruno wrote:
> >>
> >> On Sun, 2013-03-03 at 22:38 -0500, Mark Johnston wrote:
> >>>
> >>> Hi Everyone,
> >>>
> >>> I recently needed to add a couple of features to mfiutil related to BBU
> >>> relearning. I've pasted a patch below which
> >>>
> >>> 1. adds extra fields to the output of "mfiutil show battery" showing BBU
> >>>    properties. This is essentially the output of
> >>>
> >>>    # MegaCli -AdpBbuInfo -GetBbuProperties -aLL
> >>>
> >>>    and consists of info about battery learning: the learn period, the
> >>>    time at which the controller will start the next relearn, and the BBU
> >>>    mode (which indicates whether the battery supports transparent
> >>>    relearning).
> >>>
> >>> 2. adds a couple of subcommands under "mfiutil bbu" which let users set
> >>>    the BBU properties which can be set by MegaCli.
> >>>
> >>> 3. adds a command "mfiutil start learn" which immediately kicks off a
> >>>    battery relearn.
> >>>
> >>> These changes grew out of concern about the fact that the controller
> >>> write cache is set to write-through mode during a relearn period (which
> >>> usually lasts for several hours). This ended up causing some mysterious
> >>> and intermittent performance issues, so I needed a way of getting more
> >>> info about what was going on (using MegaCli isn't really an option for
> >>> several reasons). Some BBUs support transparent relearning, which
> >>> basically means that the controller write cache doesn't get turned off
> >>> during a relearn. However, LSI's default config doesn't enable it, and
> >>> now mfiutil can be used to do that (through "mfiutil bbu bbu-mode").
> >>>
> >>> I was hoping someone would be able to review the patch. If anyone's able
> >>> and willing to test it, I'd very much appreciate feedback from that.
> >>>
> >>> Thanks!
> >>> -Mark
> >>
> >>
> >> Just to document for the record.  Finally got around to testing this
> >> today with Mark providing updates.  Looks good overall with a couple of
> >> nits that he is handling at the moment (man page and variable name
> >> collision).
> >
> >
> > The updated patch is here:
> > http://people.freebsd.org/~markj/patches/20130405-mfi-bbu.diff
> >
> > I'll commit it in a few days if there aren't any problems.
> >
> > Thanks,
> > -Mark
> > _______________________________________________
> > freebsd-scsi@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"
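
A minimal sketch of the capacity sanity check described in the reply
above, in Python. It assumes the capacity fields are unsigned 16-bit
values, which the wrap to 65535 suggests but nothing in this thread
confirms. The helper names (parse_capacities, as_signed16, implausible)
and the "larger than design capacity" heuristic are hypothetical; they
are not part of mfiutil or MegaCli. The sample text is copied from the
"mfiutil show battery" output quoted in this message.

```python
import re

# Sample lines copied from the "mfiutil show battery" output in this thread.
SAMPLE = """\
mfi0: Battery State:
    Design Capacity: 1215 mAh
    Full Charge Capacity: 65262 mAh
    Current Capacity: 61543 mAh
"""


def parse_capacities(text):
    """Return {field name: mAh} for every '<name> Capacity: <n> mAh' line."""
    pattern = re.compile(r"^[ \t]*(.+ Capacity):[ \t]*(\d+) mAh[ \t]*$",
                         re.MULTILINE)
    return {name: int(value) for name, value in pattern.findall(text)}


def as_signed16(value):
    """Reinterpret an unsigned 16-bit value as signed (assumed field width)."""
    return value - 0x10000 if value >= 0x8000 else value


def implausible(caps):
    """Hypothetical heuristic: flag any capacity above the design capacity."""
    design = caps["Design Capacity"]
    return {name: mah for name, mah in caps.items() if mah > design}


caps = parse_capacities(SAMPLE)
for name, mah in implausible(caps).items():
    print(f"{name}: {mah} mAh reads as {as_signed16(mah)} "
          "if treated as a signed 16-bit value")
```

Under that assumption, the reported full charge capacity of 65262 mAh
reads as -274, which would be consistent with a value that kept
decreasing past zero rather than with a healthy battery.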