Date:      Wed, 6 Nov 2013 18:03:57 -0500
From:      Mark Johnston <markj@freebsd.org>
To:        Charles Owens <cowens@greatbaysoftware.com>
Cc:        freebsd-scsi@freebsd.org, Steve McCoy <smccoy@greatbaysoftware.com>
Subject:   Re: adding BBU relearn support to mfiutil
Message-ID:  <20131106230356.GA86666@charmander.sandvine.com>
In-Reply-To: <527A7603.7090303@greatbaysoftware.com>
References:  <20130304033836.GA33631@oddish> <1365196956.17311.13.camel@localhost> <20130406000809.GA96223@raichu> <527A7603.7090303@greatbaysoftware.com>

On Wed, Nov 06, 2013 at 12:01:55PM -0500, Charles Owens wrote:
> Hi, we've been playing with this patch in the context of 8.4-RELEASE-p4
> (we extracted r250483 and r250497 from stable/8 and applied them to
> releng/8.4).  I'm seeing some results that make me question whether or
> not caching is really working correctly after a BBU relearn operation
> has completed -- or maybe whether or not the new BBU patch is talking to
> the LSI controller properly.
> 
> Our test system had a BBU in the failed state (relearn needed).  We used
> the "start learn" command and it seemed to go well, but strangely, the
> process seems to have completed, yet several days later the status
> is still LEARN_CYCLE_REQUESTED (as seen with "mfiutil show battery").
> This may be entirely normal -- maybe it says that because the autolearn
> feature is now enabled?

I suspect that the status is bogus and that the battery is in fact dead.
There seem to be a few firmware bugs in the BBU status reporting, at
least with iBBU07. In your output below, I see:

        Design Capacity: 1215 mAh
   Full Charge Capacity: 65262 mAh
       Current Capacity: 61543 mAh

which clearly isn't right. I've seen this problem before as well: over
time, the full charge capacity decreases, and eventually it seems to
wrap around to 65535. MegaCli (LSI's binary RAID management tool) reports
exactly the same thing, so it's a problem with the controller firmware.
If you look at MegaCli output you get things like "Absolute charge: 6000%".
So I suspect that the status is incorrect as well; when I've run into
this problem, I still see "status: normal".
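The wraparound described above is what you would expect from an unsigned 16-bit counter that keeps decreasing past zero. The following is a hypothetical Python illustration, assuming the firmware stores capacities in 16-bit unsigned fields (the 65535 wrap point suggests this, but it is not confirmed by any LSI documentation); the helper names and the 10% tolerance are invented for the example. It uses the values from the "mfiutil show battery" output quoted in this thread.

```python
# Hypothetical helpers: the 16-bit field width is an assumption suggested by
# the 65535 wrap point, not something taken from LSI documentation.
U16_MASK = 0xFFFF


def u16_sub(value: int, delta: int) -> int:
    """Subtract delta from an unsigned 16-bit counter, wrapping modulo 2**16."""
    return (value - delta) & U16_MASK


def as_int16(raw: int) -> int:
    """Reinterpret an unsigned 16-bit reading as a two's-complement signed value."""
    return raw - 0x10000 if raw & 0x8000 else raw


def capacity_plausible(design_mah: int, full_charge_mah: int, margin: float = 0.10) -> bool:
    """Hypothetical check: the full-charge capacity should not exceed the design
    capacity by more than `margin` (10% by default, an arbitrary tolerance)."""
    return full_charge_mah <= design_mah * (1 + margin)


# Values from the "mfiutil show battery" output quoted in this thread.
design, full, current = 1215, 65262, 61543

print(u16_sub(0, 1))                     # 65535: a counter stepping below zero wraps
print(as_int16(full))                    # -274: the counter has run 274 mAh past zero
print(capacity_plausible(design, full))  # False: full charge is ~54x the design capacity
print(round(100 * current / full))       # 94: consistent with "Current Charge: 94%"
```

The last line suggests that the reported percentage is simply derived from the two bogus capacity readings, so it is not independent evidence that the battery is healthy.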

> 
> The output of the "cache" status command is also a bit strange. Here is
> the raw output of these status commands:
> 
> # mfiutil cache mfid0
> mfi0 volume mfid0 cache settings:
>               I/O caching: disabled
>             write caching: write-back
> write cache with bad BBU: disabled
>                read ahead: adaptive
>         drive write cache: enabled
> Cache disabled due to dead battery or ongoing battery relearn
> 
> 
> # ./mfiutil show battery
> mfi0: Battery State:
>       Manufacture Date: 3/18/2010
>          Serial Number: 77
>           Manufacturer: LS1111001A
>                  Model: 3598501
>              Chemistry: LION
>        Design Capacity: 1215 mAh
>   Full Charge Capacity: 65262 mAh
>       Current Capacity: 61543 mAh
>          Charge Cycles: 120
>         Current Charge: 94%
>         Design Voltage: 3700 mV
>        Current Voltage: 4081 mV
>            Temperature: 23 C
>       Autolearn period: 30 days
>        Next learn time: Tue Nov 26 20:06:40 2013
>   Learn delay interval: 0 hours
>         Autolearn mode: enabled
>                 Status: LEARN_CYCLE_REQUESTED
> 
> 
> /Why does the cache status now say "Cache disabled due to dead battery or
> ongoing battery relearn"/?  Shouldn't this no longer be the case since
> I've run the "learn" operation?  Does this indicate that I/O caching
> is really disabled?

I believe so. You can try enabling write caching with a bad BBU (the
"write cache with bad BBU" setting in the output above) and see if that
re-enables the cache. If it does, that's more evidence that the BBU is
dead and needs to be replaced.
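As a rough mental model of that test, the following hypothetical Python sketch encodes the behavior described in this thread: a write-back policy falls back to write-through when the battery cannot protect the cache (it is dead, or it is in a relearn that is not transparent), unless write caching with a bad BBU is enabled. The class, field, and function names are illustrative only; the real decision is made by the controller firmware and may depend on more inputs than this.

```python
from dataclasses import dataclass


@dataclass
class BbuState:
    # Hypothetical model; these are not mfiutil or firmware field names.
    healthy: bool            # battery passes the controller's health checks
    relearning: bool         # a learn cycle is currently in progress
    transparent_learn: bool  # BBU mode keeps the write cache on during a learn


def effective_write_policy(configured: str, bbu: BbuState, write_cache_bad_bbu: bool) -> str:
    """Simplified sketch: write-back only takes effect when the battery can
    protect the cache, or when write caching with a bad BBU is allowed."""
    if configured != "write-back":
        return configured
    battery_usable = bbu.healthy and (not bbu.relearning or bbu.transparent_learn)
    if battery_usable or write_cache_bad_bbu:
        return "write-back"
    return "write-through"


# A failed battery with no learn in progress, as in the report above.
failed = BbuState(healthy=False, relearning=False, transparent_learn=False)
print(effective_write_policy("write-back", failed, write_cache_bad_bbu=False))  # write-through
print(effective_write_policy("write-back", failed, write_cache_bad_bbu=True))   # write-back
```

Under this model, only the battery-related inputs can turn a configured write-back policy into write-through, which is why getting the cache back by allowing writes with a bad BBU would point at the battery rather than at the patch.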

> 
> I'd appreciate any and all assistance.  Here's a bit of other info that 
> might be of interest:
> 
> # mfiutil show adapter
> mfi0 Adapter:
>      Product Name: Integrated Intel(R) RAID Controller SROMBSASMP2
>     Serial Number:
>          Firmware: 11.0.1-0036
>       RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
>    Battery Backup: present
>             NVRAM: 32K
>    Onboard Memory: 512M
>    Minimum Stripe: 8k
>    Maximum Stripe: 1M
> 
> # mfiutil show drives
> mfi0 Physical Drives:
>   1 (  136G) ONLINE    <SEAGATE ST9146852SS 0005 serial=6TB005JE> SAS E1:S0
>   2 (  136G) ONLINE    <SEAGATE ST9146852SS 0005 serial=6TB005JV> SAS E1:S1
>   3 (  136G) ONLINE    <SEAGATE ST9146852SS 0005 serial=6TB005KD> SAS E1:S4
>   4 (  136G) ONLINE    <SEAGATE ST9146852SS 0005 serial=6TB005BQ> SAS E1:S2
>   5 (  136G) HOT SPARE <SEAGATE ST9146852SS 0005 serial=6TB005FJ> SAS E1:S3
> 
> The storage volume is a 4-drive RAID10.  The system has 16GB RAM, dual Xeon
> E5530 CPUs, and an Intel S5520UR motherboard.

It might be useful to check the output of "mfiutil show events -c info".

> 
> Thanks!
> 
> Charles Owens
> Great Bay Software
> 
> 
> 
> On Fri Apr 5 20:08:09 2013, Mark Johnston wrote:
> >
> > On Fri, Apr 05, 2013 at 02:22:36PM -0700, Sean Bruno wrote:
> >>
> >> On Sun, 2013-03-03 at 22:38 -0500, Mark Johnston wrote:
> >>>
> >>> Hi Everyone,
> >>>
> >>> I recently needed to add a couple of features to mfiutil related to BBU
> >>> relearning. I've pasted a patch below which
> >>>
> >>> 1. adds extra fields to the output of "mfiutil show battery" showing BBU
> >>> properties. This is essentially the output of
> >>>
> >>> # MegaCli -AdpBbuInfo -GetBbuProperties -aLL
> >>>
> >>> and consists of info about battery learning: the learn period, the
> >>> time at which the controller will start the next relearn, and the BBU
> >>> mode (which indicates whether the battery supports transparent
> >>> relearning).
> >>>
> > >>> 2. adds a couple of subcommands under "mfiutil bbu" which let users set
> > >>> the same BBU properties that MegaCli can set.
> >>>
> >>> 3. adds a command "mfiutil start learn" which immediately kicks off a
> >>> battery relearn.
> >>>
> >>> These changes grew out of concern about the fact that the controller
> >>> write cache is set to write-through mode during a relearn period (which
> >>> usually lasts for several hours). This ended up causing some mysterious
> >>> and intermittent performance issues, so I needed a way of getting more
> >>> info about what was going on (using MegaCli isn't really an option for
> >>> several reasons). Some BBUs support transparent relearning, which
> >>> basically means that the controller write cache doesn't get turned off
> >>> during a relearn. However, LSI's default config doesn't enable it, and
> >>> now mfiutil can be used to do that (through "mfiutil bbu bbu-mode").
> >>>
> >>> I was hoping someone would be able to review the patch. If anyone's able
> >>> and willing to test it, I'd very much appreciate feedback from that.
> >>>
> >>> Thanks!
> >>> -Mark
> >>
> >>
> >> Just to document for the record. Finally got around to testing this
> >> today with Mark providing updates. Looks good overall with a couple of
> >> nits that he is handling at the moment (man page and variable name
> >> collision).
> >
> >
> > The updated patch is here:
> > http://people.freebsd.org/~markj/patches/20130405-mfi-bbu.diff
> >
> > I'll commit it in a few days if there aren't any problems.
> >
> > Thanks,
> > -Mark


