Date: Wed, 17 Feb 2016 14:38:10 +0700
From: Tinker <tinkr@openmailbox.org>
To: Doug Ambrisko <ambrisko@ambrisko.com>
Cc: freebsd-scsi@freebsd.org
Subject: Re: MRSAS driver/LSI MegaRaid 92XX-93XX admin question: When one of the Raid's physical drives break, how is it reported in the logs?
Message-ID: <fceaf3867796102969153dea4a4cbbde@openmailbox.org>
In-Reply-To: <20160217000002.GA81916@ambrisko.com>
References: <6a648d421b6d611b4f6f411b66303017@openmailbox.org> <55de137d1ed81930cfdbee579d881d62@openmailbox.org> <20160217000002.GA81916@ambrisko.com>

Hi Doug,

Would you mind sharing your kernel patch for that functionality (if I
understand you right, you patched your kernel to channel the events to
dmesg)?

Thanks,
Tinker

On 2016-02-17 07:00, Doug Ambrisko wrote:
> On Sun, Feb 14, 2016 at 10:13:31PM +0700, Tinker wrote:
> | (Will send any followup from now only to freebsd-scsi@ .)
> |
> | Did some additional research and found that the disk failure indeed is
> | reported in MRSAS' "event log".
> |
> | So my final question then is, how do you extract it into userland (in
> | the absence of an "mfiutil" as the MFI driver has)?
>
> I have local changes to print the event log in dmesg which gets syslogged.
> We then watch syslog for issues to report things to our customers
> automatically.  This is similar to mfi(4).
>
> Thanks,
>
> Doug A.
> | Details below. Thanks.
> |
> | On 2016-02-14 19:59, Tinker wrote:
> | [...]
> | > http://www.cisco.com/c/dam/en/us/td/docs/unified_computing/ucs/3rd-party/lsi/mrsas/userguide/LSI_MR_SAS_SW_UG.pdf
> | > on page 305, that is section "A.2 Event Messages" - I don't know for
> | > what LSI chip this document is, but, it does not list particular event
> | > message very clearly for when an individual underlying disk would have
> | > broken, I don't even see any event for when a hot spare would be taken
> | > in use!
> |
> |
> | Wait - this page:
> |
> | https://www.schirmacher.de/display/Linux/Replace+failed+disk+in+MegaRAID+array
> |
> | (and also
> | http://serverfault.com/questions/485147/drive-is-failing-but-lsi-megaraid-controller-does-not-detect-it
> | )
> |
> | gives an example of how the host system learns about broken disks:
> |
> |
> | Code: 0x00000051 .. Event Description: State change on VD 00/1 from
> | OPTIMAL(3) to DEGRADED(2)
> |
> |
> | Code: 0x00000072 .. Event Description: State change on PD 05(e0xfc/s0)
> | from ONLINE(18) to FAILED(11)
> |
> | (unclean disk broken seems to be shown as:)
> |
> | Code: 0x00000071 .. Event Description: Unexpected sense: PD 05(e0xfc/s0)
> | Path 4433221103000000, CDB: 2e 00 3a 38 1b c7 00 00 01 00, Sense:
> | b/00/00
> |
> |
> | And this version of the LSI documentation
> |
> | http://hwraid.le-vert.net/raw-attachment/wiki/LSIMegaRAIDSAS/megacli_user_guide.pdf
> |
> | gives a clearer definition of the physical and virtual drive states in
> | "1.4.16 Physical Drive States"
> | and "1.4.17 Virtual Disk States" on pages 1-11 to 1-12.
> |
> | So as we see, a physical drive breaking would
> |
> | * "FAILED" the physical drive
> |
> | * "DEGRADED" the Virtual Drive (that is the logical exported drive)
> | (from "OPTIMAL")
> |
> |
> | So then, it was indeed the card's "event log" that contains this info.
> |
> |
> |
> | Last question then would only be then, *where* FreeBSD's MRSAS driver
> | sends its event log?
> |
> |
> |
> | _______________________________________________
> | freebsd-stable@freebsd.org mailing list
> | https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> | To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
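
For anyone who ends up watching syslog for these events, as Doug describes, here is a minimal Python sketch of the matching step. It assumes the controller's event descriptions reach the log in the same wording as the two "State change" examples quoted above; the exact format printed by Doug's local patch is not shown in this thread, so the pattern, the `ALERT_STATES` set, and the `find_alerts` helper are all hypothetical and would need adjusting against real log output.

```python
import re

# Pattern modelled on the two examples quoted in this thread:
#   State change on VD 00/1 from OPTIMAL(3) to DEGRADED(2)
#   State change on PD 05(e0xfc/s0) from ONLINE(18) to FAILED(11)
# Assumption: state names are single upper-case tokens, as in those examples.
STATE_CHANGE = re.compile(
    r"State change on (?P<kind>PD|VD) (?P<target>\S+) "
    r"from (?P<old>[A-Z_]+)\((?P<old_code>\d+)\) "
    r"to (?P<new>[A-Z_]+)\((?P<new_code>\d+)\)"
)

# Hypothetical choice of states worth alerting on, taken from the thread.
ALERT_STATES = {"FAILED", "DEGRADED"}


def find_alerts(lines):
    """Return (kind, target, old_state, new_state) for each alarming transition."""
    alerts = []
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match and match.group("new") in ALERT_STATES:
            alerts.append(
                (
                    match.group("kind"),
                    match.group("target"),
                    match.group("old"),
                    match.group("new"),
                )
            )
    return alerts


if __name__ == "__main__":
    sample = [
        "Code: 0x00000051 .. Event Description: State change on VD 00/1 "
        "from OPTIMAL(3) to DEGRADED(2)",
        "Code: 0x00000072 .. Event Description: State change on PD 05(e0xfc/s0) "
        "from ONLINE(18) to FAILED(11)",
        "Code: 0x00000071 .. Event Description: Unexpected sense: PD 05(e0xfc/s0)",
    ]
    for alert in find_alerts(sample):
        print(alert)
```

The "Unexpected sense" line is deliberately not matched, since it reports a SCSI error rather than a state transition; it could be caught with a second pattern if desired.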