Date:      Tue, 16 Feb 2016 16:00:02 -0800
From:      Doug Ambrisko <ambrisko@ambrisko.com>
To:        Tinker <tinkr@openmailbox.org>
Cc:        freebsd-scsi@freebsd.org
Subject:   Re: MRSAS driver/LSI MegaRaid 92XX-93XX admin question: When one of the Raid's physical drives break, how is it reported in the logs?
Message-ID:  <20160217000002.GA81916@ambrisko.com>
In-Reply-To: <55de137d1ed81930cfdbee579d881d62@openmailbox.org>
References:  <6a648d421b6d611b4f6f411b66303017@openmailbox.org> <55de137d1ed81930cfdbee579d881d62@openmailbox.org>

On Sun, Feb 14, 2016 at 10:13:31PM +0700, Tinker wrote:
| (Will send any followup from now only to freebsd-scsi@ .)
| 
| Did some additional research and found that the disk failure indeed is 
| reported in MRSAS' "event log".
| 
| So my final question then is, how do you extract it into userland (in 
| the absence of an "mfiutil" as the MFI driver has)?

I have local changes that print the event log to dmesg, which then gets
syslogged.  We watch syslog for problems and report them to our customers
automatically.  This is similar to mfi(4).
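
A minimal sketch of that kind of syslog watching, in Python.  It assumes the
logged lines carry event descriptions in the same wording as the MegaCLI
examples quoted below ("State change on PD ... from ONLINE(18) to
FAILED(11)"), each on a single line.  The function name, the regular
expression and the set of alarm states are illustrative assumptions; they are
not part of mrsas(4), and the exact format the local changes print may differ.

```python
import re

# Matches the state-change wording quoted further down in this thread, e.g.
# "State change on PD 05(e0xfc/s0) from ONLINE(18) to FAILED(11)".
# Assumption: the whole event description appears on one log line.
STATE_CHANGE = re.compile(
    r"State change on (?P<kind>PD|VD) (?P<target>\S+)\s+from\s+"
    r"(?P<old>[A-Z_ ]+?)\((?P<old_code>\d+)\)\s+to\s+"
    r"(?P<new>[A-Z_ ]+?)\((?P<new_code>\d+)\)"
)

# States treated as alarming in this sketch: only the two seen in the
# examples in this thread, not a complete list of controller states.
ALARM_STATES = frozenset({"FAILED", "DEGRADED"})


def find_alarms(lines, alarm_states=ALARM_STATES):
    """Return (kind, target, old_state, new_state) for alarming state changes."""
    alarms = []
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match and match.group("new") in alarm_states:
            alarms.append(
                (
                    match.group("kind"),    # "PD" (physical) or "VD" (virtual)
                    match.group("target"),  # e.g. "05(e0xfc/s0)" or "00/1"
                    match.group("old"),
                    match.group("new"),
                )
            )
    return alarms
```

Any iterable of text lines can be passed in, for example an open syslog file
(the path depends on the local syslog configuration), and each returned tuple
can then be forwarded to whatever sends the notifications.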

Thanks,

Doug A.
| Details below. Thanks.
| 
| On 2016-02-14 19:59, Tinker wrote:
| [...]
| > http://www.cisco.com/c/dam/en/us/td/docs/unified_computing/ucs/3rd-party/lsi/mrsas/userguide/LSI_MR_SAS_SW_UG.pdf
| > on page 305, that is section "A.2 Event Messages" - I don't know
| > which LSI chip this document is for, but it does not clearly list any
| > particular event message for when an individual underlying disk has
| > broken; I don't even see any event for when a hot spare is taken
| > into use!
| 
| 
| Wait - this page:
| 
| https://www.schirmacher.de/display/Linux/Replace+failed+disk+in+MegaRAID+array
| 
| (and also 
| http://serverfault.com/questions/485147/drive-is-failing-but-lsi-megaraid-controller-does-not-detect-it 
| )
| 
| gives an example of how the host system learns about broken disks:
| 
| 
| Code: 0x00000051 .. Event Description: State change on VD 00/1 from 
| OPTIMAL(3) to DEGRADED(2)
| 
| 
| Code: 0x00000072 .. Event Description: State change on PD 05(e0xfc/s0) 
| from ONLINE(18) to FAILED(11)
| 
| (a disk that fails uncleanly seems to be shown as:)
| 
| Code: 0x00000071 .. Event Description: Unexpected sense: PD 05(e0xfc/s0) 
| Path 4433221103000000, CDB: 2e 00 3a 38 1b c7 00 00 01 00, Sense: 
| b/00/00
| 
| 
| And this version of the LSI documentation
| 
| http://hwraid.le-vert.net/raw-attachment/wiki/LSIMegaRAIDSAS/megacli_user_guide.pdf
| 
| gives a clearer definition of the physical and virtual drive states in 
| "1.4.16 Physical Drive States"
| and "1.4.17 Virtual Disk States" on pages 1-11 to 1-12.
| 
| So as we see, a physical drive breaking would
| 
|   * move the physical drive to "FAILED"
| 
|   * move the Virtual Drive (that is, the logical exported drive) 
| from "OPTIMAL" to "DEGRADED"
| 
| 
| So it is indeed the card's "event log" that contains this info.
| 
| 
| 
| So the only remaining question is *where* FreeBSD's MRSAS driver 
| sends its event log.
| 
| 
| 
| _______________________________________________
| freebsd-stable@freebsd.org mailing list
| https://lists.freebsd.org/mailman/listinfo/freebsd-stable
| To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


