Date: Fri, 22 Mar 2019 10:06:11 +0100
From: "Aurelien \"beorn\" ROUGEMONT" <beorn@binaries.fr>
To: freebsd-current@freebsd.org
Subject: lsi
Message-ID: <b78c6384-607f-6742-1be6-5c0dfa801320@binaries.fr>
Hi list,
I have been using FreeBSD at home and in production for years, and today
I stumbled upon a question I could not answer.
Context
-----------------------------------------
I'm building a backup server on a machine with this HBA:
3:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (rev 05)
   Subsystem: LSI Logic / Symbios Logic MegaRAID SAS 9271-8i
   Flags: bus master, fast devsel, latency 0, IRQ 34
   I/O ports at e000
   Memory at fb160000 (64-bit, non-prefetchable)
   Memory at fb100000 (64-bit, non-prefetchable)
   Expansion ROM at fb140000 [disabled]
   Capabilities: [50] Power Management version 3
   Capabilities: [68] Express Endpoint, MSI 00
   Capabilities: [d0] Vital Product Data
   Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
   Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
   Capabilities: [100] Advanced Error Reporting
   Capabilities: [1e0] Secondary PCI Express <?>
   Capabilities: [1c0] Power Budgeting <?>
   Capabilities: [190] Dynamic Power Allocation <?>
   Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
After pushing the server's I/O to its limits, the server had a very nasty
crash.
This happens very seldom: in roughly 10 years, across the petabytes of
storage servers I have kept running, it was always hardware or
driver/firmware related.
Shortening read at 4292967280 from 16 to 15
ZFS: i/o error - all block copies unavailable
ZFS: can't read object set for dataset 52
ZFS: can't open root filesystem
gptzfsboot: failed to mount default pool zroot
After simply reinstalling the bootloaders (to no avail) and checking the
partition tables, I went digging deep into the FreeBSD codebase. I found
that it was a ZFS problem.
The nasty crash was indeed due to ZFS data corruption, hence the
checksum errors while scrubbing the zpool from a rescue network boot image:
  pool: zroot
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub in progress since Fri Mar 15 15:15:25 2019
        49.6G scanned out of 1.65T at 109M/s, 4h15m to go
        677M repaired, 2.94% done
config:

        NAME              STATE      READ WRITE CKSUM
        zroot             ONLINE        0     0     0
          raidz2-0        ONLINE        0     0     0
            mfisyspd0p3   ONLINE        0     0 5.44K  (repairing)
            mfisyspd1p3   ONLINE        0     0 4.76K  (repairing)
            mfisyspd10p3  ONLINE        0     0 4.35K  (repairing)
            mfisyspd11p3  ONLINE        0     0 5.17K  (repairing)
            mfisyspd2p3   ONLINE        0     0 4.76K  (repairing)
            mfisyspd3p3   ONLINE        0     0 4.24K  (repairing)
            mfisyspd4p3   ONLINE        0     0 4.75K  (repairing)
            mfisyspd5p3   ONLINE        0     0 5.20K  (repairing)
            mfisyspd6p3   ONLINE        0     0 4.51K  (repairing)
            mfisyspd7p3   ONLINE        0     0 4.65K  (repairing)
            mfisyspd8p3   ONLINE        0     0 4.70K  (repairing)
            mfisyspd9p3   ONLINE        0     0 3.81K  (repairing)
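With twelve members all reporting checksum errors, a small filter over the `zpool status` output makes the affected devices easier to see at a glance. The following is only a sketch: `cksum_errors` is a hypothetical helper name, and it assumes the usual layout where the config table starts with a `NAME STATE READ WRITE CKSUM` header and ends at the next blank line.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the
# name and CKSUM count of every row whose CKSUM column is not "0".
cksum_errors() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $5 == "CKSUM" { in_config = 1; next }
        # A blank line ends the table (e.g. before the "errors:" line).
        in_config && NF == 0 { in_config = 0 }
        # Inside the table, report rows with a non-zero CKSUM column.
        in_config && NF >= 5 && $5 != "0" { print $1, $5 }
    '
}
```

It would be used as `zpool status zroot | cksum_errors`.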
At this point the server was still unable to reboot. I had to force
the boot data to be rewritten with a dumb:
mv /boot{,.dist}
cp -pr /boot{.dist,}
This turned out to work fine.
Going further, I finally killed the zpool for good. It took me some time,
and I stumbled upon the mfi(4) and mrsas(4) man pages and code.
    The mfi driver supports the following hardware:
    o  LSI MegaRAID SAS 1078
    o  LSI MegaRAID SAS 8408E
    o  LSI MegaRAID SAS 8480E
    o  LSI MegaRAID SAS 9240
    o  LSI MegaRAID SAS 9260
    o  Dell PERC5
    o  Dell PERC6
    o  IBM ServeRAID M1015 SAS/SATA
    o  IBM ServeRAID M1115 SAS/SATA
    o  IBM ServeRAID M5015 SAS/SATA
    o  IBM ServeRAID M5110 SAS/SATA
    o  IBM ServeRAID-MR10i
    o  Intel RAID Controller SRCSAS18E
    o  Intel RAID Controller SROMBSAS18E
    The mrsas driver supports the following hardware:
    [ Thunderbolt 6Gb/s MR controller ]
    o  LSI MegaRAID SAS 9265
    o  LSI MegaRAID SAS 9266
    o  LSI MegaRAID SAS 9267
    o  LSI MegaRAID SAS 9270
    o  LSI MegaRAID SAS 9271
    o  LSI MegaRAID SAS 9272
    o  LSI MegaRAID SAS 9285
    o  LSI MegaRAID SAS 9286
    o  DELL PERC H810
    o  DELL PERC H710/P
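There was a detection priority problem: the 9271 is on the mrsas list,
but the disks showed up as mfisyspd*, i.e. under mfi. The relevant
tunable is:
hw.mfi.mrsas_enable=1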
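For reference, here is a sketch of where that tunable would go. As far as I recall from mrsas(4), it is set in /boot/device.hints and only takes effect at the next boot. Please check the man page on your release before relying on this.

```
# /boot/device.hints
# Ask mfi(4) to leave Thunderbolt-class controllers to mrsas(4).
hw.mfi.mrsas_enable="1"
```

The man page also warns that switching drivers changes device naming (disks move from mfi-style names to da-style names), so anything that refers to the disks by device name should be reviewed first.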
