From: "Aurelien \"beorn\" ROUGEMONT"
To: freebsd-current@freebsd.org
Date: Fri, 22 Mar 2019 10:06:11 +0100
Subject: lsi
Hi list,

I have been using FreeBSD at home and in production for years, and today I stumbled upon a question I could not answer.
Context
-----------------------------------------

I'm building a backup server on a machine with this HBA:

    3:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (rev 05)
        Subsystem: LSI Logic / Symbios Logic MegaRAID SAS 9271-8i
        Flags: bus master, fast devsel, latency 0, IRQ 34
        I/O ports at e000
        Memory at fb160000 (64-bit, non-prefetchable)
        Memory at fb100000 (64-bit, non-prefetchable)
        Expansion ROM at fb140000 [disabled]
        Capabilities: [50] Power Management version 3
        Capabilities: [68] Express Endpoint, MSI 00
        Capabilities: [d0] Vital Product Data
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [1e0] Secondary PCI Express
        Capabilities: [1c0] Power Budgeting
        Capabilities: [190] Dynamic Power Allocation
        Capabilities: [148] Alternative Routing-ID Interpretation (ARI)

After pushing the server's I/O to its limits, it had a very nasty crash. This happens very rarely: in roughly 10 years of keeping petabytes of storage servers running, such crashes have always been hardware- or driver/firmware-related.

    Shortening read at 4292967280 from 16 to 15
    ZFS: i/o error - all block copies unavailable
    ZFS: can't read object set for dataset 52
    ZFS: can't open root filesystem
    gptzfsboot: failed to mount default pool zroot

After reinstalling the bootloaders (to no avail) and checking the partition tables, I went digging through the FreeBSD codebase and found that it was a ZFS problem: the nasty crash was indeed due to ZFS data corruption.
Hence the checksum errors while scrubbing the zpool from a rescue network boot image:

      pool: zroot
     state: ONLINE
    status: One or more devices has experienced an unrecoverable error.  An
            attempt was made to correct the error.  Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or replace the device with 'zpool replace'.
       see: http://illumos.org/msg/ZFS-8000-9P
      scan: scrub in progress since Fri Mar 15 15:15:25 2019
            49.6G scanned out of 1.65T at 109M/s, 4h15m to go
            677M repaired, 2.94% done
    config:

            NAME              STATE     READ WRITE CKSUM
            zroot             ONLINE       0     0     0
              raidz2-0        ONLINE       0     0     0
                mfisyspd0p3   ONLINE       0     0 5.44K  (repairing)
                mfisyspd1p3   ONLINE       0     0 4.76K  (repairing)
                mfisyspd10p3  ONLINE       0     0 4.35K  (repairing)
                mfisyspd11p3  ONLINE       0     0 5.17K  (repairing)
                mfisyspd2p3   ONLINE       0     0 4.76K  (repairing)
                mfisyspd3p3   ONLINE       0     0 4.24K  (repairing)
                mfisyspd4p3   ONLINE       0     0 4.75K  (repairing)
                mfisyspd5p3   ONLINE       0     0 5.20K  (repairing)
                mfisyspd6p3   ONLINE       0     0 4.51K  (repairing)
                mfisyspd7p3   ONLINE       0     0 4.65K  (repairing)
                mfisyspd8p3   ONLINE       0     0 4.70K  (repairing)
                mfisyspd9p3   ONLINE       0     0 3.81K  (repairing)

At this point the server was still unable to boot. I had to force the data to be rewritten with a dumb:

    mv /boot{,.dist}        # expands to: mv /boot /boot.dist
    cp -pr /boot{.dist,}    # expands to: cp -pr /boot.dist /boot (rewrites every file)

That did the trick. Going further, I eventually killed the zpool for good. It took me some time, but I stumbled upon the mfi(4) and mrsas(4) man pages and code:

    The mfi driver supports the following hardware:

    o   LSI MegaRAID SAS 1078
    o   LSI MegaRAID SAS 8408E
    o   LSI MegaRAID SAS 8480E
    o   LSI MegaRAID SAS 9240
    o   LSI MegaRAID SAS 9260
    o   Dell PERC5
    o   Dell PERC6
    o   IBM ServeRAID M1015 SAS/SATA
    o   IBM ServeRAID M1115 SAS/SATA
    o   IBM ServeRAID M5015 SAS/SATA
    o   IBM ServeRAID M5110 SAS/SATA
    o   IBM ServeRAID-MR10i
    o   Intel RAID Controller SRCSAS18E
    o   Intel RAID Controller SROMBSAS18E

    The mrsas driver supports the following hardware:

    [ Thunderbolt 6Gb/s MR controller ]

    o   LSI MegaRAID SAS 9265
    o   LSI MegaRAID SAS 9266
    o   LSI MegaRAID SAS 9267
    o   LSI MegaRAID SAS 9270
    o   LSI MegaRAID SAS 9271
    o   LSI MegaRAID SAS 9272
    o   LSI MegaRAID SAS 9285
    o   LSI MegaRAID SAS 9286
    o   DELL PERC H810
    o   DELL PERC H710/P

There was a detection priority problem: the 9271-8i is listed under mrsas(4), yet mfi(4) was the driver that attached to it, as the mfisyspd* device names above show. Setting this loader tunable (in /boot/loader.conf) makes mfi(4) step aside so that mrsas(4) attaches:

    hw.mfi.mrsas_enable=1