From: "Aurelien \"beorn\" ROUGEMONT"
To: freebsd-current@freebsd.org
Subject: Re: lsi
Date: Fri, 22 Mar 2019 10:12:02 +0100
Message-ID: <27f18d66-d3f2-3e33-56d0-e9a1ddb37e1c@binaries.fr>
List-Id: Discussions about the use of FreeBSD-current

On 3/22/19 10:06 AM, Aurelien "beorn" ROUGEMONT wrote:
> Hi list,
>
> I have been using FreeBSD at home and in production for years, and today
> I stumbled upon a question I could not answer.
>
> Context
> -----------------------------------------
>
> I'm building a backup server on a machine with this HBA:
>
> 3:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (rev 05)
>     Subsystem: LSI Logic / Symbios Logic MegaRAID SAS 9271-8i
>     Flags: bus master, fast devsel, latency 0, IRQ 34
>     I/O ports at e000
>     Memory at fb160000 (64-bit, non-prefetchable)
>     Memory at fb100000 (64-bit, non-prefetchable)
>     Expansion ROM at fb140000 [disabled]
>     Capabilities: [50] Power Management version 3
>     Capabilities: [68] Express Endpoint, MSI 00
>     Capabilities: [d0] Vital Product Data
>     Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
>     Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
>     Capabilities: [100] Advanced Error Reporting
>     Capabilities: [1e0] Secondary PCI Express
>     Capabilities: [1c0] Power Budgeting
>     Capabilities: [190] Dynamic Power Allocation
>     Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
>
> After pushing the server's I/O to its limits, the server had a very nasty
> crash.
>
> This happens very seldom: in roughly 10 years, across the petabytes of
> storage servers I have kept running, it was always hardware or
> driver/firmware related.
>
> Shortening read at 4292967280 from 16 to 15
> ZFS: i/o error - all block copies unavailable
> ZFS: can't read object set for dataset 52
> ZFS: can't open root filesystem
> gptzfsboot: failed to mount default pool zroot
>
> After simply reinstalling the bootloaders (to no effect) and checking the
> partition tables, I went digging through the FreeBSD codebase and found
> that it was a ZFS problem.
>
> The nasty crash was indeed due to ZFS data corruption.
> Hence the checksum errors while scrubbing the zpool from a rescue network
> boot image:
>
>   pool: zroot
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://illumos.org/msg/ZFS-8000-9P
>   scan: scrub in progress since Fri Mar 15 15:15:25 2019
>         49.6G scanned out of 1.65T at 109M/s, 4h15m to go
>         677M repaired, 2.94% done
> config:
>
>         NAME              STATE     READ WRITE CKSUM
>         zroot             ONLINE       0     0     0
>           raidz2-0        ONLINE       0     0     0
>             mfisyspd0p3   ONLINE       0     0 5.44K  (repairing)
>             mfisyspd1p3   ONLINE       0     0 4.76K  (repairing)
>             mfisyspd10p3  ONLINE       0     0 4.35K  (repairing)
>             mfisyspd11p3  ONLINE       0     0 5.17K  (repairing)
>             mfisyspd2p3   ONLINE       0     0 4.76K  (repairing)
>             mfisyspd3p3   ONLINE       0     0 4.24K  (repairing)
>             mfisyspd4p3   ONLINE       0     0 4.75K  (repairing)
>             mfisyspd5p3   ONLINE       0     0 5.20K  (repairing)
>             mfisyspd6p3   ONLINE       0     0 4.51K  (repairing)
>             mfisyspd7p3   ONLINE       0     0 4.65K  (repairing)
>             mfisyspd8p3   ONLINE       0     0 4.70K  (repairing)
>             mfisyspd9p3   ONLINE       0     0 3.81K  (repairing)
>
> At this point the server was still unable to boot. I had to force the
> boot data to be re-copied with a crude:
>
> mv /boot{,.dist}
> cp -pr /boot{.dist,}
>
> Which turned out to be fine.
>
> Going further, I finally killed the zpool for good. It took me some time,
> and I stumbled upon the mfi(4) and mrsas(4) man pages and code.
>
>      The mfi driver supports the following hardware:
>
>      o   LSI MegaRAID SAS 1078
>      o   LSI MegaRAID SAS 8408E
>      o   LSI MegaRAID SAS 8480E
>      o   LSI MegaRAID SAS 9240
>      o   LSI MegaRAID SAS 9260
>      o   Dell PERC5
>      o   Dell PERC6
>      o   IBM ServeRAID M1015 SAS/SATA
>      o   IBM ServeRAID M1115 SAS/SATA
>      o   IBM ServeRAID M5015 SAS/SATA
>      o   IBM ServeRAID M5110 SAS/SATA
>      o   IBM ServeRAID-MR10i
>      o   Intel RAID Controller SRCSAS18E
>      o   Intel RAID Controller SROMBSAS18E
>
>      The mrsas driver supports the following hardware:
>
>      [ Thunderbolt 6Gb/s MR controller ]
>      o   LSI MegaRAID SAS 9265
>      o   LSI MegaRAID SAS 9266
>      o   LSI MegaRAID SAS 9267
>      o   LSI MegaRAID SAS 9270
>      o   LSI MegaRAID SAS 9271
>      o   LSI MegaRAID SAS 9272
>      o   LSI MegaRAID SAS 9285
>      o   LSI MegaRAID SAS 9286
>      o   DELL PERC H810
>      o   DELL PERC H710/P

There was a detection priority problem: mfi(4) wins and attaches to an HBA
(the 9271) that is on the mrsas(4) list. The fix was to add

hw.mfi.mrsas_enable=1

to /boot/loader.conf. After this the server behaved correctly.

Should this be fixed for everyone?
NB: sorry, my previous email was mistakenly sent unfinished.
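
For anyone wanting to apply the loader.conf change described above without
ending up with duplicate entries, here is a minimal sketch. The function
name set_loader_tunable is made up for illustration (it is not a FreeBSD
tool), and it assumes a loader.conf-style file with one key=value
assignment per line.

```shell
#!/bin/sh
# Sketch only: idempotently set one tunable in a loader.conf-style file.
# set_loader_tunable is a hypothetical helper, not part of FreeBSD.
#
# Intended use on the affected machine (as root), not run here:
#   set_loader_tunable /boot/loader.conf hw.mfi.mrsas_enable 1
set_loader_tunable() {
    conf=$1
    key=$2
    value=$3

    # Make sure the file exists so awk has something to read.
    touch "$conf" || return 1
    tmp=$(mktemp) || return 1

    # Keep every line except an existing assignment of this exact key
    # (literal prefix match, so dots in the key are not treated as regex).
    awk -v k="$key" 'index($0, k "=") != 1 { print }' "$conf" > "$tmp"

    # Append the new assignment in the usual quoted loader.conf form.
    printf '%s="%s"\n' "$key" "$value" >> "$tmp"

    # Write back in place so the original file's ownership and mode are kept.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

After a reboot, pciconf -lv should show which driver claimed the
controller: the device entry is prefixed with the driver name, so I would
expect something starting with mrsas0@ instead of mfi0@. Also note that,
as far as I understand the mrsas(4) man page, disks then appear as da(4)
devices rather than under the mfi names, so anything that refers to the
old device names may need adjusting. Please treat that as something to
verify on your own hardware.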