Date:      Thu, 25 Dec 2014 21:37:15 +0000
From:      Steven Hartland <killing@multiplay.co.uk>
To:        George Kontostanos <gkontos.mail@gmail.com>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: LSI SAS 9300-8i weird ZFS checksum errors
Message-ID:  <549C838B.1070302@multiplay.co.uk>
In-Reply-To: <CA%2BdUSyrkfp%2Bgz1zqCJJWo=VjMuEJf6A4vEmOpqzu7L-sAU9U%2Bg@mail.gmail.com>
References:  <CA%2BdUSyo56ioZC4Kn4XTcf_GgeSsQrtd7FYpCxjsqOxQ5ON-_CA@mail.gmail.com>	<549C65FF.4010702@multiplay.co.uk> <CA%2BdUSyrkfp%2Bgz1zqCJJWo=VjMuEJf6A4vEmOpqzu7L-sAU9U%2Bg@mail.gmail.com>

On 25/12/2014 21:03, George Kontostanos wrote:
>
>
> On Thu, Dec 25, 2014 at 9:31 PM, Steven Hartland
> <killing@multiplay.co.uk> wrote:
>
>
>     On 25/12/2014 14:39, George Kontostanos wrote:
>
>         Hello, list and Merry Christmas to all
>
>         I am facing some weird checksum errors during scrub. The
>         configuration is
>         the following:
>
>         Board:        Supermicro Motherboard X10DRi-T4+ (
>         http://www.supermicro.com/products/motherboard/xeon/c600/x10dri-t4_.cfm)
>         Controller:  LSI SAS 9300-8i (
>         http://www.lsi.com/products/host-bus-adapters/pages/lsi-sas-9300-8i.aspx)
>         HDD:         21X6TB Western Digital WD60EFRX
>         HDD:         2XIntel SATA 600GB Solid-State Drive
>         SSDSC2BB600G401 DC S3500
>         (SWAP, ZIL, CACHE)
>         Chassis:    Supermicro 847BE1C-R1K28LPB 4U Storage Chassis
>         RAM:         64 GB
>
>         I initially installed FreeBSD 10.1-RELEASE and created one pool
>         consisting of 3 x 7-disk vdevs in RAIDZ3. I used NFS to start
>         copying some data. After copying around 3TB I initiated a scrub.
>         The result was the following: http://pastebin.com/rswgCY2A and
>         http://pastebin.com/DQ2urGXk
>
>         I tried to flash the controller but the LSI utility did not
>         recognize it. I installed FreeBSD 9.3-RELEASE and used LSI's
>         mpslsi3 driver. I was able to flash the latest BIOS and firmware
>         that way.
>
>         LSI Corporation SAS3 Flash Utility
>         Version 07.00.00.00 (2014.08.14)
>         Copyright (c) 2008-2014 LSI Corporation. All rights reserved
>
>         Adapter Selected is a LSI SAS: SAS3008(C0)
>
>         Controller Number              : 0
>         Controller                     : SAS3008(C0)
>         PCI Address                    : 00:82:00:00
>         SAS Address                    : 500605b-0-06ce-27e0
>         NVDATA Version (Default)       : 06.03.00.05
>         NVDATA Version (Persistent)    : 06.03.00.05
>         Firmware Product ID            : 0x2221 (IT)
>         Firmware Version               : 06.00.00.00
>         NVDATA Vendor                  : LSI
>         NVDATA Product ID              : SAS9300-8i
>         BIOS Version                   : 08.13.00.00
>         UEFI BSD Version               : 02.00.00.00
>         FCODE Version                  : N/A
>         Board Name                     : SAS9300-8i
>         Board Assembly                 : H3-25573-00E
>         Board Tracer Number            : SV32928040
>
>         I recreated the pool and started writing data via NFS again.
>         After 3 TB of data I started a scrub and I am still getting
>         checksum errors, though there are no longer any messages
>         regarding the drives in /var/log/messages
>
>           pool: Pool
>          state: ONLINE
>         status: One or more devices has experienced an unrecoverable error.  An
>                 attempt was made to correct the error.  Applications are unaffected.
>         action: Determine if the device needs to be replaced, and clear the errors
>                 using 'zpool clear' or replace the device with 'zpool replace'.
>            see: http://illumos.org/msg/ZFS-8000-9P
>
>            scan: scrub in progress since Thu Dec 25 08:46:21 2014
>                  2.28T scanned out of 5.54T at 816M/s, 1h9m to go
>                  11.9M repaired, 41.26% done
>         config:
>
>         NAME                     STATE     READ WRITE CKSUM
>         Pool                     ONLINE       0     0     0
>           raidz3-0               ONLINE       0     0     0
>             gpt/WD-WX41D94RN5A3  ONLINE       0     0    15  (repairing)
>             gpt/WD-WX41D948YE1U  ONLINE       0     0    14  (repairing)
>             gpt/WD-WX41D94RN879  ONLINE       0     0    16  (repairing)
>             gpt/WD-WX21D947NC83  ONLINE       0     0    24  (repairing)
>             gpt/WD-WX21D947NT77  ONLINE       0     0    15  (repairing)
>             gpt/WD-WX41D948YAKV  ONLINE       0     0    19  (repairing)
>             gpt/WD-WX21D9421SCV  ONLINE       0     0    20  (repairing)
>           raidz3-1               ONLINE       0     0     0
>             gpt/WD-WX21D9421F6F  ONLINE       0     0    16  (repairing)
>             gpt/WD-WX41D948YPN4  ONLINE       0     0    14  (repairing)
>             gpt/WD-WX21D947NE2K  ONLINE       0     0    22  (repairing)
>             gpt/WD-WX41D948Y2PX  ONLINE       0     0    19  (repairing)
>             gpt/WD-WX41D94RNAX7  ONLINE       0     0    17  (repairing)
>             gpt/WD-WX21D947N1RP  ONLINE       0     0    12  (repairing)
>             gpt/WD-WX21D94216X7  ONLINE       0     0    20  (repairing)
>           raidz3-2               ONLINE       0     0     0
>             gpt/WD-WX41D948YAHP  ONLINE       0     0    25  (repairing)
>             gpt/WD-WX21D947N06F  ONLINE       0     0    18  (repairing)
>             gpt/WD-WX21D947N3T1  ONLINE       0     0    21  (repairing)
>             gpt/WD-WX41D94RNT7D  ONLINE       0     0     5  (repairing)
>             gpt/WD-WX41D948Y9VV  ONLINE       0     0    18  (repairing)
>             gpt/WD-WX41D94RNS62  ONLINE       0     0    24  (repairing)
>             gpt/WD-WX21D9421ZP9  ONLINE       0     0    28  (repairing)
>         logs
>           mirror-3               ONLINE       0     0     0
>             gpt/zil0             ONLINE       0     0     0
>             gpt/zil1             ONLINE       0     0     0
>         cache
>           gpt/cache0             ONLINE       0     0     0
>           gpt/cache1             ONLINE       0     0     0
>
>         errors: No known data errors
>
>         This is really driving me crazy since smartmontools does not
>         display any errors on the drives.
>
>         Any suggestions are most welcome!
>
>     Check for bad hardware, first guess would be memory, next would be
>     hotswap backplane.
>
>         Regards
>         Steve
>
>
> Hi Steve,
>
> Memory looks good in memtest. I am not sure what you mean 
> regarding hotswap backplane.
How are the disks attached?

The most common way is your controller being attached to a hotswap 
backplane, which you then plug the disks into.

Unfortunately these backplanes are one of the most common sources of
issues, especially at higher speeds, and even more so if they aren't
direct passthrough, i.e. they are actually expanders which do processing
of their own.

You report the chassis is an 847BE1C-R1K28LPB, which includes such
expanders, specifically BPN-SAS3-846EL1 and BPN-SAS3-826EL1.

If this is how you are connecting the disks, I would strongly advise
eliminating this from the equation by connecting the disks directly to
the LSI controller.

You can also check to see if there are any firmware updates for the 
expanders.
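
When re-testing after moving the disks or updating firmware, it can help
to tally the CKSUM column per vdev rather than eyeballing it. The
following is only a rough sketch, not an official tool: it assumes
`zpool status` text laid out like the output quoted above, with plain
integer counters (abbreviated counts such as "1.2K" are not handled).
The function name cksum_by_vdev is made up for illustration.

```python
import re

# A device row looks like: "gpt/WD-WX41D94RN5A3  ONLINE  0  0 15  (repairing)".
# Leading "> " mail quoting and whitespace are tolerated.
ROW = re.compile(r"^[>\s]*(\S+)\s+[A-Z]+\s+(\d+)\s+(\d+)\s+(\d+)")
# Names that denote a grouping vdev rather than a leaf disk (assumed prefixes).
VDEV = re.compile(r"^(raidz\d*|mirror|draid)")
SECTIONS = ("logs", "cache", "spares")


def cksum_by_vdev(status_text):
    """Return {vdev_or_section: {device: cksum_count}} from zpool status text."""
    groups = {}
    current = None
    for line in status_text.splitlines():
        word = line.lstrip("> \t").strip()
        if word in SECTIONS:
            # Devices listed directly under "cache" etc. have no vdev parent.
            current = word
            continue
        match = ROW.match(line)
        if not match:
            continue
        name, cksum = match.group(1), int(match.group(4))
        if VDEV.match(name):
            current = name
            groups.setdefault(current, {})
        elif current is not None:
            # Rows seen before any vdev (the pool line itself) are skipped.
            groups.setdefault(current, {})[name] = cksum
    return groups


def summarize(groups):
    """Return {vdev: (devices_with_errors, total_devices, total_cksum)}."""
    return {
        vdev: (sum(1 for c in devs.values() if c > 0), len(devs), sum(devs.values()))
        for vdev, devs in groups.items()
    }
```

Feed it the saved output of `zpool status Pool` and compare the summary
before and after the change. In the output quoted above every one of the
21 data disks shows a non-zero count, which is the sort of pattern that
tends to point at something shared (memory, controller, cabling,
backplane) rather than at individual drives.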

     Regards
     Steve


