Date:      Fri, 26 Dec 2014 12:21:13 +0200
From:      George Kontostanos <gkontos.mail@gmail.com>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: LSI SAS 9300-8i weird ZFS checksum errors
Message-ID:  <CA+dUSypSqqMH3pmwLz9YRfBfNsGb5XpLW03FTYWRgSO-qHbneQ@mail.gmail.com>
In-Reply-To: <549C838B.1070302@multiplay.co.uk>
References:  <CA+dUSyo56ioZC4Kn4XTcf_GgeSsQrtd7FYpCxjsqOxQ5ON-_CA@mail.gmail.com> <549C65FF.4010702@multiplay.co.uk> <CA+dUSyrkfp+gz1zqCJJWo=VjMuEJf6A4vEmOpqzu7L-sAU9U+g@mail.gmail.com> <549C838B.1070302@multiplay.co.uk>

On Thu, Dec 25, 2014 at 11:37 PM, Steven Hartland <killing@multiplay.co.uk>
wrote:

>
> On 25/12/2014 21:03, George Kontostanos wrote:
>
>
>
> On Thu, Dec 25, 2014 at 9:31 PM, Steven Hartland <killing@multiplay.co.uk>
> wrote:
>
>>
>> On 25/12/2014 14:39, George Kontostanos wrote:
>>
>>> Hello, list and Merry Christmas to all
>>>
>>> I am facing some weird checksum errors during scrub. The configuration is
>>> the following:
>>>
>>> Board:        Supermicro Motherboard X10DRi-T4+ (
>>> http://www.supermicro.com/products/motherboard/xeon/c600/x10dri-t4_.cfm)
>>> Controller:  LSI SAS 9300-8i (
>>> http://www.lsi.com/products/host-bus-adapters/pages/lsi-sas-9300-8i.aspx
>>> )
>>> HDD:         21 x 6TB Western Digital WD60EFRX
>>> HDD:         2 x Intel SATA 600GB Solid-State Drive SSDSC2BB600G401 DC S3500
>>>              (SWAP, ZIL, CACHE)
>>> Chassis:    Supermicro 847BE1C-R1K28LPB 4U Storage Chassis
>>> RAM:         64 GB
>>>
>>> I initially installed FreeBSD 10.1-RELEASE and created one pool consisting
>>> of 3 x 7-disk VDEVs in RAIDZ3. I used NFS to start copying some data. After
>>> copying around 3TB I initiated a scrub.
>>> The result was the following: http://pastebin.com/rswgCY2A and
>>> http://pastebin.com/DQ2urGXk
>>>
>>> I tried to flash the controller but the LSI utility did not recognize it.
>>> I installed FreeBSD 9.3-RELEASE and used LSI's mpslsi3 driver.
>>> I was able to flash the latest BIOS and firmware that way.
>>>
>>> LSI Corporation SAS3 Flash Utility
>>> Version 07.00.00.00 (2014.08.14)
>>> Copyright (c) 2008-2014 LSI Corporation. All rights reserved
>>>
>>> Adapter Selected is a LSI SAS: SAS3008(C0)
>>>
>>> Controller Number              : 0
>>> Controller                     : SAS3008(C0)
>>> PCI Address                    : 00:82:00:00
>>> SAS Address                    : 500605b-0-06ce-27e0
>>> NVDATA Version (Default)       : 06.03.00.05
>>> NVDATA Version (Persistent)    : 06.03.00.05
>>> Firmware Product ID            : 0x2221 (IT)
>>> Firmware Version               : 06.00.00.00
>>> NVDATA Vendor                  : LSI
>>> NVDATA Product ID              : SAS9300-8i
>>> BIOS Version                   : 08.13.00.00
>>> UEFI BSD Version               : 02.00.00.00
>>> FCODE Version                  : N/A
>>> Board Name                     : SAS9300-8i
>>> Board Assembly                 : H3-25573-00E
>>> Board Tracer Number            : SV32928040
>>>
>>> I recreated the pool and started writing data via NFS again. After 3 TB
>>> of data I started a scrub and I am still getting checksum errors, though
>>> there are no longer any messages regarding the drives in /var/log/messages.
>>>
>>>    pool: Pool
>>>   state: ONLINE
>>> status: One or more devices has experienced an unrecoverable error.  An
>>> attempt was made to correct the error.  Applications are unaffected.
>>> action: Determine if the device needs to be replaced, and clear the errors
>>> using 'zpool clear' or replace the device with 'zpool replace'.
>>>     see: http://illumos.org/msg/ZFS-8000-9P
>>>
>>>    scan: scrub in progress since Thu Dec 25 08:46:21 2014
>>>          2.28T scanned out of 5.54T at 816M/s, 1h9m to go
>>>          11.9M repaired, 41.26% done
>>> config:
>>>
>>> NAME                     STATE     READ WRITE CKSUM
>>> Pool                     ONLINE       0     0     0
>>>    raidz3-0               ONLINE       0     0     0
>>>      gpt/WD-WX41D94RN5A3  ONLINE       0     0    15  (repairing)
>>>      gpt/WD-WX41D948YE1U  ONLINE       0     0    14  (repairing)
>>>      gpt/WD-WX41D94RN879  ONLINE       0     0    16  (repairing)
>>>      gpt/WD-WX21D947NC83  ONLINE       0     0    24  (repairing)
>>>      gpt/WD-WX21D947NT77  ONLINE       0     0    15  (repairing)
>>>      gpt/WD-WX41D948YAKV  ONLINE       0     0    19  (repairing)
>>>      gpt/WD-WX21D9421SCV  ONLINE       0     0    20  (repairing)
>>>    raidz3-1               ONLINE       0     0     0
>>>      gpt/WD-WX21D9421F6F  ONLINE       0     0    16  (repairing)
>>>      gpt/WD-WX41D948YPN4  ONLINE       0     0    14  (repairing)
>>>      gpt/WD-WX21D947NE2K  ONLINE       0     0    22  (repairing)
>>>      gpt/WD-WX41D948Y2PX  ONLINE       0     0    19  (repairing)
>>>      gpt/WD-WX41D94RNAX7  ONLINE       0     0    17  (repairing)
>>>      gpt/WD-WX21D947N1RP  ONLINE       0     0    12  (repairing)
>>>      gpt/WD-WX21D94216X7  ONLINE       0     0    20  (repairing)
>>>    raidz3-2               ONLINE       0     0     0
>>>      gpt/WD-WX41D948YAHP  ONLINE       0     0    25  (repairing)
>>>      gpt/WD-WX21D947N06F  ONLINE       0     0    18  (repairing)
>>>      gpt/WD-WX21D947N3T1  ONLINE       0     0    21  (repairing)
>>>      gpt/WD-WX41D94RNT7D  ONLINE       0     0     5  (repairing)
>>>      gpt/WD-WX41D948Y9VV  ONLINE       0     0    18  (repairing)
>>>      gpt/WD-WX41D94RNS62  ONLINE       0     0    24  (repairing)
>>>      gpt/WD-WX21D9421ZP9  ONLINE       0     0    28  (repairing)
>>> logs
>>>    mirror-3               ONLINE       0     0     0
>>>      gpt/zil0             ONLINE       0     0     0
>>>      gpt/zil1             ONLINE       0     0     0
>>> cache
>>>    gpt/cache0             ONLINE       0     0     0
>>>    gpt/cache1             ONLINE       0     0     0
>>>
>>> errors: No known data errors
>>>
>>> This is really driving me crazy since smartmontools does not display any
>>> errors on the drives.
>>>
>>> Any suggestions are most welcome!
>>>
>> Check for bad hardware; first guess would be memory, next would be the
>> hotswap backplane.
>>
>>     Regards
>>     Steve
>>
>
> Hi Steve,
>
> Memory looks good in memtest. I am not sure what you mean
> regarding the hotswap backplane.
>
> How are the disks attached?
>
> The most common way is your controller being attached to a hotswap
> backplane, which you then plug the disks into.
>
> Unfortunately these backplanes are one of the most common sources of
> issues, especially at higher speeds, and even more so if they aren't direct
> passthrough, i.e. they are actually expanders which do processing of their
> own.
>
> You report the chassis is a 847BE1C-R1K28LPB which includes such
> expanders, specifically BPN-SAS3-846EL1 and BPN-SAS3-826EL1.
>
> If this is how you are connecting the disks I would strongly advise
> eliminating this from the equation by connecting the disks directly to the
> LSI controller.
>
> You can also check to see if there are any firmware updates for the
> expanders.
>
>     Regards
>     Steve
>


Thanks for your reply, Steve. Unfortunately I am thousands of miles away
from the DC. On another continent, actually!
I have contacted SuperMicro support to see if they have any firmware
updates.
I might also need to find someone to go to the DC and physically attach the
disks directly to the controller.
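
To see how the checksum errors are spread, the CKSUM column of the
'zpool status' output quoted above can be tallied per vdev. Below is a
minimal Python sketch of that idea. The names cksum_by_vdev and summarize
are made up for illustration, the SAMPLE text is a shortened excerpt of the
output above, and the parser assumes plain integer counters (it does not
handle abbreviated counts such as "1.2K"). Errors on every data disk,
rather than on one or two, would be consistent with a shared component such
as memory or the backplane, though the tally alone does not prove it.

```python
import re

# A config row: device name, state, then READ / WRITE / CKSUM counters.
ROW = re.compile(r"^\s*(\S+)\s+[A-Z]+\s+(\d+)\s+(\d+)\s+(\d+)")
# Names of top-level redundancy groups, e.g. raidz3-0 or mirror-3.
GROUP = re.compile(r"^(raidz[123]?|mirror)-\d+$")
# Section headers that appear on a line by themselves.
SECTIONS = {"logs", "cache", "spares"}

def cksum_by_vdev(status_text):
    """Map each vdev (or section) to a list of (device, cksum) pairs."""
    totals = {}
    current = None
    for line in status_text.splitlines():
        if line.strip() in SECTIONS:
            current = line.strip()
            continue
        m = ROW.match(line)
        if not m:
            continue  # header row, scan info, blank lines, etc.
        name, cksum = m.group(1), int(m.group(4))
        if GROUP.match(name):
            current = name
            totals[current] = []
        elif current is not None:
            totals.setdefault(current, []).append((name, cksum))
        # Rows seen before any group (the pool itself) are skipped.
    return totals

def summarize(status_text):
    """Total CKSUM count per vdev (or section)."""
    return {vdev: sum(c for _, c in disks)
            for vdev, disks in cksum_by_vdev(status_text).items()}

# Shortened excerpt of the 'zpool status' output from this thread.
SAMPLE = """\
NAME                     STATE     READ WRITE CKSUM
Pool                     ONLINE       0     0     0
  raidz3-0               ONLINE       0     0     0
    gpt/WD-WX41D94RN5A3  ONLINE       0     0    15  (repairing)
    gpt/WD-WX41D948YE1U  ONLINE       0     0    14  (repairing)
  raidz3-1               ONLINE       0     0     0
    gpt/WD-WX21D9421F6F  ONLINE       0     0    16  (repairing)
cache
  gpt/cache0             ONLINE       0     0     0
"""

if __name__ == "__main__":
    for vdev, total in summarize(SAMPLE).items():
        print(vdev, total)
```

On a live system the same functions could be fed the text captured from
'zpool status Pool' instead of SAMPLE.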

Best!


-- 
George Kontostanos
---


