Date: Thu, 25 Dec 2014 23:03:08 +0200
From: George Kontostanos <gkontos.mail@gmail.com>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: LSI SAS 9300-8i weird ZFS checksum errors
Message-ID: <CA%2BdUSyrkfp%2Bgz1zqCJJWo=VjMuEJf6A4vEmOpqzu7L-sAU9U%2Bg@mail.gmail.com>
In-Reply-To: <549C65FF.4010702@multiplay.co.uk>
References: <CA%2BdUSyo56ioZC4Kn4XTcf_GgeSsQrtd7FYpCxjsqOxQ5ON-_CA@mail.gmail.com> <549C65FF.4010702@multiplay.co.uk>
On Thu, Dec 25, 2014 at 9:31 PM, Steven Hartland <killing@multiplay.co.uk> wrote:

> On 25/12/2014 14:39, George Kontostanos wrote:
>
>> Hello, list and Merry Christmas to all
>>
>> I am facing some weird checksum errors during scrub. The configuration is
>> the following:
>>
>> Board: Supermicro Motherboard X10DRi-T4+
>> (http://www.supermicro.com/products/motherboard/xeon/c600/x10dri-t4_.cfm)
>> Controller: LSI SAS 9300-8i
>> (http://www.lsi.com/products/host-bus-adapters/pages/lsi-sas-9300-8i.aspx)
>> HDD: 21X6TB Western Digital WD60EFRX
>> HDD: 2XIntel SATA 600GB Solid-State Drive SSDSC2BB600G401 DC S3500
>> (SWAP, ZIL, CACHE)
>> Chassis: Supermicro 847BE1C-R1K28LPB 4U Storage Chassis
>> RAM: 64 GB
>>
>> I installed initially FreeBSD 10.1-RELEASE created one pool consistent by
>> 3 X7disk VDEVs in RAIDZ3. I used NFS to start copying some data. After
>> copying around 3TB I initiated a scrub.
>> The result was the following: http://pastebin.com/rswgCY2A and
>> http://pastebin.com/DQ2urGXk
>>
>> I tried to flash the controller but the LSI utility did not recognize the
>> controller. I installed FreeBSD 9.3-RELEASE and used LSI's mpslsi3 driver.
>> I was able to flash the latest bios and firmware that way.
>>
>> LSI Corporation SAS3 Flash Utility
>> Version 07.00.00.00 (2014.08.14)
>> Copyright (c) 2008-2014 LSI Corporation. All rights reserved
>>
>> Adapter Selected is a LSI SAS: SAS3008(C0)
>>
>> Controller Number            : 0
>> Controller                   : SAS3008(C0)
>> PCI Address                  : 00:82:00:00
>> SAS Address                  : 500605b-0-06ce-27e0
>> NVDATA Version (Default)     : 06.03.00.05
>> NVDATA Version (Persistent)  : 06.03.00.05
>> Firmware Product ID          : 0x2221 (IT)
>> Firmware Version             : 06.00.00.00
>> NVDATA Vendor                : LSI
>> NVDATA Product ID            : SAS9300-8i
>> BIOS Version                 : 08.13.00.00
>> UEFI BSD Version             : 02.00.00.00
>> FCODE Version                : N/A
>> Board Name                   : SAS9300-8i
>> Board Assembly               : H3-25573-00E
>> Board Tracer Number          : SV32928040
>>
>> I recreated the pool again and started writing data via NFS again. After 3
>> TB of data I started a scrub and I am still getting checksum errors though
>> there are no messages regarding the drives anymore in /var/log/messages
>>
>>   pool: Pool
>>  state: ONLINE
>> status: One or more devices has experienced an unrecoverable error. An
>>         attempt was made to correct the error. Applications are unaffected.
>> action: Determine if the device needs to be replaced, and clear the errors
>>         using 'zpool clear' or replace the device with 'zpool replace'.
>>    see: http://illumos.org/msg/ZFS-8000-9P
>>
>>   scan: scrub in progress since Thu Dec 25 08:46:21 2014
>>         2.28T scanned out of 5.54T at 816M/s, 1h9m to go
>>         11.9M repaired, 41.26% done
>> config:
>>
>>         NAME                     STATE   READ WRITE CKSUM
>>         Pool                     ONLINE     0     0     0
>>           raidz3-0               ONLINE     0     0     0
>>             gpt/WD-WX41D94RN5A3  ONLINE     0     0    15  (repairing)
>>             gpt/WD-WX41D948YE1U  ONLINE     0     0    14  (repairing)
>>             gpt/WD-WX41D94RN879  ONLINE     0     0    16  (repairing)
>>             gpt/WD-WX21D947NC83  ONLINE     0     0    24  (repairing)
>>             gpt/WD-WX21D947NT77  ONLINE     0     0    15  (repairing)
>>             gpt/WD-WX41D948YAKV  ONLINE     0     0    19  (repairing)
>>             gpt/WD-WX21D9421SCV  ONLINE     0     0    20  (repairing)
>>           raidz3-1               ONLINE     0     0     0
>>             gpt/WD-WX21D9421F6F  ONLINE     0     0    16  (repairing)
>>             gpt/WD-WX41D948YPN4  ONLINE     0     0    14  (repairing)
>>             gpt/WD-WX21D947NE2K  ONLINE     0     0    22  (repairing)
>>             gpt/WD-WX41D948Y2PX  ONLINE     0     0    19  (repairing)
>>             gpt/WD-WX41D94RNAX7  ONLINE     0     0    17  (repairing)
>>             gpt/WD-WX21D947N1RP  ONLINE     0     0    12  (repairing)
>>             gpt/WD-WX21D94216X7  ONLINE     0     0    20  (repairing)
>>           raidz3-2               ONLINE     0     0     0
>>             gpt/WD-WX41D948YAHP  ONLINE     0     0    25  (repairing)
>>             gpt/WD-WX21D947N06F  ONLINE     0     0    18  (repairing)
>>             gpt/WD-WX21D947N3T1  ONLINE     0     0    21  (repairing)
>>             gpt/WD-WX41D94RNT7D  ONLINE     0     0     5  (repairing)
>>             gpt/WD-WX41D948Y9VV  ONLINE     0     0    18  (repairing)
>>             gpt/WD-WX41D94RNS62  ONLINE     0     0    24  (repairing)
>>             gpt/WD-WX21D9421ZP9  ONLINE     0     0    28  (repairing)
>>         logs
>>           mirror-3               ONLINE     0     0     0
>>             gpt/zil0             ONLINE     0     0     0
>>             gpt/zil1             ONLINE     0     0     0
>>         cache
>>           gpt/cache0             ONLINE     0     0     0
>>           gpt/cache1             ONLINE     0     0     0
>>
>> errors: No known data errors
>>
>> This is really driving me crazy since smartmon tools do not display any
>> errors on the drives.
>>
>> Any suggestions are most welcomed!!!
>
> Check for bad hardware, first guess would be memory, next would be
> hotswap backplane.
>
> Regards
> Steve
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

Hi Steve,

Memory looks good in memtest. I am not sure what you mean regarding hotswap
backplane.

--
George Kontostanos
---