Date:      Fri, 02 Nov 2018 22:07:17 +1000
From:      Andy Farkas <andyf@andyit.com.au>
To:        freebsd-fs@freebsd.org
Subject:   zpool scrub 9TB finishes in ~40 mins
Message-ID:  <5BDC3DF5.9020501@andyit.com.au>


[TL;DR - scrub finishes in ~40 min whereas previously it took ~9 hrs]

My home server/gateway is a Dell PowerEdge T110 server with 12GB RAM 
running:

# uname -mv
FreeBSD 11.1-STABLE #0 r331113: Sun Mar 18 19:57:45 AEST 2018 
root@drunkfish.andyit.com.au:/usr/obj/usr/src/sys/DRUNKFISH  amd64

It currently has the following HDDs in it:

# camcontrol devlist
<ATA ST4000DM000-1F21 CC52>        at scbus0 target 3 lun 0 (pass0,da0)
<ATA ST4000DM000-1F21 CC52>        at scbus0 target 4 lun 0 (pass1,da1)
<ATA ST4000DM004-2CV1 0001>        at scbus0 target 5 lun 0 (pass2,da2)
<ATA ST4000DM004-2CV1 0001>        at scbus0 target 6 lun 0 (pass3,da3)
<WDC WD5002ABYS-18B1B0 02.03B04>   at scbus1 target 0 lun 0 (ada0,pass4)
<WDC WD5002ABYS-18B1B0 02.03B04>   at scbus3 target 0 lun 0 (ada1,pass5)
<AHCI SGPIO Enclosure 1.00 0001>   at scbus7 target 0 lun 0 (ses0,pass6)

The WDs are a ZFS mirror for booting, and the Seagates are a ZFS RAIDZ1 data
store. My issue is with the Seagate data store.

The WDs are attached to the mainboard SATA ports, and the Seagates to a:

# grep mps /var/run/dmesg.boot | head -3
mps0: <Avago Technologies (LSI) SAS2008> port 0xfc00-0xfcff mem 
0xdeab0000-0xdeabffff,0xdeac0000-0xdeafffff irq 16 at device 0.0 on pci2
mps0: Firmware: 07.15.08.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 
185c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,IR>

Originally there were 4 x ST4000DM000-1F21, but about 6 months ago 2 of them
started showing Offline_Uncorrectable errors, so I needed to replace them.
Not having immediate cash available, I only got around to it a few weeks ago.
The pool held up until recently.

The controller started "invalidating disk pack" due to too many read errors.
At first it only dropped 1 disk (so the pool stayed up), but shortly after it
started dropping the second disk as well. It kept doing this on every reboot
as soon as a resilver started. No more zpool :(

So I physically moved the Seagates to another box and attached them to its
mainboard. I booted a TrueOS DVD and, using the "Emergency shell", imported
the pool, which initiated a resilver. Lots of read/write errors on the 2
failing disks were reported in /var/log/messages, but at least they weren't
being "invalidated". I left it attempting to resilver for a few days, but it
was obvious it was not going to finish until next decade due to read/write
timeouts and retries.
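
(From memory the import was just a forced import by pool name, something
along the lines of:

# zpool import -f z

though the exact invocation may have differed.)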

I then decided to clone each bad disk to a new disk. The first one I did
using 'dd if=<bad> of=<good> conv=noerror,sync' and the second one via
'recoverdisk <bad> <good>' - I only found out about recoverdisk(1) after
asking what dd(1) does when it gets a read error with bs=1m: does it fill
the buffer with good data, pad the failed reads, and continue filling the
buffer, or does it pad out the rest of the buffer and start reading the
next 1m... another question for another time.
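
(For the record, the clone commands looked roughly like the following - the
device names here are only placeholders, not the actual nodes in my boxes:

# dd if=/dev/da0 of=/dev/da2 bs=1m conv=noerror,sync
# recoverdisk /dev/da0 /dev/da2

As I understand it, recoverdisk(1) keeps a worklist and retries failed
ranges with progressively smaller reads, which is why it coped better with
the bad sectors.)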

So the bad disks were cloned to the good disks sans read errors. I moved the
4 good disks back into the Dell, fired it up, and went "yay!" - my pool is
back.

It finished resilvering. I hadn't really paid attention, but the resilver
seemed to happen fairly quickly (I was expecting many hours, but when I came
back an hour later it was finished). Interestingly, zpool status showed:

   scan: resilvered 143G in 190h41m with 59191 errors on Fri Nov  2 15:54:56 2018

even though the server had only been running for an hour. I presume this was
because of having the disks in the other box and the various attempts at
resilvering with lots of retry/timeouts on the bad disks.

Then I issued a zpool scrub, expecting it to take all night. I checked on it
a few times and saw:

# zpool status z
   pool: z
  state: ONLINE
status: One or more devices has experienced an error resulting in data
     corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
     entire pool from backup.
    see: http://illumos.org/msg/ZFS-8000-8A
   scan: scrub in progress since Fri Nov  2 16:59:24 2018
     365G scanned out of 11.9T at 228M/s, 14h47m to go
         2.47M repaired, 2.99% done

...about what I was expecting. Another check ~10 mins later:

# zpool status z
   pool: z
  state: ONLINE
status: One or more devices has experienced an error resulting in data
     corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
     entire pool from backup.
    see: http://illumos.org/msg/ZFS-8000-8A
   scan: scrub in progress since Fri Nov  2 16:59:24 2018
     560G scanned out of 11.9T at 238M/s, 13h54m to go
         3.07M repaired, 4.59% done

...looks good. Dinner time. Came back a bit later to check again:

# zpool status z
   pool: z
  state: ONLINE
status: One or more devices has experienced an error resulting in data
     corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
     entire pool from backup.
    see: http://illumos.org/msg/ZFS-8000-8A
   scan: scrub repaired 3.26M in 0h41m with 59191 errors on Fri Nov  2 17:40:58 2018

It's finished? In only 41 minutes? Very strange.

So I issued another scrub. Here is the complete output:

# zpool status z ; date
   pool: z
  state: ONLINE
status: One or more devices has experienced an error resulting in data
     corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
     entire pool from backup.
    see: http://illumos.org/msg/ZFS-8000-8A
   scan: scrub repaired 0 in 0h39m with 59191 errors on Fri Nov  2 19:23:26 2018
config:

     NAME                      STATE     READ WRITE CKSUM
     z                         ONLINE       0     0  139K
       raidz1-0                ONLINE       0     0  278K
         diskid/DISK-WFN02KHJ  ONLINE       0     0    11  block size: 512B configured, 4096B native
         da2                   ONLINE       0     0 1.66K  block size: 512B configured, 4096B native
         diskid/DISK-Z300CYLN  ONLINE       0     0     0  block size: 512B configured, 4096B native
         diskid/DISK-Z300C2CK  ONLINE       0     0     0  block size: 512B configured, 4096B native

errors: 59191 data errors, use '-v' for a list
Fri Nov  2 19:33:28 AEST 2018
#

Finished again in less than 40 minutes, for something that used to take > 9hrs.

Does anyone have an explanation? Also, why did da2 not get its proper label back?
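
(For the label, I'm guessing an export and re-import pointed at the diskid
device nodes might bring it back - something like:

# zpool export z
# zpool import -d /dev/diskid z

but I haven't tried that yet.)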

I very much doubt the pool has been scrubbed properly. Next I'm going to
clear the (expected) errors and see what happens....
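
(That should just be:

# zpool clear z
# zpool scrub z

and then keep an eye on zpool status.)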

Thanks for reading,

-andyf

PS. Yes, I did a backup of the pool to another box (via rsync) a few months
ago when the errors started showing up. Trying to scrounge together another
10TB of storage as a hobbyist is fairly difficult, but I did actually do it.
Unfortunately, when I went to restore the files that 'zpool status -v' listed
from the backup, 1 of the 6 2TB disks crashed and wiped out the
(non-redundant) backup pool.

# zfs list z
NAME   USED  AVAIL  REFER  MOUNTPOINT
z     8.92T  1.63T  8.92T  /z




