Date: Sun, 20 Feb 2011 01:31:45 +0100
From: Piotr Kucharski <piotr.kucharski@42.pl>
To: freebsd-fs@freebsd.org
Subject: very slow zfs scrub
Message-ID: <AANLkTim6z5KiceXQE-DHt51TBs%2BSO8NNLpi7fYBTwXpE@mail.gmail.com>

FreeBSD 8.2-PRERELEASE #18 r218734M with the zfs28 patch applied. The machine has 24G RAM, 2 package(s) x 4 core(s) x 2 SMT threads (Xeon E5620 @ 2.40GHz) and is 99% idle.

# date
Sun Feb 20 01:17:05 CET 2011
# zpool status vol
  pool: vol
 state: ONLINE
  scan: scrub in progress since Fri Feb 18 05:24:55 2011
    163G scanned out of 8.28T at 1.06M/s, (scan is slow, no estimated time)
    0 repaired, 1.93% done
config:

        NAME             STATE     READ WRITE CKSUM
        vol              ONLINE       0     0     0
          raidz1-0       ONLINE       0     0     0
            ggate1.eli   ONLINE       0     0     0
            ggate3.eli   ONLINE       0     0     0
            ggate2.eli   ONLINE       0     0     0
            ggate4.eli   ONLINE       0     0     0
            ggate5.eli   ONLINE       0     0     0
            ggate6.eli   ONLINE       0     0     0
            ggate7.eli   ONLINE       0     0     0
            ggate8.eli   ONLINE       0     0     0
            ggate9.eli   ONLINE       0     0     0
            ggate10.eli  ONLINE       0     0     0
            ggate11.eli  ONLINE       0     0     0
            ggate12.eli  ONLINE       0     0     0
        logs
          mirror-1       ONLINE       0     0     0
            da0p4        ONLINE       0     0     0
            da1p4        ONLINE       0     0     0

The ggateX.eli devices are (obviously) geli over ggate devices located in the same LAN, no special setup.

Under normal circumstances performance is fine: at least 20MB/s transfers for files, no problem with multiple accesses, etc. That's okay for me.

zfs scrub, however, is a completely different story:

a) it is very slow (as you can see above)
b) it makes zfs very slow:

# dd if=/vol/file of=/dev/null bs=64k
[several ^T]
load: 0.01 cmd: dd 98988 [zfs] 3.00r 0.00u 0.00s 0% 832k
load: 0.01 cmd: dd 98988 [zio->io_cv)] 19.66r 0.00u 0.00s 0% 900k
load: 0.01 cmd: dd 98988 [zio->io_cv)] 20.43r 0.00u 0.00s 0% 900k
34+0 records in
34+0 records out
2228224 bytes transferred in 15.816547 secs (140879 bytes/sec)
34+0 records in
34+0 records out
2228224 bytes transferred in 15.816555 secs (140879 bytes/sec)
34+0 records in
34+0 records out
2228224 bytes transferred in 15.816562 secs (140879 bytes/sec)
^C98+0 records in
98+0 records out
6422528 bytes transferred in 46.926860 secs (136863 bytes/sec)

It looks like some of the ggate.eli devices get hammered by the scrub and their performance goes down the drain:

# for i in {1..12}; do echo -n "ggate$i.eli "; dd if=/dev/ggate$i.eli of=/dev/null bs=64k skip=$((RANDOM*100)) count=100 2>&1 | grep transferred; done
ggate1.eli 6553600 bytes transferred in 0.323291 secs (20271516 bytes/sec)
ggate2.eli 6553600 bytes transferred in 0.278870 secs (23500567 bytes/sec)
ggate3.eli 6553600 bytes transferred in 0.541883 secs (12094124 bytes/sec)
ggate4.eli 6553600 bytes transferred in 30.822238 secs (212626 bytes/sec)
ggate5.eli 6553600 bytes transferred in 0.927459 secs (7066188 bytes/sec)
ggate6.eli 6553600 bytes transferred in 0.346056 secs (18937976 bytes/sec)
ggate7.eli 6553600 bytes transferred in 0.279477 secs (23449525 bytes/sec)
ggate8.eli 6553600 bytes transferred in 29.028139 secs (225767 bytes/sec)
ggate9.eli 6553600 bytes transferred in 0.422382 secs (15515817 bytes/sec)
ggate10.eli 6553600 bytes transferred in 0.278718 secs (23513372 bytes/sec)
ggate11.eli 6553600 bytes transferred in 0.308250 secs (21260652 bytes/sec)
ggate12.eli 6553600 bytes transferred in 29.901194 secs (219175 bytes/sec)

(geli eats half of the raw speed, but that I expected)

The second I stop the scrub, everything goes back to normal.

I decided to ride it out for a while (despite it making this pool basically unusable), as I've heard that scrub speed is abysmal in the beginning and speeds up significantly later. But that still hasn't happened.

Has anyone seen similar behaviour? Any advice (other than attaching the disks locally) on how to fix this?
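
For anyone repeating the per-device test above, the slow devices can be picked out of the dd output mechanically. This is only a minimal sketch: the function name flag_slow and the 1000000 bytes/sec threshold are made up for illustration, and it assumes each input line has the "<device> <bytes> bytes transferred in <secs> secs (...)" shape shown in the loop output.

```shell
#!/bin/sh
# flag_slow MIN_BYTES_PER_SEC   (hypothetical helper, not part of any tool)
# Reads lines like
#   ggate4.eli 6553600 bytes transferred in 30.822238 secs (212626 bytes/sec)
# on stdin and prints the name of every device whose throughput,
# recomputed as bytes / seconds, is below the given threshold.
flag_slow() {
    awk -v min="$1" '
        # Only consider lines that match the dd summary shape.
        $3 == "bytes" && $4 == "transferred" && $6 > 0 {
            rate = $2 / $6          # bytes per second
            if (rate < min)
                print $1            # device name only
        }'
}

# Example (assumes the loop output was saved to a file named dd.out):
#   flag_slow 1000000 < dd.out
```

Fed the twelve lines above with a 1000000 bytes/sec threshold, it should list only ggate4.eli, ggate8.eli and ggate12.eli, since those are the only devices below 1 MB/s.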