Date: Sat, 27 Mar 2021 17:06:43 +0100
From: Mathieu Chouquet-Stringer <me+freebsd@mathieu.digital>
To: freebsd-fs@freebsd.org
Subject: Re: Scrub incredibly slow with 13.0-RC3 (as well as RC1 & 2)
Message-ID: <YF9YE282kOWN7ubr@weirdfishes>
In-Reply-To: <YF8eL0dmSh6H8HX2@ceres.zyxst.net>
References: <YFhuxr0qRzchA7x8@weirdfishes> <YF8eL0dmSh6H8HX2@ceres.zyxst.net>
Hello,

TL;DR: read_max makes no difference, don't use hpet with KVM, use scan_legacy.

On Sat, Mar 27, 2021 at 11:59:43AM +0000, tech-lists wrote:
> Have you tried the following sysctl:
>
> vfs.read_max=128

It's something I hadn't tried, so I gave it a shot. Note that I first ran my
tests on zroot, which is on a virtio controller (the image file is on a mirror
made of two SSDs). This pool is small, with only 12.1GB allocated out of 21.9GB:

  NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
  zroot  21.9G  12.1G  9.88G        -         -    37%    54%  1.00x    ONLINE  -

Scrubbing is hard to benchmark because the status seems stuck forever:

  scan: scrub in progress since Sat Mar 27 15:16:27 2021
        11.8G scanned at 146M/s, 1.23M issued at 15.2K/s, 12.1G total
        0B repaired, 0.01% done, no estimated completion time

  scan: scrub in progress since Sat Mar 27 15:16:27 2021
        12.0G scanned at 145M/s, 3.58M issued at 43.2K/s, 12.1G total
        0B repaired, 0.03% done, no estimated completion time

  scan: scrub in progress since Sat Mar 27 15:16:27 2021
        12.1G scanned at 142M/s, 409M issued at 4.70M/s, 12.1G total
        0B repaired, 3.31% done, no estimated completion time

Look how it has already read 12.0G, pretty much 100% of the allocated space in
the pool, and it still shows 0.03% done...

And with a bs of 128k (the zfs record size), dd with iflags=direct tells me I
can read this thing at:

  23266197504 bytes (23 GB, 22 GiB) transferred 23.042s, 1010 MB/s

With a bs of 32k I still get 310MB/s.

So please find below the scrub durations with vfs.read_max set to 512, 256,
128 and 64: it makes no difference. Please note it seems completely CPU bound
again, as I see this process using 100% of the CPU the whole time the scrub is
running:

  31 root  -8  -  0B  3952K  CPU0  0  20:05  100.00%  zfskern{txg_thread_enter}

- vfs.read_max=512
  vfs.read_max: 64 -> 512
  - Scrub 1
    scan: scrub repaired 0B in 00:02:53 with 0 errors on Sat Mar 27 15:10:41 2021
  - Scrub 2
    scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Mar 27 15:13:33 2021
  - Scrub 3
    scan: scrub repaired 0B in 00:02:54 with 0 errors on Sat Mar 27 15:16:27 2021

- vfs.read_max=256
  vfs.read_max: 512 -> 256
  - Scrub 1
    scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Mar 27 15:19:19 2021
  - Scrub 2
    scan: scrub repaired 0B in 00:02:53 with 0 errors on Sat Mar 27 15:22:12 2021
  - Scrub 3
    scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Mar 27 15:25:04 2021

- vfs.read_max=128
  vfs.read_max: 256 -> 128
  - Scrub 1
    scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Mar 27 15:27:56 2021
  - Scrub 2
    scan: scrub repaired 0B in 00:02:53 with 0 errors on Sat Mar 27 15:30:49 2021
  - Scrub 3
    scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Mar 27 15:33:41 2021

- vfs.read_max=64
  vfs.read_max: 128 -> 64
  - Scrub 1
    scan: scrub repaired 0B in 00:02:53 with 0 errors on Sat Mar 27 15:36:34 2021
  - Scrub 2
    scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Mar 27 15:39:26 2021
  - Scrub 3
    scan: scrub repaired 0B in 00:02:53 with 0 errors on Sat Mar 27 15:42:19 2021

I am attaching a flamegraph for that busy process: again, a lot of
hpet_get_timecount...

So I went and disabled hpet in KVM: I gained 30 seconds, and again playing with
vfs.read_max made no difference, but then again it's an SSD. And given I'm
still CPU bound, I'm not sure I'll gain anything by playing with block sizes...
But the first lesson is: don't use hpet with KVM.
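For anyone who wants to reproduce the sweep, it boils down to something like
the sketch below. This is not my exact script; it assumes /bin/sh, a pool
named zroot, and OpenZFS 2.0's 'zpool scrub -w' to block until each scrub
finishes.

  #!/bin/sh
  # Sweep vfs.read_max and time three scrubs per setting.
  for rm in 512 256 128 64; do
          sysctl vfs.read_max=$rm
          for run in 1 2 3; do
                  zpool scrub -w zroot        # -w waits for the scrub to finish
                  printf 'read_max=%s run %s: ' "$rm" "$run"
                  zpool status zroot | grep 'scrub repaired'
          done
  done

  # Separately: check which timecounter the guest is using and what else is
  # offered (the hpet_get_timecount hotspot in the flamegraph is the HPET one).
  sysctl kern.timecounter.hardware kern.timecounter.choice

For reference, removing the hpet from a libvirt guest is the usual
<timer name='hpet' present='no'/> line inside the <clock> element of the
domain XML (plain qemu has -no-hpet); alternatively the guest can simply be
pointed at another timecounter via kern.timecounter.hardware.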
With hpet disabled, here is the same read_max sweep:

- vfs.read_max=512
  vfs.read_max: 64 -> 512
  - Scrub 1
    scan: scrub repaired 0B in 00:02:19 with 0 errors on Sat Mar 27 16:14:25 2021
  - Scrub 2
    scan: scrub repaired 0B in 00:02:19 with 0 errors on Sat Mar 27 16:16:44 2021
  - Scrub 3
    scan: scrub repaired 0B in 00:02:18 with 0 errors on Sat Mar 27 16:19:02 2021

- vfs.read_max=256
  vfs.read_max: 512 -> 256
  - Scrub 1
    scan: scrub repaired 0B in 00:02:18 with 0 errors on Sat Mar 27 16:21:20 2021
  - Scrub 2
    scan: scrub repaired 0B in 00:02:19 with 0 errors on Sat Mar 27 16:23:39 2021
  - Scrub 3
    scan: scrub repaired 0B in 00:02:18 with 0 errors on Sat Mar 27 16:25:57 2021

- vfs.read_max=128
  vfs.read_max: 256 -> 128
  - Scrub 1
    scan: scrub repaired 0B in 00:02:18 with 0 errors on Sat Mar 27 16:28:15 2021
  - Scrub 2
    scan: scrub repaired 0B in 00:02:18 with 0 errors on Sat Mar 27 16:30:33 2021
  - Scrub 3
    scan: scrub repaired 0B in 00:02:18 with 0 errors on Sat Mar 27 16:32:51 2021

- vfs.read_max=64
  vfs.read_max: 128 -> 64
  - Scrub 1
    scan: scrub repaired 0B in 00:02:19 with 0 errors on Sat Mar 27 16:35:10 2021
  - Scrub 2
    scan: scrub repaired 0B in 00:02:18 with 0 errors on Sat Mar 27 16:37:28 2021
  - Scrub 3
    scan: scrub repaired 0B in 00:02:19 with 0 errors on Sat Mar 27 16:39:47 2021

An interesting thing is that vfs.zfs.scan_legacy=1 makes the progress start
right away. See how, having scanned almost 50% of the pool, it already shows
about 49% done, while with the non-legacy scan progress only moves after all
the allocated space has been read:

  scan: scrub in progress since Sat Mar 27 16:41:33 2021
        5.85G scanned at 133M/s, 5.85G issued at 133M/s, 12.1G total
        0B repaired, 48.50% done, 00:00:47 to go

It's altogether way quicker (I gain 54 seconds over the no-hpet test!!!) and
totally reproducible:

  scan: scrub repaired 0B in 00:01:35 with 0 errors on Sat Mar 27 16:43:08 2021
  scan: scrub repaired 0B in 00:01:35 with 0 errors on Sat Mar 27 16:46:10 2021

I will start a scrub of my raidz2 zpool and tweak read_max to see if it makes
a difference, but at this point I doubt it.

And scan_legacy could be quicker because I see as many busy threads as there
are vdevs, while if I start a scrub with the non-legacy scan I only see one
from the get-go:

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    0 root        -16    -     0B    13M CPU0     0   8:40  99.17% kernel{z_rd_iss_4}
    0 root        -16    -     0B    13M CPU9     9   8:40  99.15% kernel{z_rd_iss_6}
    0 root        -16    -     0B    13M CPU7     7   8:40  98.97% kernel{z_rd_iss_5}
    0 root        -16    -     0B    13M CPU4     4   8:40  98.96% kernel{z_rd_iss_7}
    0 root        -16    -     0B    13M CPU6     6   8:40  98.59% kernel{z_rd_iss_1}
    0 root        -16    -     0B    13M CPU5     5   8:40  98.56% kernel{z_rd_iss_2}
    0 root        -16    -     0B    13M CPU14   14   8:40  98.33% kernel{z_rd_iss_3}
    0 root        -16    -     0B    13M CPU11   11   8:40  97.89% kernel{z_rd_iss_0}

--
Mathieu Chouquet-Stringer
        The sun itself sees not till heaven clears.
                -- William Shakespeare --
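PS: for anyone following along, the two guest-side items behind the
observations above are just the following (the top flags are simply my way of
watching the busy threads):

  # Issue scrub I/O as blocks are discovered (the pre-sorted-scrub behaviour)
  # instead of scanning all metadata first; add the same line to
  # /etc/sysctl.conf to keep it across reboots.
  sysctl vfs.zfs.scan_legacy=1

  # Watch the busy kernel{z_rd_iss_*} threads while a scrub runs
  # (-S shows system processes, -H shows individual threads).
  top -SH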