Date: Fri, 26 Mar 2021 13:29:45 +0100
From: Michael Gmelin <freebsd@grem.de>
To: Mathieu Chouquet-Stringer <me+freebsd@mathieu.digital>
Cc: Matt Churchyard <matt.churchyard@userve.net>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>, current@freebsd.org
Subject: Re: Scrub incredibly slow with 13.0-RC3 (as well as RC1 & 2)
Message-ID: <20210326132945.3274687e@bsd64.grem.de>
In-Reply-To: <YF2raxOUeN8Y23eT@weirdfishes>
References: <YFhuxr0qRzchA7x8@weirdfishes>
 <202103221515.12MFFHRK015188@higson.cam.lispworks.com>
 <YFi6Lwh3ISn8UMvS@weirdfishes>
 <YFk11A/j7URClN/l@weirdfishes>
 <YFm3BTK/J9XY/mCN@weirdfishes>
 <202103241230.12OCUqur030001@higson.cam.lispworks.com>
 <YFs3jFT7sEaGeQCe@weirdfishes>
 <33eb78e2de404a77b271880dbee4c22e@SERVER.ad.usd-group.com>
 <YF2raxOUeN8Y23eT@weirdfishes>

On Fri, 26 Mar 2021 10:37:47 +0100
Mathieu Chouquet-Stringer <me+freebsd@mathieu.digital> wrote:
> On Thu, Mar 25, 2021 at 08:55:12AM +0000, Matt Churchyard wrote:
> > Just as an aside, I did post a message a few weeks ago with a similar
> > problem on 13 (as well as snapshot issues). Scrub seemed ok for a
> > short while, but then ground to a halt. It would take 10+ minutes to
> > go 0.01%, with everything appearing fairly idle. I finally gave up
> > and stopped it after about 20 hours. Moving to 12.2 and rebuilding
> > the pool, the system scrubbed the same data in an hour, and I've
> > just scrubbed the same system after a month of use with about 4
> > times the data in 3 hours 20. As far as I'm aware, both should be
> > using effectively the same "new" scrub code.
> >
> > Will be interesting if you find a cause as I didn't get any response
> > to what for me was a complete showstopper for moving to 13.
>
> Bear with me, I'm slowly resilvering now... But same thing, it's not
> even maxing out my slow drives... Looks like it'll take 2 days...
>
> I did some flame graphs using dtrace. The first one is just the output
> of that:
> dtrace -x stackframes=100 -n 'profile-99 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }'
>
> Clearly my machine is not busy at all.
> And the second is the output of pretty much the same thing except I'm
> only capturing pid 31 which is the one busy.
> dtrace -x stackframes=100 -n 'profile-99 /arg0 && pid == 31/ { @[stack()] = count(); } tick-60s { exit(0); }'
>
> One striking thing is how many times hpet_get_timecount is present...
Does tuning of
- vfs.zfs.scrub_delay
- vfs.zfs.resilver_min_time_ms
- vfs.zfs.resilver_delay
make a difference?
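
A minimal sketch of how the current values could be checked before changing anything, assuming a stock sysctl(8). The helper name show_tunables is hypothetical, and some of these OIDs may not exist on a given release, in which case the script reports that instead of failing:

```shell
#!/bin/sh
# show_tunables is a hypothetical helper, not part of FreeBSD.
# It prints the current value of each tunable named above, or a note
# when the running kernel does not expose that OID.
show_tunables() {
    for name in vfs.zfs.scrub_delay \
                vfs.zfs.resilver_min_time_ms \
                vfs.zfs.resilver_delay; do
        # sysctl -n prints only the value; errors are silenced so a
        # missing OID falls through to the else branch.
        if value=$(sysctl -n "$name" 2>/dev/null); then
            echo "$name = $value"
        else
            echo "$name: not available on this system"
        fi
    done
}

show_tunables
```

Where a tunable is present, it could then be changed as root for a test run with something like `sysctl vfs.zfs.resilver_min_time_ms=5000` (5000 is an arbitrary illustrative value, not a recommendation), comparing scrub or resilver progress before and after.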
Best,
Michael
--
Michael Gmelin
