From: Mathieu Chouquet-Stringer <mchouque@thi.eu.com>
Date: Fri, 26 Mar 2021 10:37:47 +0100
To: Matt Churchyard
Cc: freebsd-fs@freebsd.org, current@freebsd.org
Subject: Re: Scrub incredibly slow with 13.0-RC3 (as well as RC1 & 2)

On Thu, Mar 25, 2021 at 08:55:12AM +0000, Matt Churchyard wrote:
> Just as an aside, I posted a message a few weeks ago with a similar
> problem on 13 (as well as snapshot issues). Scrub seemed OK for a
> short while, but then ground to a halt. It would take 10+ minutes to
> advance 0.01%, with everything appearing fairly idle. I finally gave
> up and stopped it after about 20 hours. After moving to 12.2 and
> rebuilding the pool, the system scrubbed the same data in an hour,
> and I've just scrubbed the same system after a month of use, with
> about 4 times the data, in 3 hours 20 minutes. As far as I'm aware,
> both should be using effectively the same "new" scrub code.
>
> It will be interesting if you find a cause, as I didn't get any
> response to what was, for me, a complete showstopper for moving
> to 13.

Bear with me, I'm slowly resilvering now... but it's the same thing:
it's not even maxing out my slow drives. It looks like it'll take two
days.

I made some flame graphs using dtrace. The first one is just the
output of this:

    dtrace -x stackframes=100 -n 'profile-99 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }'

Clearly my machine is not busy at all.

The second is the output of pretty much the same thing, except I'm
only capturing pid 31, which is the busy one:

    dtrace -x stackframes=100 -n 'profile-99 /arg0 && pid == 31/ { @[stack()] = count(); } tick-60s { exit(0); }'

One striking thing is how many times hpet_get_timecount is present...

-- 
Mathieu Chouquet-Stringer                            mchouque@free.fr
            The sun itself sees not till heaven clears.
                         -- William Shakespeare
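
P.S. For anyone wanting to turn the dtrace output above into actual
flame graphs: the usual route is Brendan Gregg's FlameGraph scripts,
roughly like the sketch below (not my exact commands -- it assumes
stackcollapse.pl and flamegraph.pl are checked out in ~/FlameGraph,
and the file names are arbitrary):

    # capture 60 seconds of on-CPU kernel stacks to a file
    dtrace -x stackframes=100 \
        -n 'profile-99 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' \
        -o out.stacks

    # fold the DTrace stack aggregation and render an SVG
    ~/FlameGraph/stackcollapse.pl out.stacks | \
        ~/FlameGraph/flamegraph.pl > scrub.svg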
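
P.P.S. On the hpet_get_timecount point: FreeBSD exposes the active
timecounter through sysctl, so one cheap experiment would be to see
whether switching away from HPET changes anything (a sketch -- whether
TSC-low is actually usable depends on the machine, so check the choice
list first):

    # list the timecounters the kernel considers usable
    sysctl kern.timecounter.choice
    # show the one currently in use
    sysctl kern.timecounter.hardware
    # switch to the TSC if it appears in the choice list
    sysctl kern.timecounter.hardware=TSC-low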