From: Peter Eriksson <pen@lysator.liu.se>
Subject: Re: Ways to "pause" ZFS resilver?
Date: Sun, 8 Mar 2020 20:00:52 +0100
To: freebsd-fs
Message-Id: <73CD8E11-76D7-4286-B2C4-1D3835F910B6@lysator.liu.se>

Data drives are 12 HGST 10TB 7200rpm spinning rust… (2xRAIDZ2(4+2))

Well, except for the log (dual Intel DC S3700) and cache (Intel 750
Series PCIe) devices. But I’m not seeing any errors on those.

(The NFS hiccups seem to be happening in “nfsmsleep()” for some reason.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244665

DTrace output:

 27  54842  nfsrvd_dorpc:entry Start
 27  47273  nfsv4_lock:entry Start(lp->nfslock_lock=6, iwantlock=0)
 27  37590  nfsmsleep:entry Start(ffffffff81e9982c, ffffffff81e998a0, 99)
 27  54396  _sleep:entry Start(prio=99, timeo=0)
  7  54397  _sleep:return 7171965 µs
  7  37591  nfsmsleep:return 7171972 µs
  7  47274  nfsv4_lock:return 7171979 µs

But it would be really nice to have some way to temporarily pause a
running resilver while I’m investigating this issue)

- Peter

> On 8 Mar 2020, at 19:37, Warner Losh wrote:
>
> On Sun, Mar 8, 2020 at 12:35 PM Peter Eriksson wrote:
> I’m looking for ideas on how to pause a running ZFS resilver on a
> FreeBSD 11.3-RELEASE-p6 system.
>
> The reason is that we have a system where a running resilver causes
> severe NFS “hiccups” for our users (delays of 5-20s, more or less
> often), and thus I’d like to figure out some way to “pause” it during
> office hours until either we’ve found and fixed the problem or the
> resilver is done (1D15H to go)...
>
> Since there isn’t any “zfs” command to pause a running resilver, I’m
> pondering alternative, more “creative” ways.
>
> /usr/src/cddl/contrib/opensolaris/uts/common/fs/zfs:
>
> >         if (zio_flags & ZIO_FLAG_RESILVER)
> >                 scan_delay = zfs_resilver_delay;
> >         else {
> >                 ASSERT(zio_flags & ZIO_FLAG_SCRUB);
> >                 scan_delay = zfs_scrub_delay;
> >         }
> >
> >         if (scan_delay && (ddi_get_lbolt64() - spa->spa_last_io <= zfs_scan_idle))
> >                 delay(MAX((int)scan_delay, 0));
>
> Setting vfs.zfs.scan_idle to something high and then
> vfs.zfs.resilver_delay to 10*60*60*kern.hz (10 hours) and hoping the
> “if” statement will trigger?
> But that assumes nothing can/will interrupt delay(). Hmmm...
>
> Any other suggestions?
>
> (I don’t want to abort the resilver).
>
> If you are dealing with SSDs, you might look to see if BIO_DELETE
> (trim) traffic is causing delays. If so, you can temporarily disable
> TRIM on the disk being resilvered. In the resilver case, trim doesn't
> help much anyway since you're rewriting the entire drive. If not, then
> I'm not sure what else to recommend...
>
> Warner
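
For reference, the throttle condition quoted above can be modelled in userland to see when the delay would fire and how large 10*60*60*kern.hz becomes. This is only a rough sketch, not kernel code: the function names hours_to_ticks() and scan_throttle_ticks() are hypothetical, the value kern.hz = 1000 used in the example is an assumption (check "sysctl kern.hz" on the actual system), and it says nothing about whether delay() can be interrupted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userland model of the quoted resilver throttle check.
 * Names and structure are illustrative only; this is not the ZFS source.
 */

/* Convert a duration in hours to clock ticks, as in 10*60*60*kern.hz. */
int64_t
hours_to_ticks(int64_t hours, int64_t hz)
{
	assert(hours >= 0 && hz > 0);
	return (hours * 60 * 60 * hz);
}

/*
 * Mirror of the quoted condition. Returns the number of ticks a scan I/O
 * would be delayed, or 0 when no delay applies:
 *   - scan_delay must be non-zero, and
 *   - the last recorded pool I/O must lie within scan_idle ticks of now.
 * The (int) cast and the clamp at zero follow MAX((int)scan_delay, 0).
 */
int
scan_throttle_ticks(int64_t now, int64_t last_io, int64_t scan_delay,
    int64_t scan_idle)
{
	if (scan_delay && (now - last_io <= scan_idle)) {
		int d = (int)scan_delay;
		return (d > 0 ? d : 0);
	}
	return (0);
}
```

With an assumed kern.hz of 1000, hours_to_ticks(10, 1000) gives 36000000 ticks, which still fits in the int that the quoted code casts to. The model also makes one point in the condition visible: the delay only applies while the last recorded I/O is within scan_idle ticks, which is why the idea depends on raising vfs.zfs.scan_idle as well.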