Date: Sun, 8 Mar 2020 20:00:52 +0100
From: Peter Eriksson <pen@lysator.liu.se>
To: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: Ways to "pause" ZFS resilver?
Message-ID: <73CD8E11-76D7-4286-B2C4-1D3835F910B6@lysator.liu.se>
In-Reply-To: <CANCZdfo7BDLJYko1wW8L5%2BwxzMY6L-Rd0RbYK8XSL1b0DgQ6qw@mail.gmail.com>
References: <BDFFC0E8-9D5A-4E45-835F-9D00CDAE8829@lysator.liu.se> <CANCZdfo7BDLJYko1wW8L5%2BwxzMY6L-Rd0RbYK8XSL1b0DgQ6qw@mail.gmail.com>
Data drives are 12 HGST 10TB 7200rpm spinning rust… (2xRAIDZ2(4+2))

Well, except for the log (dual Intel DC S3700) and cache (Intel 750 Series PCIe) devices. But I’m not seeing any errors on those.

(The NFS hiccups seem to be happening in “nfsmsleep()” for some reason.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244665

DTrace output:

 27  54842  nfsrvd_dorpc:entry  Start
 27  47273  nfsv4_lock:entry    Start(lp->nfslock_lock=6, iwantlock=0)
 27  37590  nfsmsleep:entry     Start(ffffffff81e9982c, ffffffff81e998a0, 99)
 27  54396  _sleep:entry        Start(prio=99, timeo=0)
  7  54397  _sleep:return       7171965 µs
  7  37591  nfsmsleep:return    7171972 µs
  7  47274  nfsv4_lock:return   7171979 µs

But it would be really nice to have some way to temporarily pause a running resilver while I’m investigating this issue.)

- Peter

> On 8 Mar 2020, at 19:37, Warner Losh <imp@bsdimp.com> wrote:
>
> On Sun, Mar 8, 2020 at 12:35 PM Peter Eriksson <pen@lysator.liu.se> wrote:
> I’m looking for ideas on how to pause a running ZFS resilver on a FreeBSD 11.3-RELEASE-p6 system.
>
> The reason is we have a system where a running resilver causes severe NFS “hiccups” for our users (like 5-20s delays more or less often) and thus I’d like to figure out some way to “pause” it during office hours until either we’ve found and fixed the problem or the resilver is done (1D15H to go)...
>
> Since there isn’t any “zfs” command to pause a running resilver I’m pondering alternative, more “creative” ways.
>
> /usr/src/cddl/contrib/opensolaris/uts/common/fs/zfs:
>
> > if (zio_flags & ZIO_FLAG_RESILVER)
> >         scan_delay = zfs_resilver_delay;
> > else {
> >         ASSERT(zio_flags & ZIO_FLAG_SCRUB);
> >         scan_delay = zfs_scrub_delay;
> > }
> >
> > if (scan_delay && (ddi_get_lbolt64() - spa->spa_last_io <= zfs_scan_idle))
> >         delay(MAX((int)scan_delay, 0));
>
> Setting vfs.zfs.scan_idle to something high and then vfs.zfs.resilver_delay to 10*60*60*kern.hz (10 hours) and hoping the “if” statement will trigger? But that assumes nothing can/will interrupt delay(). Hmmm...
>
> Any other suggestions?
>
> (I don’t want to abort the resilver).
>
> If you are dealing with SSDs, you might look to see if BIO_DELETE (trim) traffic is causing delays. If so, you can temporarily disable TRIM on the disk being resilvered. In the resilver case, trim doesn't help much anyway since you're rewriting the entire drive. If not, then I'm not sure what else to recommend...
>
> Warner
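
The 10*60*60*kern.hz arithmetic and the throttle check quoted above can be tried out in user space. What follows is only a rough sketch, not the kernel code: the function names hours_to_ticks and scan_throttle_ticks are made up for illustration, all values are assumed to be in clock ticks, and a kern.hz of 1000 is an assumption that should be checked with "sysctl kern.hz" on the machine in question.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): number of clock ticks
 * in `hours` hours when the clock runs at `hz` ticks per second.
 */
int64_t hours_to_ticks(int64_t hours, int64_t hz)
{
    assert(hours >= 0 && hz > 0);
    return hours * 60 * 60 * hz;
}

/*
 * User-space model of the quoted condition:
 *
 *   if (scan_delay && (now - last_io <= scan_idle))
 *           delay(MAX((int)scan_delay, 0));
 *
 * Returns how many ticks the scan I/O would be delayed, or 0 when the
 * condition does not trigger. All arguments are assumed to be ticks.
 */
int64_t scan_throttle_ticks(int64_t scan_delay, int64_t now,
                            int64_t last_io, int64_t scan_idle)
{
    if (scan_delay != 0 && (now - last_io) <= scan_idle) {
        /* Mirrors MAX(scan_delay, 0): a negative delay becomes 0. */
        return scan_delay > 0 ? scan_delay : 0;
    }
    return 0;
}
```

With hz = 1000, hours_to_ticks(10, 1000) comes out at 36,000,000 ticks, which still fits in the int that the quoted code casts scan_delay to. The model also shows why scan_idle has to be raised: the delay only applies while the last pool I/O is no older than scan_idle ticks. It says nothing about whether delay() can be interrupted.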