Date:      Sun, 8 Mar 2020 20:00:52 +0100
From:      Peter Eriksson <pen@lysator.liu.se>
To:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: Ways to "pause" ZFS resilver?
Message-ID:  <73CD8E11-76D7-4286-B2C4-1D3835F910B6@lysator.liu.se>
In-Reply-To: <CANCZdfo7BDLJYko1wW8L5%2BwxzMY6L-Rd0RbYK8XSL1b0DgQ6qw@mail.gmail.com>
References:  <BDFFC0E8-9D5A-4E45-835F-9D00CDAE8829@lysator.liu.se> <CANCZdfo7BDLJYko1wW8L5%2BwxzMY6L-Rd0RbYK8XSL1b0DgQ6qw@mail.gmail.com>

Data drives are 12 HGST 10TB 7200rpm spinning rust… (2xRAIDZ2(4+2))

Well, except for the log (dual Intel DC S3700) and cache (Intel 750 Series PCIe) devices. But I’m not seeing any errors on those.


(The NFS hiccups seem to be happening in “nfsmsleep()” for some reason.

  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244665

Dtrace output:
 27  54842               nfsrvd_dorpc:entry Start
 27  47273                 nfsv4_lock:entry Start(lp->nfslock_lock=6, iwantlock=0)
 27  37590                  nfsmsleep:entry Start(ffffffff81e9982c, ffffffff81e998a0, 99)
 27  54396                     _sleep:entry Start(prio=99, timeo=0)
  7  54397                    _sleep:return    7171965 µs
  7  37591                 nfsmsleep:return    7171972 µs
  7  47274                nfsv4_lock:return    7171979 µs

But it would be really nice to have some way to temporarily pause a running resilver while I’m investigating this issue.)

- Peter


> On 8 Mar 2020, at 19:37, Warner Losh <imp@bsdimp.com> wrote:
>
> On Sun, Mar 8, 2020 at 12:35 PM Peter Eriksson <pen@lysator.liu.se> wrote:
> I’m looking for ideas on how to pause a running ZFS resilver on a FreeBSD 11.3-RELEASE-p6 system.
>
> The reason is that we have a system where a running resilver causes severe NFS “hiccups” for our users (delays of 5-20 s, more or less often), so I’d like to figure out some way to “pause” it during office hours until either we’ve found and fixed the problem or the resilver is done (1D15H to go)...
>
> Since there isn’t any “zfs” command to pause a running resilver, I’m pondering alternative, more “creative” ways.
>
> /usr/src/cddl/contrib/opensolaris/uts/common/fs/zfs:
>
> >        if (zio_flags & ZIO_FLAG_RESILVER)
> >                scan_delay = zfs_resilver_delay;
> >        else {
> >                ASSERT(zio_flags & ZIO_FLAG_SCRUB);
> >                scan_delay = zfs_scrub_delay;
> >        }
> >
> >        if (scan_delay && (ddi_get_lbolt64() - spa->spa_last_io <= zfs_scan_idle))
> >                delay(MAX((int)scan_delay, 0));
>
> Setting vfs.zfs.scan_idle to something high and then vfs.zfs.resilver_delay to 10*60*60*kern.hz (10 hours) and hoping the “if” statement will trigger? But that assumes nothing can/will interrupt delay(). Hmmm...
>
> Any other suggestions?
>
> (I don’t want to abort the resilver).
>
> If you are dealing with SSDs, you might look to see if BIO_DELETE (trim) traffic is causing delays. If so, you can temporarily disable TRIM on the disk being resilvered. In the resilver case, trim doesn't help much anyway since you're rewriting the entire drive. If not, then I'm not sure what else to recommend...
>
> Warner
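
The throttle check quoted above from the ZFS scan code can be modelled in user space to see when the proposed sysctl settings would actually put the scan thread to sleep. The following is only a minimal sketch under stated assumptions: the `sketch_*` names and flag values are hypothetical stand-ins, not kernel identifiers; the integer types are simplified; and kern.hz is assumed to be 1000 in the usage note (check `sysctl kern.hz` on the real system). It does not answer the open question of whether delay() can be interrupted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; the real values live in the ZFS headers. */
#define SKETCH_FLAG_RESILVER 0x1
#define SKETCH_FLAG_SCRUB    0x2

/* Mirrors the quoted if/else: pick the per-I/O delay (in ticks) by scan type. */
static int
sketch_pick_delay(int zio_flags, int resilver_delay, int scrub_delay)
{
	if (zio_flags & SKETCH_FLAG_RESILVER)
		return resilver_delay;
	assert(zio_flags & SKETCH_FLAG_SCRUB);
	return scrub_delay;
}

/*
 * Mirrors the quoted condition: the scan I/O is delayed only if a delay is
 * configured AND the last recorded pool I/O happened within scan_idle ticks.
 * Returns the number of ticks that would be passed to delay(), or 0 if the
 * scan I/O would proceed unthrottled.
 */
static int
sketch_throttle_ticks(int scan_delay, int64_t now, int64_t last_io,
    int64_t scan_idle)
{
	if (scan_delay && (now - last_io <= scan_idle))
		return scan_delay > 0 ? scan_delay : 0; /* MAX(scan_delay, 0) */
	return 0;
}

/* The proposed value 10*60*60*kern.hz, generalised to any number of hours. */
static int64_t
sketch_hours_to_ticks(int64_t hours, int64_t hz)
{
	return hours * 60 * 60 * hz;
}
```

With an assumed kern.hz of 1000, `sketch_hours_to_ticks(10, 1000)` gives 36000000 ticks, which still fits in the `int` that the quoted code casts `scan_delay` to. Calling `sketch_throttle_ticks(36000000, now, last_io, scan_idle)` then returns 0 whenever `now - last_io` exceeds `scan_idle`, which is why vfs.zfs.scan_idle would have to be raised as well: otherwise the resilver runs at full speed as soon as the pool has looked idle for that long.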



