Date:      Sun, 12 Apr 2020 13:46:28 +0200
From:      Peter Eriksson <pen@lysator.liu.se>
To:        freebsd-fs <freebsd-fs@freebsd.org>
Cc:        Andriy Gapon <avg@FreeBSD.org>
Subject:   Re: ZFS server has gone crazy slow
Message-ID:  <747B75C0-73D7-42B2-9910-9E16FCAE23C4@lysator.liu.se>
In-Reply-To: <575c01de-b503-f4f9-2f13-f57f428f53ec@FreeBSD.org>
References:  <2182C27C-A5D3-41BF-9CE9-7C6883E43074@distal.com> <20200411174831.GA54397@fuz.su> <6190573D-BCA7-44F9-86BD-0DCBB1F69D1D@distal.com> <6fd7a561-462e-242d-5057-51c52d716d68@wp.pl> <7AA1EA07-6041-464A-A39A-158ACD1DC11C@distal.com> <FE84C045-89B1-4772-AF1F-35F78B9877D8@lysator.liu.se> <575c01de-b503-f4f9-2f13-f57f428f53ec@FreeBSD.org>

You are probably right.

However - we have seen (through experimentation :-) that "zfs destroy -d"
for recursive snapshot destruction across many filesystems seemed to
finish much faster on our servers (i.e. the command returned much sooner).
But it also meant that a lot of I/O seemed to be happening quite some time
after the last "zfs destroy -d" command was issued (and for a really long
time when there were near-quota-full filesystems). No clones or "user
holds" are in use here as far as I know, so why that is I don't know.
With "zfs destroy" (no "-d") things seem to be much more synchronous.
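
For reference, the two variants we compared looked roughly like this
(pool and snapshot names are made up for illustration):

  # Plain destroy - the command itself tends to stay busy until the
  # snapshots are actually gone:
  zfs destroy -r tank/export@daily-2020-04-01

  # Deferred destroy - the command returns much sooner, but on our
  # servers the freeing I/O kept going long after the prompt came back:
  zfs destroy -d -r tank/export@daily-2020-04-01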

We've stopped using "-d" now, since we'd rather not have that kind of
I/O load happening during the daytime, and we had some issues with
nightly snapshot cleanup jobs not finishing in time.

Anyway, the "seems to be writing out a lot of queued-up ZIL data"
behaviour at "zfs mount -a" time was definitely a real problem - it
mounted most of the filesystems pretty quickly, but then was extremely
slow for a couple of them (and caused a lot of I/O). Like 4-6 hours.
Luckily that happened on one of our backup servers, at a time when the
only one it frustrated was me… I'd hate for that to happen on one of
the frontend (NFS/SMB-serving) servers during office hours :-)
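
If anyone wants to watch this happen, something like the following
(the pool name is just an example) shows the I/O while the mounts
crawl along:

  # Mount all filesystems; most come up quickly, a few then stall:
  zfs mount -a

  # In another terminal, watch the write I/O per vdev while it runs:
  zpool iostat -v tank 5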

- Peter


> On 12 Apr 2020, at 13:26, Andriy Gapon <avg@FreeBSD.org> wrote:
>
>
> On 12/04/2020 00:24, Peter Eriksson wrote:
>> Another fun thing that might happen is if you reboot your server and happen
>> to have a lot of queued up writes in the ZIL (for example if you did a "zfs
>> destroy -d -r POOL@snapshots" (deferred (background) destroys of snapshots)
>> and do a hard reboot while it's busy it will "write out" those queued
>> transactions at filesystem mount time during the boot sequence
>
> Just nitpicking on two bits of incorrect information here.
> First, zfs destroy never uses ZIL.  Never.  ZIL is used only for ZPL operations
> like file writes, renames, removes, etc.  The things that you can do with Posix
> system calls (~ VFS KPI).
>
> Second, zfs destroy -d is not a background destroy.  It is a deferred destroy.
> That means that either the destroy is done immediately if a snapshot has no
> holds which means no user holds and no clones.  Or the destroy is postponed
> until holds are gone, that is, the last clone or the last user hold is removed.
>
> Note, however, that unless you have a very ancient pool version destroying a
> snapshot means that the snapshot object is removed and all blocks belonging to
> the snapshot are queued for freeing.  Their actual freeing is done
> asynchronously ("in background") and can be spread over multiple TXG periods.
> That's done regardless of whether -d was used.
>
> --
> Andriy Gapon
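
To illustrate the hold/deferred-destroy behaviour Andriy describes,
something like this should show it (names are made up, and the
"freeing" property assumes a reasonably modern, non-ancient pool):

  # A snapshot with a user hold can't be destroyed outright;
  # "zfs destroy -d" instead just marks it for deferred destruction:
  zfs hold mybackup tank/export@daily-2020-04-01
  zfs destroy -d tank/export@daily-2020-04-01
  zfs get defer_destroy tank/export@daily-2020-04-01   # now "on"

  # Releasing the last hold lets the deferred destroy proceed:
  zfs release mybackup tank/export@daily-2020-04-01

  # The asynchronous block freeing can be watched via the pool's
  # "freeing" property (bytes still queued to be freed):
  zpool get freeing tank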



