Date:      Thu, 13 May 2021 09:37:24 -0600
From:      Alan Somers <asomers@freebsd.org>
To:        mike tancsa <mike@sentex.net>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: speeding up zfs send | recv
Message-ID:  <CAOtMX2gifUmgqwSKpRGcfzCm_=BX_szNF1AF8WTMfAmbrJ5UWA@mail.gmail.com>
In-Reply-To: <866d6937-a4e8-bec3-d61b-07df3065fca9@sentex.net>
References:  <866d6937-a4e8-bec3-d61b-07df3065fca9@sentex.net>

On Thu, May 13, 2021 at 8:45 AM mike tancsa <mike@sentex.net> wrote:

> For offsite storage, I have been doing a zfs send across a 10G link and
> noticed something I don't understand with respect to speed.  I have a
> number of datasets I am sending, one which contains a great many
> Maildirs / files and others that have a few large 30-60G files (VM disk
> images).  When I am sending the data set with the mail spool (many small
> files and directories), transfer tends to be markedly slower.  Looking
> at the cacti graph, it seems to hover around 500Mb/s through an
> aes128-gcm cipher when sending the mail spool, vs sending the dataset
> that has the VMs on it, around 2.5Gb/s (both on a 5min average)...
>
> Why would the mail spool send be so slow compared to the sends where
> datasets only have a few large files?
>
> One thing I am wondering is, could it be due to the number of snapshots
> I have? For each, I have about 60-100 snapshots. I am only sending a
> copy based on the latest snapshot, but I guess that's a lot of
> calculations to go through in order to get a complete image. However, I
> would have thought that would impact both types of datasets equally ?
> e.g. on my oldest mailspool snapshot, I see 60G of difference from the
> oldest snapshot on a dataset that's about 600GB in size
>
> By contrast, the dataset with VM images, is 300G and the oldest snapshot
> shows just 16G of difference and has a total of 93 snapshots.
>
> Is there anything I can do to speed up the send? The recv side has lots
> of spare CPU. I don't see the disk blocking at all.  The sending side is
> pretty busy, but I would imagine equally busy across all data sets.
>
> sender is a recent RELENG_12, recv side is RELENG_13
>
> As a side note, is zstd ever nice!  On a different dataset that has a
> lot of huge JSON files, I am seeing refcompressratio  22.15x   vs
> 13.19x  for the old lz4.
>
>     ---Mike
>

Is this a high-latency link?  ZFS send streams can be bursty, and piping the
stream through mbuffer helps smooth that out.  Search for "zfs send mbuffer"
for some examples.  Also be aware that your speed may be limited by the
sender: if those small files are randomly spread across the platter, your
sending server's disks may be the limiting factor.  Use gstat to check.
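
For illustration, a minimal sketch of the kind of pipeline meant here, with
an mbuffer on each end of the ssh connection.  The dataset, snapshot and host
names are hypothetical, and the buffer sizes are assumed starting points
rather than tuned values.  The script only prints the pipeline unless RUN=1
is set.

```shell
#!/bin/sh
# Sketch only: every name and size below is a placeholder, not from the thread.
SNAPSHOT="${SNAPSHOT:-tank/mail@offsite}"   # hypothetical source snapshot
REMOTE="${REMOTE:-backup.example.org}"      # hypothetical receiving host
TARGET="${TARGET:-backup/mail}"             # hypothetical destination dataset
BUF_MEM="${BUF_MEM:-1G}"                    # mbuffer memory per side (assumed)
BUF_BLK="${BUF_BLK:-128k}"                  # mbuffer block size (assumed)

# Print the full pipeline; the mbuffer on each side absorbs bursts in the
# send stream so the network link is kept busy between them.
send_pipeline() {
    printf 'zfs send %s | mbuffer -q -s %s -m %s | ssh %s "mbuffer -q -s %s -m %s | zfs recv %s"\n' \
        "$SNAPSHOT" "$BUF_BLK" "$BUF_MEM" "$REMOTE" "$BUF_BLK" "$BUF_MEM" "$TARGET"
}

# Dry run by default; set RUN=1 to actually execute the pipeline.
if [ "${RUN:-0}" = "1" ]; then
    eval "$(send_pipeline)"
else
    send_pipeline
fi
```

While a transfer is running, "gstat -p" on the sender shows per-disk %busy;
values pinned near 100 during the mail spool send would point at the disks
rather than the network.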
-Alan


