Date: Mon, 23 Aug 2021 09:54:39 +1000
From: George Michaelson <ggm@algebras.org>
To: Alan Somers <asomers@freebsd.org>
Cc: Ben RUBSON <ben.rubson@gmx.com>, freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: ZFS on high-latency devices
Message-ID: <CAKr6gn2RG_PQCj=2dT7Ntqp0nZRLGOOhdXgbm2e++tJvL5WfKg@mail.gmail.com>
In-Reply-To: <CAOtMX2hRuh_9ZOOoQufNT2QG3Ui0S3rJq+L-ox2kxsq1oJMSMA@mail.gmail.com>
References: <YR4mY+b6o7fBJqEN@server.rulingia.com>
 <023225AD-2A97-47C5-9FE4-3ABF1BFD66F1@gmx.com>
 <CAKr6gn0r8xG9HNGOFh1A_usU4tPAYezeZv1chOG_bBMqy_HtXw@mail.gmail.com>
 <CAOtMX2hRuh_9ZOOoQufNT2QG3Ui0S3rJq+L-ox2kxsq1oJMSMA@mail.gmail.com>
I don't think it's sensible to mesh long-delay file constructs into a
pool. Maybe there is a model which permits this, but I think higher
abstractions like Ceph may be a better fit for the kind of distributed
filestore this implies.

I use ZFS send/receive to make "clones" of a zpool/zfs structure, and
they are not bound by the delay problem of live FS delivery: you use
mbuffer to make the transport of the entire ZFS data state more
efficient.

A long time ago, 25+ years ago, I ran NFS over X.25, as well as SMB
(IP over X.25), and it was very painful. Long-haul, long-delay networks
are not a good fit for direct FS read/write semantics if you care about
speed. There isn't enough caching in the world to make the distance
fully transparent.

When I have to use FS-over-<thing> now, it's typically to mount virtual
ISO images for console-FS recovery of a host, and it's bad enough over
150ms-delay fibre links that I don't want to depend on it beyond the
bootstrap phase.

-G

On Mon, Aug 23, 2021 at 9:48 AM Alan Somers <asomers@freebsd.org> wrote:
>
> mbuffer is not going to help the OP.  He's trying to create a pool on
> top of a networked block device.  And if I understand correctly, he's
> connecting over a WAN, not a LAN.  ZFS will never achieve decent
> performance in such a setup.  It's designed as a local file system,
> and assumes it can quickly read metadata off of the disks at any time.
> The OP's best option is to go with "a": encrypt each dataset and send
> them with "zfs send --raw".  I don't know why he thinks that it would
> be "very difficult".  It's quite easy, if he doesn't care about old
> snapshots.  Just:
>
> $ zfs create <crypto options> pool/new_dataset
> $ cp -a pool/old_dataset/* pool/new_dataset/
>
> -Alan
>
> On Sun, Aug 22, 2021 at 5:40 PM George Michaelson <ggm@algebras.org> wrote:
>>
>> I don't want to abuse the subject line too much, but I can highly
>> recommend the mbuffer approach; I've used this repeatedly, BSD-BSD
>> and BSD-Linux.  It definitely feels faster than SSH, since the "no
>> cipher" options were removed and in the confusion of the HPN buffer
>> changes.  But it's not encrypted on the wire.
>>
>> Mbuffer tuning is a bit of a black art: it would help enormously if
>> there were some guidance on this, and personally I've never found the
>> mbuffer -v option to work well: I get no real sense of how full or
>> empty the buffer is, or whether the use of sendmsg/recvmsg-type
>> buffer chains is better or worse.
>>
>> -G
>>
>> On Fri, Aug 20, 2021 at 6:19 PM Ben RUBSON <ben.rubson@gmx.com> wrote:
>> >
>> > > On 19 Aug 2021, at 11:37, Peter Jeremy <peter@rulingia.com> wrote:
>> > >
>> > > (...) or a way to improve throughput doing "zfs recv" to a pool with a high RTT.
>> >
>> > You should use zfs send/receive through mbuffer, which will allow
>> > you to sustain better throughput over high-latency links.
>> > Feel free to play with its buffer size parameters to find the best
>> > settings, depending on your link characteristics.
>> >
>> > Ben
>> >
>> >
>>
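
Concretely, the mbuffer approach Ben and George describe looks roughly
like the sketch below. The host name, dataset names, snapshot name,
port, and the -s/-m sizes are placeholder assumptions to be tuned for
the link, not values from the thread:

  # On the receiving host (assumed here to be "backuphost"): listen on
  # a TCP port, buffer the incoming stream in 1 GB of RAM, and feed it
  # to zfs receive (-u: don't mount the received dataset).
  $ mbuffer -s 128k -m 1G -I 9090 | zfs receive -u backup/mydata

  # On the sending host: stream the snapshot into mbuffer, which ships
  # it to the receiver (-s: block size, -m: total buffer memory).
  $ zfs send mypool/mydata@snap1 | mbuffer -s 128k -m 1G -O backuphost:9090

As George notes, this runs in the clear over a raw TCP socket; piping
through ssh instead trades the speed back for confidentiality.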
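
Alan's option "a" above, sketched end to end with hypothetical dataset
and host names (the encryption properties shown are one common choice,
not the only one):

  # Create an encrypted dataset and copy the data into it; with
  # keyformat=passphrase, zfs create prompts for a passphrase.
  $ zfs create -o encryption=on -o keyformat=passphrase pool/new_dataset
  $ cp -a /pool/old_dataset/* /pool/new_dataset/

  # Snapshot it, then send the raw (still-encrypted) stream; with
  # --raw the receiving side never needs the key.
  $ zfs snapshot pool/new_dataset@backup1
  $ zfs send --raw pool/new_dataset@backup1 | \
      ssh backuphost zfs receive remotepool/new_dataset

Because the blocks stay encrypted both in transit and at rest on the
remote pool, this avoids any need for an encrypted transport.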
