Date:      Thu, 14 Jul 2016 17:50:34 +0200
From:      Ben RUBSON <ben.rubson@gmail.com>
To:        freebsd-fs@freebsd.org
Subject:   Re: HAST + ZFS + NFS + CARP
Message-ID:  <28C56E3E-E72E-4FD0-A6BB-CE3FC4277A10@gmail.com>
In-Reply-To: <3009ac40-5a29-6f05-ced3-326c9a87c9b2@rlwinm.de>
References:  <20160630144546.GB99997@mordor.lan> <71b8da1e-acb2-9d4e-5d11-20695aa5274a@internetx.com> <AD42D8FD-D07B-454E-B79D-028C1EC57381@gmail.com> <20160630153747.GB5695@mordor.lan> <63C07474-BDD5-42AA-BF4A-85A0E04D3CC2@gmail.com> <678321AB-A9F7-4890-A8C7-E20DFDC69137@gmail.com> <20160630185701.GD5695@mordor.lan> <6035AB85-8E62-4F0A-9FA8-125B31A7A387@gmail.com> <20160703192945.GE41276@mordor.lan> <20160703214723.GF41276@mordor.lan> <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <B48FB28E-30FA-477F-810E-DF4F575F5063@gmail.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <3009ac40-5a29-6f05-ced3-326c9a87c9b2@rlwinm.de>


> On 12 Jul 2016, at 15:15, Jan Bramkamp <crest@rlwinm.de> wrote:
>
> On 04/07/16 19:55, Jordan Hubbard wrote:
>>
>>> On Jul 3, 2016, at 11:05 PM, Ben RUBSON <ben.rubson@gmail.com> wrote:
>>>
>>> Of course Jordan, in this topic, we (well at least me :) make the
>>> following assumption:
>>> one iSCSI target/disk = one real physical disk (a SAS disk, an SSD
>>> disk...), from a server having its own JBOD, no RAID adapter or
>>> whatever, just what ZFS likes!
>>
>> I certainly wouldn’t make that assumption.  Once you allow iSCSI to be
>> the back-end in any solution, end-users will avail themselves of the
>> flexibility to also export arbitrary or synthetic devices (like zvols /
>> RAID devices) as “disks”.  You can’t stop them from doing so, so you
>> might as well incorporate that scenario into your design.  Even if you
>> could somehow enforce the 1:1 mapping of LUN to disk, iSCSI itself is
>> still going to impose a serialization / performance / reporting (iSCSI
>> LUNs don’t report SMART status) penalty that removes a lot of the
>> advantages of having direct physical access to the media, so one might
>> also ask what you’re gaining by imposing those restrictions.
>
>
> How about 3-way ZFS mirrors spread over three SAS JBODs with dual-ported
> expanders connected to two FreeBSD servers with SAS HBAs and a *reliable*
> arbiter to the disks. This could either be an external locking server
> e.g. consul/etcd/zookeeper and/or SCSI reservations. If more than two
> head servers are to share the disks a pair of SAS switches should do the
> job.

It would be nice if it could work without a third server, so one
important / interesting thing to test would be the SCSI reservations:
make sure that when the pool is imported on MASTER, SLAVE can't use the
disks anymore.
(This is the case with iSCSI: when SLAVE exports its disks through CTL,
it can't import them using ZFS, as CTL locks them as soon as it is
started.)
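
The property to test can be stated as a tiny model. The Python sketch
below is not the SCSI reservation protocol, only a hypothetical
in-memory stand-in for it (the Disk class, import_pool and the host and
disk names are made up for illustration): one host holds an exclusive
reservation on each disk, and any other host is refused until the
holder releases it.

```python
class ReservationConflict(Exception):
    """Raised when a host touches a disk reserved by another host."""


class Disk:
    """Toy disk with a single exclusive reservation slot (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.holder = None  # host currently holding the reservation, if any

    def reserve(self, host):
        # Refuse if another host already holds the reservation.
        if self.holder not in (None, host):
            raise ReservationConflict(f"{self.name} is reserved by {self.holder}")
        self.holder = host

    def release(self, host):
        # Only the current holder can release.
        if self.holder == host:
            self.holder = None

    def write(self, host):
        # I/O is only accepted from the reservation holder.
        if self.holder != host:
            raise ReservationConflict(f"{host} has no reservation on {self.name}")
        return True


def import_pool(host, disks):
    """Reserve every disk or none: roll back if any disk is already held."""
    newly_taken = []
    try:
        for disk in disks:
            was_free = disk.holder is None
            disk.reserve(host)
            if was_free:
                newly_taken.append(disk)
    except ReservationConflict:
        for disk in newly_taken:
            disk.release(host)
        return False
    return True


if __name__ == "__main__":
    disks = [Disk("da0"), Disk("da1")]
    print(import_pool("MASTER", disks))  # MASTER takes both disks
    print(import_pool("SLAVE", disks))   # SLAVE is refused while MASTER holds them
```

A real test would of course have to check the same invariant on the
hardware itself: once MASTER holds the disks, both the import and any
write from SLAVE must fail.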

> If N-1 disk redundancy is enough two JBODs and 2-way mirrors would work
> as well.

Or if we only have 2 JBODs (for whatever reason), we could (should
certainly :) use 4-way mirrors, with two disks of each mirror in each
JBOD, so that if one JBOD dies, every mirror still has two members and
we're still confident with the pool.
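
To make the two-JBOD layout concrete, here is a small Python sketch. It
is only an illustration: the device names (da0...da7), the JBOD labels
and the pool name "tank" are hypothetical, and the helper functions are
not part of any existing tool. It groups the disks so that each 4-way
mirror takes two disks from each JBOD, builds the matching
"zpool create" command line, and counts how many members each mirror
keeps if one JBOD is lost.

```python
from typing import Dict, List


def build_mirrors(jbods: Dict[str, List[str]], per_jbod: int = 2) -> List[List[str]]:
    """Group disks into mirror vdevs, taking `per_jbod` disks from every JBOD."""
    counts = {len(disks) for disks in jbods.values()}
    if len(counts) != 1:
        raise ValueError("all JBODs must hold the same number of disks")
    n = counts.pop()
    if n % per_jbod != 0:
        raise ValueError("disks per JBOD must be a multiple of per_jbod")
    mirrors = []
    for start in range(0, n, per_jbod):
        vdev = []
        for name in sorted(jbods):
            vdev.extend(jbods[name][start:start + per_jbod])
        mirrors.append(vdev)
    return mirrors


def survivors_after_loss(mirrors: List[List[str]],
                         jbods: Dict[str, List[str]],
                         lost: str) -> List[int]:
    """Number of members left in each mirror once JBOD `lost` is gone."""
    dead = set(jbods[lost])
    return [len([d for d in vdev if d not in dead]) for vdev in mirrors]


def zpool_create_command(pool: str, mirrors: List[List[str]]) -> str:
    """Build the 'zpool create' command line for the given mirror vdevs."""
    parts = ["zpool", "create", pool]
    for vdev in mirrors:
        parts.append("mirror")
        parts.extend(vdev)
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical inventory: four disks in each of the two JBODs.
    jbods = {
        "jbod0": ["da0", "da1", "da2", "da3"],
        "jbod1": ["da4", "da5", "da6", "da7"],
    }
    mirrors = build_mirrors(jbods)
    print(zpool_create_command("tank", mirrors))
    print(survivors_after_loss(mirrors, jbods, "jbod0"))
```

With this inventory the command comes out as
"zpool create tank mirror da0 da1 da4 da5 mirror da2 da3 da6 da7", and
losing jbod0 leaves two members in each mirror.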

> While you can't prevent stupid operators from blowing their feet off it
> doesn't offer the same "flexibility" as iSCSI if only because you can't
> conveniently hook up everything talking Ethernet offering itself as
> iSCSI target. That is until someone implements a SAS target with CTL and
> a suitable HBA in FreeBSD ;-).

Why would you prefer a SAS target over an iSCSI target?
How would it fit?

> This kind of setup should also preserve all assumptions ZFS has
> regarding disks.

Yep, although AFAIR no one has demonstrated that ZFS suffers from
iSCSI :) (devs on #openzfs stated it does not)

Anyway, this is a nice SAS-only setup, which avoids an additional
protocol, a very good reason to go with it.
One good reason for iSCSI is that it allows servers to be in different
racks (well, there are long SAS cables) / different rooms / buildings.

> I have the required spare hardware to build a two JBOD test setup [1]
> and could run some tests if anyone is interested in such a setup.
>
>
> [1]: Test setup
>
>    +-----------+    +-----------+
>    | MASTER    |    | SLAVE     |
>    |           |    |           |
>    | HBA0 HBA1 |    | HBA0 HBA1 |
>    +--+----+---+    +--+----+---+
>       ^    ^           ^    ^
>       |    |           |    |
>       |    |           |    +------+
>       |    |           |           |
>       |    |           +----+      |
>       |    |                |      |
>       |    +-----------+    |      |
>       |                |    |      |
>       v                v    v      |
>    +--+--------+    +--+----+---+  |
>    | JBOD 0    |    | JBOD 1    |  |
>    +-------+---+    +-----------+  |
>            ^                       |
>            |                       |
>            +-----------------------+



