From owner-freebsd-fs@freebsd.org Thu Jun 30 15:42:13 2016
From: Ben RUBSON <ben.rubson@gmail.com>
Subject: Re: HAST + ZFS + NFS + CARP
Date: Thu, 30 Jun 2016 17:42:04 +0200
To: freebsd-fs@freebsd.org
Message-Id: <63C07474-BDD5-42AA-BF4A-85A0E04D3CC2@gmail.com>
In-Reply-To: <20160630153747.GB5695@mordor.lan>
References: <20160630144546.GB99997@mordor.lan> <71b8da1e-acb2-9d4e-5d11-20695aa5274a@internetx.com> <20160630153747.GB5695@mordor.lan>
List-Id: Filesystems <freebsd-fs.freebsd.org>

> On 30 Jun 2016, at 17:37, Julien Cigar wrote:
>
>> On Thu, Jun 30, 2016 at 05:28:41PM +0200, Ben RUBSON wrote:
>>
>>> On 30 Jun 2016, at 17:14, InterNetX - Juergen Gotteswinter wrote:
>>>
>>>> Am 30.06.2016 um 16:45 schrieb Julien Cigar:
>>>> Hello,
>>>>
>>>> I'm still in the process of setting up redundant low-cost storage for
>>>> our (small, ~30 people) team here.
>>>>
>>>> I have read quite a lot of articles/documentation/etc. and I plan to use HAST
>>>> with ZFS for the storage, CARP for the failover and the "good old NFS"
>>>> to mount the shares on the clients.
>>>>
>>>> The hardware is 2x HP ProLiant DL20 boxes with 2 dedicated disks for the
>>>> shared storage.
>>>>
>>>> Assuming the following configuration:
>>>> - MASTER is the active node and BACKUP is the standby node.
>>>> - two disks in each machine: ada0 and ada1
>>>> - two interfaces in each machine: em0 and em1
>>>> - em0 is the primary interface (with CARP set up)
>>>> - em1 is dedicated to the HAST traffic (crossover cable)
>>>> - FreeBSD is properly installed on each machine.
>>>> - a HAST resource "disk0" for ada0p2.
>>>> - a HAST resource "disk1" for ada1p2.
>>>> - a "zpool create zhast mirror /dev/hast/disk0 /dev/hast/disk1" is created
>>>> on MASTER
>>>>
>>>> A couple of questions I am still wondering about:
>>>> - If a disk dies on the MASTER, I guess that zpool will not see it and
>>>> will transparently use the one on BACKUP through the HAST resource..
>>>
>>> that's right, as long as writes on $anything have been successful HAST is
>>> happy and won't start whining
>>>
>>>> is it a problem?
>>>
>>> imho yes, at least from a management point of view
>>>
>>>> could this lead to some corruption?
>>>
>>> probably, I have never heard of anyone who has used that in
>>> production for a long time
>>>
>>>> At this stage the
>>>> common sense would be to replace the disk quickly, but imagine the
>>>> worst case scenario where ada1 on MASTER dies: zpool will not see it
>>>> and will transparently use the one from the BACKUP node (through the
>>>> "disk1" HAST resource); later ada0 on MASTER dies, zpool will not
>>>> see it and will transparently use the one from the BACKUP node
>>>> (through the "disk0" HAST resource). At this point on MASTER the two
>>>> disks are broken but the pool is still considered healthy... What if
>>>> after that we unplug the em0 network cable on BACKUP? Storage is
>>>> down..
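[For reference, the setup described above would look roughly like this. A
sketch only: the hostnames follow the MASTER/BACKUP naming above, and the
172.16.0.x addresses on the dedicated em1 link are hypothetical.]

```shell
# /etc/hast.conf, identical on both nodes (a sketch; addresses assumed)
resource disk0 {
        on master {
                local /dev/ada0p2
                remote 172.16.0.2
        }
        on backup {
                local /dev/ada0p2
                remote 172.16.0.1
        }
}
resource disk1 {
        on master {
                local /dev/ada1p2
                remote 172.16.0.2
        }
        on backup {
                local /dev/ada1p2
                remote 172.16.0.1
        }
}

# Then, after "hastctl create disk0 / disk1" on both nodes and
# starting hastd, on MASTER only:
#   hastctl role primary all
#   zpool create zhast mirror /dev/hast/disk0 /dev/hast/disk1
```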
>>>> - Under heavy I/O the MASTER box suddenly dies (for some reason);
>>>> thanks to CARP the BACKUP node will switch from standby -> active and
>>>> execute the failover script, which does some "hastctl role primary" for
>>>> the resources and a zpool import. I wondered if there are any
>>>> situations where the pool couldn't be imported (= data corruption)?
>>>> For example, what if the pool hasn't been exported on the MASTER before
>>>> it dies?
>>>> - Is it a problem if the NFS daemons are started at boot on the standby
>>>> node, or should they only be started in the failover script? What
>>>> about stale files and active connections on the clients?
>>>
>>> sometimes stale mounts recover, sometimes not, sometimes clients even
>>> need reboots
>>>
>>>> - A catastrophic power failure occurs and MASTER and BACKUP are suddenly
>>>> powered down. Later the power returns; is it possible that some
>>>> problem occurs (split-brain scenario?) depending on the order in which
>>>> the two machines boot up?
>>>
>>> sure, you need an exact procedure to recover
>>>
>>> best practice would be to keep everything down after boot
>>>
>>>> - Other things I have not thought of?
>>>>
>>>> Thanks!
>>>> Julien
>>>
>>> imho:
>>>
>>> leave HAST where it is, go for ZFS replication. It will save your butt
>>> sooner or later if you avoid this fragile combination
>>
>> I was also replying, and finishing with this:
>> Why don't you set your slave up as an iSCSI target and simply do ZFS
>> mirroring?
>
> Yes that's another option, so a zpool with two mirrors (local +
> exported iSCSI)?

Yes, you would then have a real-time replication solution (like HAST),
compared to ZFS send/receive which is not. Depends on what you need :)

>
>> ZFS would then know as soon as a disk is failing.
>> And if the master fails, you only have to import (-f certainly, in case
>> of a master power failure) on the slave.
>>
>> Ben
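[To make the failover step above concrete, the CARP-triggered script could
be sketched as follows. This is only a sketch of the takeover sequence
discussed in the thread: the script path is hypothetical, devd wiring and
error handling are omitted, and the NFS restart details depend on your
rc.conf.]

```shell
#!/bin/sh
# /usr/local/sbin/hast-failover.sh (hypothetical path), called with
# MASTER or BACKUP depending on the CARP state change
case "$1" in
MASTER)
        # Take ownership of the HAST resources on the new active node
        hastctl role primary disk0
        hastctl role primary disk1
        # Import the pool; -f is needed when the old MASTER died
        # without exporting it first
        zpool import -f zhast
        # Start the NFS service on the new active node
        service nfsd onestart
        ;;
BACKUP)
        # Clean demotion: release the pool, then hand the resources back
        zpool export zhast
        hastctl role secondary disk0
        hastctl role secondary disk1
        ;;
esac
```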
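[The iSCSI alternative suggested at the end of the thread could look
roughly like this. A sketch under assumptions: the target name, portal
setup and da0/da1 device names that the initiator sees are all
hypothetical.]

```shell
# On the slave: export the two disks with ctld(8) -- /etc/ctl.conf
# (target name and layout are assumptions)
target iqn.2016-06.lan.mordor:storage {
        portal-group default
        lun 0 { path /dev/ada0p2 }
        lun 1 { path /dev/ada1p2 }
}

# On the master: log in to the target, then mirror each local disk
# with its remote counterpart (da0/da1 names assumed):
#   iscsictl -A -p <slave-ip> -t iqn.2016-06.lan.mordor:storage
#   zpool create tank mirror ada0p2 da0 mirror ada1p2 da1
# ZFS then sees the remote disks directly, so a failing disk on the
# slave degrades the pool immediately. After a master failure:
#   zpool import -f tank   # on the slave
```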