From: Borja Marcos <borjam@sarenet.es>
Subject: Re: HAST + ZFS + NFS + CARP
Date: Thu, 11 Aug 2016 13:22:05 +0200
To: Julien Cigar
Cc: freebsd-fs@freebsd.org, Jordan Hubbard

> On 11 Aug 2016, at 13:02, Julien Cigar wrote:
> 
> On Thu, Aug 11, 2016 at 12:15:39PM +0200, Julien Cigar wrote:
>> On Thu, Aug 11, 2016 at 11:24:40AM +0200, Borja Marcos wrote:
>>> 
>>>> On 11 Aug 2016, at 11:10, Julien Cigar wrote:
>>>> 
>>>> As I said in a previous post, I tested the zfs send/receive approach (with
>>>> zrep) and it works (more or less) perfectly, so I concur with all that you
>>>> said, especially about off-site replication and synchronous replication.
>>>> 
>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP setup at the moment.
>>>> I'm still in the early tests and haven't done any heavy writes yet, but so
>>>> far it works as expected and I haven't managed to corrupt the zpool.
>>> 
>>> I must be too old school, but I don't quite like the idea of using an
>>> essentially unreliable transport (Ethernet) for low-level filesystem
>>> operations.
>>> 
>>> In case something went wrong, that approach could risk corrupting a pool.
>>> Although, frankly,
> 
> Now I'm thinking of the following scenario:
> - filer1 is the MASTER, filer2 the BACKUP
> - on filer1 a zpool "data" is mirrored over loc1, loc2, rem1, rem2 (where
>   rem1 and rem2 are iSCSI disks)
> - the pool is mounted on the MASTER
> 
> Now imagine that the replication interface corrupts packets silently,
> but data are still written to rem1 and rem2. Will ZFS immediately detect
> that the blocks written to rem1 and rem2 are corrupted?

As far as I know, ZFS does not read after write. It can detect silent corruption
when reading a file or a metadata block, but that will only happen when requested
(for a file), when needed (for metadata), or during a scrub. It doesn't do a
preemptive read-after-write, as far as I can tell.
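To make that concrete (a rough illustration only, reusing the pool and device
names from your scenario), the way to surface such corruption promptly is to
force a read of every allocated block with a scrub and then look at the
per-device checksum counters:

  # read and verify every allocated block in the pool
  zpool scrub data

  # once the scrub finishes, blocks that failed their checksum show up
  # in the CKSUM column for the affected devices (e.g. rem1, rem2)
  zpool status -v data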
Silent corruption can be overcome by ZFS as long as there isn't too much of it.
In my case with the evil HBA it was roughly one block operation error per hour
of intensive I/O; in normal operation it could be a block error per week or so.
With that error rate, the chances of a random I/O error corrupting the same
block on three different devices (it's a raidz2 vdev) are really remote.

But, again, I won't push more at the risk of annoying you to death. Just bear
in mind that your I/O throughput will be bound by your network and iSCSI
performance anyway ;)

Borja.

P.D: I forgot to reply to this before:

>> Yeah.. although you could have silent data corruption with any broken
>> hardware too. Some years ago I suffered silent data corruption due to
>> a broken RAID card, and had to restore from backups..

Ethernet hardware is designed with the assumption that losing a packet is not
such a big deal. Shit happens on SAS and other specialized storage networks too,
of course, but you should expect it to happen at least a bit less often. ;)
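If you do go down the iSCSI route, it may also be worth watching the error
counters on the replication interfaces, so you at least know how noisy the
link really is (a rough sketch; "ix0" is just a placeholder for whatever NIC
you actually use):

  # per-interface input/output error counters (Ierrs/Oerrs columns)
  netstat -I ix0

  # protocol-level counters, e.g. TCP segments discarded for bad checksums
  netstat -s -p tcp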