From: Borja Marcos <borjam@sarenet.es>
Subject: Re: HAST + ZFS + NFS + CARP
Date: Thu, 11 Aug 2016 13:22:05 +0200
To: Julien Cigar
Cc: freebsd-fs@freebsd.org, Jordan Hubbard

> On 11 Aug 2016, at 13:02, Julien Cigar wrote:
> 
> On Thu, Aug 11, 2016 at 12:15:39PM +0200, Julien Cigar wrote:
>> On Thu, Aug 11, 2016 at 11:24:40AM +0200, Borja Marcos wrote:
>>> 
>>>> On 11 Aug 2016, at 11:10, Julien Cigar wrote:
>>>> 
>>>> As I said in a previous post, I tested the zfs send/receive approach (with
>>>> zrep) and it works (more or less) perfectly, so I concur with all that you
>>>> said, especially about off-site replication and synchronous replication.
>>>> 
>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP setup at the moment.
>>>> I'm still in the early tests and haven't done any heavy writes yet, but so
>>>> far it works as expected and I haven't managed to corrupt the zpool.
>>> 
>>> I must be too old school, but I don't quite like the idea of using an
>>> essentially unreliable transport (Ethernet) for low-level filesystem
>>> operations.
>>> 
>>> In case something went wrong, that approach could risk corrupting a pool.
>>> Although, frankly,
> 
> Now I'm thinking of the following scenario:
> - filer1 is the MASTER, filer2 the BACKUP
> - on filer1 a zpool "data" is mirrored over loc1, loc2, rem1, rem2 (where
>   rem1 and rem2 are iSCSI disks)
> - the pool is mounted on the MASTER
> 
> Now imagine that the replication interface corrupts packets silently,
> but data are still written to rem1 and rem2. Will ZFS immediately detect
> that the blocks written to rem1 and rem2 are corrupted?

As far as I know, ZFS does not read after write. It can detect silent corruption
when reading a file or a metadata block, but that will only happen when requested
(for a file), when needed (for metadata), or during a scrub. It doesn't do a
preemptive read-after-write, as far as I can tell.
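To make that concrete (a rough illustration only, reusing the pool and device
names from your scenario), the way to surface such corruption promptly is to
force a read of every allocated block with a scrub and then look at the
per-device checksum counters:

  # read and verify every allocated block in the pool
  zpool scrub data

  # once the scrub finishes, blocks that failed their checksum show up
  # in the CKSUM column for the affected devices (e.g. rem1, rem2)
  zpool status -v data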
Silent corruption can be overcome by ZFS as long as there isn't too much of it.
In my case with the evil HBA it was roughly one block operation error per hour
of intensive I/O; in normal operation it could be a block error per week or so.
With that error rate, the chances of a random I/O error corrupting the same
block on three different devices (it's a raidz2 vdev) are really remote.

But, again, I won't push more at the risk of annoying you to death. Just bear
in mind that your I/O throughput will be bound by your network and iSCSI
performance anyway ;)

Borja.

P.D: I forgot to reply to this before:

>> Yeah.. although you could have silent data corruption with any broken
>> hardware too. Some years ago I suffered silent data corruption due to
>> a broken RAID card, and had to restore from backups..

Ethernet hardware is designed with the assumption that losing a packet is not
such a big deal. Shit happens on SAS and other specialized storage networks too,
of course, but you should expect it to happen at least a bit less often. ;)
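If you do go down the iSCSI route, it may also be worth watching the error
counters on the replication interfaces, so you at least know how noisy the
link really is (a rough sketch; "ix0" is just a placeholder for whatever NIC
you actually use):

  # per-interface input/output error counters (Ierrs/Oerrs columns)
  netstat -I ix0

  # protocol-level counters, e.g. TCP segments discarded for bad checksums
  netstat -s -p tcp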