From owner-freebsd-fs@freebsd.org Sun Aug 14 05:53:35 2016
From: Zaphod Beeblebrox <zbeeble@gmail.com>
Date: Sun, 14 Aug 2016 01:53:33 -0400
Subject: ZFS corrupt DVA panic: can it be fixed?
To: freebsd-fs, FreeBSD Hackers

Before this problem, I had a few crashes... which may have been hardware
related.  The hardware is (I think) fixed, but this problem remains.  My
searches seem to indicate that this has happened to other people.

... I've pasted here only the first two lines of the last three panics I've
had.

panic: dva_get_dsize_sync(): bad DVA 1573890:1587590144
#3 0xffffffff822b8b01 at dva_get_dsize_sync+0xb1

panic: dva_get_dsize_sync(): bad DVA 1573890:1587590144
#3 0xffffffff822b8b01 at dva_get_dsize_sync+0xb1

panic: dva_get_dsize_sync(): bad DVA 1573890:1587590144
#3 0xffffffff822b8b01 at dva_get_dsize_sync+0xb1

I gather that the machine runs until something causes the kernel to
encounter the corrupt DVA.  I gather from reading stuff that this is part
of the structure that holds free space on the drive.

Since the numbers are the same in each panic, I'm assuming that each panic
is encountering the same one.  This is also the panic that is not dumping
properly to either USB or spinning disk.

I have zdb -uuumcD running right now.  It seems to estimate that it's going
to take an awfully long time, but the estimation might be broken because
it's on 159 of 171 of whatever it's reading.

Now... question: is this fixable?  Can I just mark off the space as
unusable, maybe?  Since this has happened to more than one person, I gather
it's a significant hole in the claim that ZFS is crashproof (or that it
doesn't need repair after crashing).  Maybe this check can be added to
scrub (or scrub + an option)?  Or maybe when we run across it, we fix it?
Does fixing it (in the theoretical sense) require knowing all the free
space on the drive?  Doesn't scrub do that?

From owner-freebsd-fs@freebsd.org Sun Aug 14 20:04:09 2016
From: Zaphod Beeblebrox <zbeeble@gmail.com>
Date: Sun, 14 Aug 2016 16:04:07 -0400
Subject: zfs_recovery=1, zdb, mounted pool?
To: freebsd-fs

So... I found 319 of the errno 122 errors by running zdb.  My question is
this:

Can I run with zfs_recovery=1 and have zdb fix these (which are free space
leaked errors) while the system is running?
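For reference, the kind of check being talked about here can be run with
zdb alone against an idle or exported pool; a minimal sketch (the pool name
is a placeholder, and zdb only reports problems, it never repairs anything):

    # traverse all blocks and report leaked / double-allocated space
    zdb -b <poolname>
    # dump the metaslabs and space maps
    zdb -m <poolname>

The zfs_recovery tunable asked about above is a separate, kernel-side
switch; zdb itself runs in userland against the on-disk state.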
From owner-freebsd-fs@freebsd.org Sun Aug 14 20:21:52 2016
From: Ben RUBSON <ben.rubson@gmail.com>
Date: Sun, 14 Aug 2016 22:21:48 +0200
Subject: Re: [iSCSI] Trying to reach max disk throughput
To: freebsd-fs@freebsd.org

> On 10 Aug 2016, at 15:27, Ben RUBSON wrote:
>
>> On 10 Aug 2016, at 13:44, Edward Tomasz Napierała wrote:
>>
>> On 0810T1154, Ben RUBSON wrote:
>>> Hello,
>>>
>>> I'm facing something strange with iSCSI, I can't manage to reach the
>>> expected disk throughput using one (read or write) thread.
>>
>> [..]
>>
>>> ### Initiator : iscsi disk throughput :
>>>
>>> ## dd if=/dev/da8 of=/dev/null bs=$((128*1024)) count=81920
>>> 10737418240 bytes transferred in 34.731815 secs (309152234 bytes/sec) - 295MB/s
>>>
>>> With 2 parallel dd jobs : 345MB/s
>>> With 4 parallel dd jobs : 502MB/s
>>>
>>> ### Questions :
>>>
>>> Why such a difference ?
>>> Where are the 167MB/s (462-295) lost ?
>>
>> Network delays, I suppose.
>
> I just saw that iSER is available in FreeBSD 11, let's install BETA4
> and give it a try.

OK, as a target I used Linux TGT, as an iSER target (isert) is not
available on FreeBSD yet.

### Target : local disk throughput, one thread :
# dd if=/dev/da8 of=/dev/null bs=$((128*1024)) count=81920
10737418240 bytes (11 GB) copied, 21.3898 s, 502 MB/s

### Initiator : iscsi disk throughput, one thread :
# dd if=/dev/da8 of=/dev/null bs=$((128*1024)) count=81920
10737418240 bytes transferred in 34.938676 secs (307321843 bytes/sec) - 293 MB/s

### Initiator : iSER disk throughput, one thread :
# dd if=/dev/da8 of=/dev/null bs=$((128*1024)) count=81920
10737418240 bytes transferred in 20.371947 secs (527068838 bytes/sec) - 502 MB/s

No need to comment, let's wait for the FreeBSD iSER target then !

Ben
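For what it's worth, the parallel figures quoted above ("With 2 parallel dd
jobs", "With 4 parallel dd jobs") can be generated with a plain sh loop like
the one below; the device and block size are the ones used in the thread,
the slice offsets and job count are arbitrary:

    # four concurrent sequential readers, each covering its own 2.5 GB slice
    for i in 0 1 2 3; do
        dd if=/dev/da8 of=/dev/null bs=$((128*1024)) count=20480 skip=$((i * 20480)) &
    done
    wait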
From owner-freebsd-fs@freebsd.org Sun Aug 14 23:21:21 2016
From: bugzilla-noreply@freebsd.org
Date: Sun, 14 Aug 2016 23:21:21 +0000
Subject: [Bug 211491] System hangs after "Uptime" on reboot with ZFS
To: freebsd-fs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211491

--- Comment #17 from Xin LI ---
I can't seem to reproduce this anymore on -CURRENT (currently at r304072),
FYI.

From owner-freebsd-fs@freebsd.org Sun Aug 14 23:50:28 2016
From: Xin Li <delphij@delphij.net>
Date: Sun, 14 Aug 2016 16:50:22 -0700
Subject: Re: zfs_recovery=1, zdb, mounted pool?
To: Zaphod Beeblebrox, freebsd-fs
Cc: d@delphij.net

On 8/14/16 13:04, Zaphod Beeblebrox wrote:
> So... I found 319 of the errno 122 errors by running zdb.  My question is
> this:
>
> Can I run with zfs_recovery=1 and have zdb fix these (which are free space
> leaked errors) while the system is running?

No.

If I was you I would definitely do a full backup to a different place,
recreate the pool and restore from the backup.

It's not safe to use your pool as-is, don't do it for everybody's sake.

Cheers,
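A minimal sketch of the evacuation being recommended here, assuming the
damaged pool is called tank and that a scratch pool called backup with
enough space exists (both names are placeholders); if zfs send trips over
unreadable blocks, a file-level copy is the fallback:

    # snapshot everything and copy it off the damaged pool
    zfs snapshot -r tank@evacuate
    zfs send -R tank@evacuate | zfs receive -Fu backup/tank

    # after verifying the copy: destroy, recreate with the desired layout, restore
    zpool destroy tank
    # zpool create tank ...            (vdev layout as appropriate)
    zfs send -R backup/tank@evacuate | zfs receive -F tank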
From owner-freebsd-fs@freebsd.org Mon Aug 15 04:46:23 2016
From: Zaphod Beeblebrox <zbeeble@gmail.com>
Date: Mon, 15 Aug 2016 00:46:22 -0400
Subject: Re: zfs_recovery=1, zdb, mounted pool?
To: Xin Li
Cc: freebsd-fs, Xin LI

On Sun, Aug 14, 2016 at 7:50 PM, Xin Li wrote:

> On 8/14/16 13:04, Zaphod Beeblebrox wrote:
> > So... I found 319 of the errno 122 errors by running zdb.  My question is
> > this:
> >
> > Can I run with zfs_recovery=1 and have zdb fix these (which are free
> > space leaked errors) while the system is running?
>
> No.
>
> If I was you I would definitely do a full backup to a different place,
> recreate the pool and restore from the backup.
>
> It's not safe to use your pool as-is, don't do it for everybody's sake.

So, then, do I start a big bug on this issue?  Is there a bug on this
issue?  Seriously... it appears to have happened to multiple people.

From owner-freebsd-fs@freebsd.org Mon Aug 15 05:02:00 2016
From: Xin Li <delphij@delphij.net>
Date: Sun, 14 Aug 2016 22:01:50 -0700
Subject: Re: zfs_recovery=1, zdb, mounted pool?
To: Zaphod Beeblebrox
Cc: d@delphij.net, freebsd-fs

On 8/14/16 21:46, Zaphod Beeblebrox wrote:
> On Sun, Aug 14, 2016 at 7:50 PM, Xin Li wrote:
>
>> On 8/14/16 13:04, Zaphod Beeblebrox wrote:
>>> So... I found 319 of the errno 122 errors by running zdb.  My question is
>>> this:
>>>
>>> Can I run with zfs_recovery=1 and have zdb fix these (which are free
>>> space leaked errors) while the system is running?
>>
>> No.
>>
>> If I was you I would definitely do a full backup to a different place,
>> recreate the pool and restore from the backup.
>>
>> It's not safe to use your pool as-is, don't do it for everybody's sake.
>
> So, then, do I start a big bug on this issue?  Is there a bug on this
> issue?  Seriously... it appears to have happened to multiple people.

I don't think so -- zfs_recovery is the last-resort option that disables
certain assertions, which implies that your pool is already damaged beyond
repair (i.e. beyond the redundancy margin that ZFS has built in, e.g.
multiple copies of metadata, RAID-Z, etc.), typically as a result of RAM
issues.

In theory it is possible to rebuild the space map and recover the space,
but note that the space map has sufficient redundancy that, if you see
errors in it that can not be corrected by ZFS's self-healing, it is highly
likely that there is much more damage to the pool already.

If you don't have a reproduction case for this one that can reliably
trigger a leak without a hardware issue, I think it would be just a waste
of time to file a bug.

Cheers,
From owner-freebsd-fs@freebsd.org Mon Aug 15 12:11:33 2016
From: bugzilla-noreply@freebsd.org
Date: Mon, 15 Aug 2016 12:11:33 +0000
Subject: [Bug 211381] L2ARC degraded, repeatedly, on Samsung SSD 950 Pro nvme
To: freebsd-fs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211381

--- Comment #12 from braddeicide@hotmail.com ---
Been running r300039 in the previously problematic mismatched sector size
configuration for a week, looks good.

# geli is 4k, underlying device is 512
diskinfo -v /dev/nvd0p3.eli | grep sectorsize
        4096            # sectorsize
diskinfo -v /dev/nvd0p3 | grep sectorsize
        512             # sectorsize

# zfs-stats
L2 ARC Summary: (HEALTHY)

# Cache is growing
pool            alloc   free   read  write   read  write
-------------   -----  -----  -----  -----  -----  -----
cache               -      -      -      -      -      -
  nvd0p3.eli     139G   311G      3      3  40.0K   427K

# guess compress_failures weren't a valid indicator
kstat.zfs.misc.arcstats.l2_compress_failures: 11644501
kstat.zfs.misc.arcstats.l2_writes_error: 0

From owner-freebsd-fs@freebsd.org Mon Aug 15 13:30:19 2016
From: bugzilla-noreply@freebsd.org
Date: Mon, 15 Aug 2016 13:30:19 +0000
Subject: [Bug 211381] L2ARC degraded, repeatedly, on Samsung SSD 950 Pro nvme
To: freebsd-fs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211381

--- Comment #13 from Andriy Gapon ---
(In reply to braddeicide from comment #12)
"Compress failure" only means that the compression didn't save any space
and thus a buffer eligible for the compression was placed in L2ARC without
it.

From owner-freebsd-fs@freebsd.org Mon Aug 15 13:31:03 2016
From: bugzilla-noreply@freebsd.org
Date: Mon, 15 Aug 2016 13:31:03 +0000
Subject: [Bug 211381] L2ARC degraded, repeatedly, on Samsung SSD 950 Pro nvme
To: freebsd-fs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211381

Andriy Gapon changed:

           What            |Removed                 |Added
---------------------------------------------------------------------
           Assignee        |freebsd-fs@FreeBSD.org  |avg@FreeBSD.org
                 CC        |                        |freebsd-fs@FreeBSD.org
From owner-freebsd-fs@freebsd.org Tue Aug 16 09:09:40 2016
From: Ben RUBSON <ben.rubson@gmail.com>
Date: Tue, 16 Aug 2016 11:09:36 +0200
Subject: ZFS does not correctly import by label
To: freebsd-fs@freebsd.org

Hello,

Sounds like ZFS does not correctly import cache and spares by label.

Example :

# zpool add home cache label/G12000KU2RVJAch
# zpool add home spare label/G1207PFGPTKDXhm
# zpool status home
(...)
        home                       ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            label/G1203PGGLJWZXhm  ONLINE       0     0     0
            label/G1204PHKXHJ2Xhm  ONLINE       0     0     0
(...)
        logs
          mirror-2                 ONLINE       0     0     0
            label/G12000KU2RVJAlg  ONLINE       0     0     0
            label/G12010KU22RVAlg  ONLINE       0     0     0
(...)
        cache
          label/G12000KU2RVJAch    ONLINE       0     0     0
        spares
          label/G1207PFGPTKDXhm    AVAIL

# zpool export home
# zpool import -d /dev/label/ home
# zpool status home
(...)
        home                       ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            label/G1203PGGLJWZXhm  ONLINE       0     0     0
            label/G1204PHKXHJ2Xhm  ONLINE       0     0     0
(...)
        logs
          mirror-2                 ONLINE       0     0     0
            label/G12000KU2RVJAlg  ONLINE       0     0     0
            label/G12010KU22RVAlg  ONLINE       0     0     0
(...)
        cache
          da5p7                    ONLINE       0     0     0
        spares
          da4p1                    AVAIL

# uname -v
FreeBSD 10.3-RELEASE-p7 #0: Thu Aug 11 18:38:15 UTC 2016
root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC

Did I miss something ?

Thank you very much !

Best regards,

Ben

From owner-freebsd-fs@freebsd.org Tue Aug 16 10:54:55 2016
From: Ben RUBSON <ben.rubson@gmail.com>
Date: Tue, 16 Aug 2016 12:54:51 +0200
Subject: Re: ZFS does not correctly import by label
To: freebsd-fs@freebsd.org

> On 16 Aug 2016, at 11:09, Ben RUBSON wrote:
>
> Hello,
>
> Sounds like ZFS does not correctly import cache and spares by label.
>
> Example :
>
> # zpool add home cache label/G12000KU2RVJAch
> # zpool add home spare label/G1207PFGPTKDXhm
> # zpool status home
> (...)
>         home                       ONLINE       0     0     0
>           mirror-0                 ONLINE       0     0     0
>             label/G1203PGGLJWZXhm  ONLINE       0     0     0
>             label/G1204PHKXHJ2Xhm  ONLINE       0     0     0
> (...)
>         logs
>           mirror-2                 ONLINE       0     0     0
>             label/G12000KU2RVJAlg  ONLINE       0     0     0
>             label/G12010KU22RVAlg  ONLINE       0     0     0
> (...)
>         cache
>           label/G12000KU2RVJAch    ONLINE       0     0     0
>         spares
>           label/G1207PFGPTKDXhm    AVAIL
>
> # zpool export home
> # zpool import -d /dev/label/ home
> # zpool status home
> (...)
>         cache
>           da5p7                    ONLINE       0     0     0
>         spares
>           da4p1                    AVAIL
>
> # uname -v
> FreeBSD 10.3-RELEASE-p7 #0: Thu Aug 11 18:38:15 UTC 2016
> root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC

I just tested my pool under FreeBSD 11-RC1, the issue does not occur,
cache and spares are correctly imported by label.

Could it be possible to backport the required changes to 10.3 ?

Many thanks !

Ben
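One possible stop-gap on 10.3, untested here and assuming the wrongly named
devices are otherwise healthy, is to drop and re-add the cache and spare
after each import, since only those two classes lose their label names
(device and label names are the ones from the zpool status output above):

    # da5p7/da4p1 are what the cache and spare showed up as after the import
    zpool remove home da5p7
    zpool remove home da4p1
    zpool add home cache label/G12000KU2RVJAch
    zpool add home spare label/G1207PFGPTKDXhm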
From owner-freebsd-fs@freebsd.org Tue Aug 16 19:34:20 2016
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
Date: Tue, 16 Aug 2016 22:34:16 +0300
Subject: ZFS ARC under memory pressure
To: freebsd-fs@freebsd.org

I see issues with ZFS ARC under memory pressure.
The ZFS ARC size can be dramatically reduced, all the way down to arc_min.

As I see it, a memory pressure event causes a call to arc_lowmem, which sets
needfree:

arc.c:arc_lowmem

        needfree = btoc(arc_c >> arc_shrink_shift);

After this, arc_available_memory returns negative values (PAGESIZE *
(-needfree)) until needfree is zero, independent of how much memory has
already been freed.  needfree is only set back to 0 in arc_reclaim_thread()
when arc_size <= arc_c, which does not happen until arc_size drops below
arc_c (arc_c is decreased at every loop iteration).

So arc_c can be dropped all the way to its minimum value if arc_size does
not fall below it fast enough; the amount of memory already freed is never
compared against the initial request.

As a result, I can see needless ARC reclaim, from 10x to 100x what was
asked for.

Can someone check my reading and comment on this?
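A self-contained toy model of the loop described above; the identifiers
follow the arc.c names quoted in the message, but the 16 GB starting size,
the fixed 64 MB evicted per pass, and the helper bodies are made-up
placeholders, not the kernel code:

#include <stdio.h>
#include <stdint.h>

#define PAGESIZE 4096ULL
#define btoc(x)  ((x) / PAGESIZE)               /* bytes -> pages */

static uint64_t needfree;                       /* set by the vm_lowmem handler */
static uint64_t arc_c    = 16ULL << 30;         /* ARC target size (made up) */
static uint64_t arc_size = 16ULL << 30;         /* current ARC size */
static int      arc_shrink_shift = 7;

/* placeholder: the real arc_shrink() also clamps at arc_min, etc. */
static void arc_shrink(void) { arc_c -= arc_c >> arc_shrink_shift; }

/* placeholder: pretend eviction frees a fixed 64 MB per pass */
static void arc_adjust(void) { arc_size -= (arc_size > (64ULL << 20)) ? (64ULL << 20) : arc_size; }

/* one vm_lowmem event asks for arc_c >> arc_shrink_shift bytes worth of pages */
static void arc_lowmem(void) { needfree = btoc(arc_c >> arc_shrink_shift); }

static int64_t arc_available_memory(void)
{
        /* negative for as long as needfree is set, regardless of how much
         * memory has been handed back in the meantime */
        if (needfree > 0)
                return (-(int64_t)(PAGESIZE * needfree));
        return (0);
}

int main(void)
{
        int passes = 0;

        arc_lowmem();
        printf("requested: %ju bytes\n", (uintmax_t)(needfree * PAGESIZE));

        while (arc_available_memory() < 0) {
                arc_shrink();                   /* arc_c drops every pass ... */
                arc_adjust();
                if (arc_size <= arc_c)          /* ... and needfree only clears here */
                        needfree = 0;
                passes++;
        }
        printf("passes: %d, final arc_c: %ju bytes\n", passes, (uintmax_t)arc_c);
        return (0);
}

Compiled and run, this takes a couple of hundred passes and leaves arc_c at
roughly 3 GB even though only 128 MB was asked for -- the kind of overshoot
the message describes.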
From owner-freebsd-fs@freebsd.org Tue Aug 16 21:02:34 2016
From: bugzilla-noreply@freebsd.org
Date: Tue, 16 Aug 2016 21:02:34 +0000
Subject: [Bug 211013] Write error to UFS filesystem with softupdates panics machine
To: freebsd-fs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211013

--- Comment #1 from commit-hook@freebsd.org ---
A commit references this bug:

Author: mckusick
Date: Tue Aug 16 21:02:30 UTC 2016
New revision: 304239
URL: https://svnweb.freebsd.org/changeset/base/304239

Log:
  Bug 211013 reports that a write error to a UFS filesystem running with
  softupdates panics the kernel.  The problem that has been pointed out is
  that when there is a transient write error on certain metadata blocks,
  specifically directory blocks (PAGEDEP), inode blocks (INODEDEP),
  indirect pointer blocks (INDIRDEPS), and cylinder group (BMSAFEMAP, but
  only when journaling is enabled), we get a panic in one of the routines
  called by softdep_disk_io_initiation that the I/O is "already started"
  when we retry the write.

  These dependency types potentially need to do roll-backs when called by
  softdep_disk_io_initiation before doing a write and then a roll-forward
  when called by softdep_disk_write_complete after the I/O completes.  The
  panic happens when there is a transient error.  At the top of
  softdep_disk_write_complete we check to see if the write had an error
  and if an error occurred we just return.  This return is correct most of
  the time because the main role of the routines called by
  softdep_disk_write_complete is to process the now-completed dependencies
  so that the next I/O steps can happen.

  But for the four types listed above, they do not get to do their
  rollback operations.  This causes the panic when
  softdep_disk_io_initiation gets called on the second attempt to do the
  write and the roll-back routines find that the roll-backs have already
  been done.  As an aside I note that there is also the problem that the
  buffer will have been unlocked and thus made visible to the filesystem
  and to user applications with the roll-backs in place.

  The way to resolve the problem is to add a flag to the routines called
  by softdep_disk_write_complete for the four dependency types noted that
  indicates whether the write was successful (WRITESUCCEEDED).  If the
  write does not succeed, they do just the roll-backs and then return.  If
  the write was successful they also do their usual processing of the
  now-completed dependencies.

  The fix was tested by selectively injecting write errors for buffers
  holding dependencies of each of the four types noted above and then
  verifying that the kernel no longer panicked and that, following the
  successful retry of the write, the filesystem could be unmounted and
  successfully checked cleanly.

  PR:           211013
  Reviewed by:  kib

Changes:
  head/sys/ufs/ffs/ffs_softdep.c
  head/sys/ufs/ffs/softdep.h
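In code terms, the change described in the commit message amounts to the
following shape.  This is a schematic only: the struct, the helper names and
the flag value are placeholders chosen to mirror the wording above, not the
actual ffs_softdep.c code.

/* flag passed down from softdep_disk_write_complete() */
#define WRITESUCCEEDED 0x0001   /* buffer write completed without error */

struct buf;
struct pagedep;                 /* stands in for one of the four affected types */

/* placeholders for the roll-back work and the completion processing
 * the commit message refers to */
static void do_rollbacks(struct pagedep *dep, struct buf *bp) { (void)dep; (void)bp; }
static void process_completed(struct pagedep *dep, struct buf *bp) { (void)dep; (void)bp; }

static void
handle_written_pagedep_sketch(struct pagedep *dep, struct buf *bp, int flags)
{
        /* the roll-back handling happens in both cases */
        do_rollbacks(dep, bp);

        if ((flags & WRITESUCCEEDED) == 0) {
                /* transient write error: stop here, so the retried write can
                 * go through softdep_disk_io_initiation() again cleanly */
                return;
        }

        /* successful write: usual processing of the now-completed
         * dependencies so the next I/O steps can happen */
        process_completed(dep, bp);
}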
If no further problems are reported, bug will be closed. --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 02:01:07 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AA1E2BBCF86 for ; Wed, 17 Aug 2016 02:01:07 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9948319F7 for ; Wed, 17 Aug 2016 02:01:07 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7H217WO082129 for ; Wed, 17 Aug 2016 02:01:07 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211013] Write error to UFS filesystem with softupdates panics machine Date: Wed, 17 Aug 2016 02:01:07 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-BETA1 X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: karl@denninger.net X-Bugzilla-Status: In Progress X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 02:01:07 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211013 --- Comment #3 from karl@denninger.net --- I trashed the card that caused this, but will see if I can reproduce and wi= ll update in any event. 
--=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 02:03:31 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AA273BBC126 for ; Wed, 17 Aug 2016 02:03:31 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 996441D23 for ; Wed, 17 Aug 2016 02:03:31 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7H23V6v068788 for ; Wed, 17 Aug 2016 02:03:31 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211013] Write error to UFS filesystem with softupdates panics machine Date: Wed, 17 Aug 2016 02:03:31 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-BETA1 X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: karl@denninger.net X-Bugzilla-Status: In Progress X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 02:03:31 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211013 --- Comment #4 from karl@denninger.net --- (In reply to karl from comment #3) Is this expected to be MFC'd back against 11.0-PRE (and should it apply cleanly?) 
--=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 05:39:00 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 57591BBCAAB for ; Wed, 17 Aug 2016 05:39:00 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 462861EAB for ; Wed, 17 Aug 2016 05:39:00 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7H5d0sV052423 for ; Wed, 17 Aug 2016 05:39:00 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211013] Write error to UFS filesystem with softupdates panics machine Date: Wed, 17 Aug 2016 05:39:00 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-BETA1 X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: mckusick@FreeBSD.org X-Bugzilla-Status: In Progress X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 05:39:00 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211013 --- Comment #5 from Kirk McKusick --- Though I did not specify an MFC, it should apply easily to 11.0. I do plan = to do an MFC to 11.0 once it has been released. Since it is not a common bug I don't want to slow the process of getting 11.0 out the door hence the pause= to MFC. 
--=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 07:18:51 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B1541BBB3D1 for ; Wed, 17 Aug 2016 07:18:51 +0000 (UTC) (envelope-from mgamsjager@gmail.com) Received: from mail-io0-x230.google.com (mail-io0-x230.google.com [IPv6:2607:f8b0:4001:c06::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7CA241EBA for ; Wed, 17 Aug 2016 07:18:51 +0000 (UTC) (envelope-from mgamsjager@gmail.com) Received: by mail-io0-x230.google.com with SMTP id q83so129090645iod.1 for ; Wed, 17 Aug 2016 00:18:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:cc; bh=x4WMCXJNByJ+2wnajUZDpldZMVbWJTAxjhdsCp+F8KA=; b=IDEBGyWxgEMGm7GnHUZyts679E6/oZR7rqknrzGDsr98OrH3r68LyJhvRPnH01pRK3 w2/MASgcj9/Pof9mPzeaY9s3hq7YdMRYlwX/6PMLkyPDpsMkF8BEvQU8pEZC7MwQTY0y RpWY8GzUthLlppKBtSLdcYWPb3bRoAd5ig85QcDX84MsGE2AspWeMXB4VIobAF9UhxG4 eBB4SoazqMpfoxYHKgrcBNDlXh8c1/hTnJ3SOrV8WLjq/HS6NC1U7kgLQ9+YVV18ZLSf vqwQD2TZ0PeKhUZG5gdPA8OQvKtkXTcOWciOGgFhs4ArUlYfGuPHmiBgo0dK6X+YWWuy MOUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:cc; bh=x4WMCXJNByJ+2wnajUZDpldZMVbWJTAxjhdsCp+F8KA=; b=EMqksGdE/o3vcmdwjA/qhVEf9KIpazsMW7jtwXd3LJWACuWNZyHwxc48QtL18T36Ba TU5qBhIAGXQetJcTLec8pDTJI6EpnMEG/82F6qbUeC3G3Gydec9cPvcIExTfLbXJGFIm f4SrCpTtVsLwbE0UjOlXE4e9I5F9xRd1nFTH+CNUjjPKqVTVjnj9nx6erdHJiPevAVVj KLth+C8uowmxFL7fVb1dJeeYcxRiVEYptZAba7rsGjNJhBD7v1NjbF/0Dlzqt48kqql4 77AbQ00+K6rNejkgRi2j+HqBpyazL6py4h/qiKmpwMkgpu4lrch3JsZhZ57AlxGN01u3 uezQ== X-Gm-Message-State: AEkoouuzKdXx0R/iYtLQXxfo0/EWscY0JiZH4r37fxz+oIbsJN0UXi+w0aogSkrDQHMHNcHxdc0daoeCcKrz3A== X-Received: by 10.107.139.8 with SMTP id n8mr44439153iod.96.1471418330671; Wed, 17 Aug 2016 00:18:50 -0700 (PDT) MIME-Version: 1.0 Received: by 10.64.63.197 with HTTP; Wed, 17 Aug 2016 00:18:20 -0700 (PDT) In-Reply-To: <20160816193416.GM8192@zxy.spb.ru> References: <20160816193416.GM8192@zxy.spb.ru> From: Matthias Gamsjager Date: Wed, 17 Aug 2016 09:18:20 +0200 Message-ID: Subject: Re: ZFS ARC under memory pressure Cc: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 07:18:51 -0000 On 16 August 2016 at 21:34, Slawa Olhovchenkov wrote: > I see issuses with ZFS ARC inder memory pressure. > ZFS ARC size can be dramaticaly reduced, up to arc_min. > > As I see memory pressure event cause call arc_lowmem and set needfree: > > arc.c:arc_lowmem > > needfree = btoc(arc_c >> arc_shrink_shift); > > After this, arc_available_memory return negative vaules (PAGESIZE * > (-needfree)) until needfree is zero. Independent how too much memory > freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <= > arc_c. Until arc_size don't drop below arc_c (arc_c deceased at every > loop interation). 
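For what it's worth, a quick way to watch the behaviour described above is to sample the ARC statistics while the box goes through a low-memory event. This is only a rough observation loop, not a fix; the sysctl OIDs are the stock FreeBSD ones (kstat.zfs.misc.arcstats.*, vm.stats.vm.v_free_count) and the interval and iteration count are arbitrary choices:

    #!/bin/sh
    # Sample ARC size, ARC target (arc_c) and free pages every 5 seconds.
    i=0
    while [ "$i" -lt 120 ]; do
        size=$(sysctl -n kstat.zfs.misc.arcstats.size)
        c=$(sysctl -n kstat.zfs.misc.arcstats.c)
        c_min=$(sysctl -n kstat.zfs.misc.arcstats.c_min)
        free=$(sysctl -n vm.stats.vm.v_free_count)
        printf '%s size=%s c=%s c_min=%s v_free_count=%s\n' \
            "$(date '+%H:%M:%S')" "$size" "$c" "$c_min" "$free"
        sleep 5
        i=$((i + 1))
    done

If the reclaim really overshoots as described, both size and c should be seen collapsing towards c_min after a single pressure event even though v_free_count has long since recovered.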
> > arc_c droped to minimum value if arc_size fast enough droped. > > No control current to initial memory allocation. > > As result, I can see needless arc reclaim, from 10x to 100x times. > > Can some one check me and comment this? > _______________________________________________ > What version are you on? From owner-freebsd-fs@freebsd.org Wed Aug 17 07:31:56 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F30A8BBB9BD for ; Wed, 17 Aug 2016 07:31:56 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) by mx1.freebsd.org (Postfix) with ESMTP id B41CA1A79 for ; Wed, 17 Aug 2016 07:31:55 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 4D78F4C4C83E; Wed, 17 Aug 2016 09:25:33 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id hDxnviEepP0m; Wed, 17 Aug 2016 09:25:31 +0200 (CEST) Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id 1B6424C4C839; Wed, 17 Aug 2016 09:25:31 +0200 (CEST) Reply-To: juergen.gotteswinter@internetx.com Subject: Re: HAST + ZFS + NFS + CARP References: <6035AB85-8E62-4F0A-9FA8-125B31A7A387@gmail.com> <20160703192945.GE41276@mordor.lan> <20160703214723.GF41276@mordor.lan> <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> To: Borja Marcos , freebsd-fs@freebsd.org From: InterNetX - Juergen Gotteswinter Organization: InterNetX GmbH Message-ID: <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> Date: Wed, 17 Aug 2016 09:25:30 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 07:31:57 -0000 Am 11.08.2016 um 11:24 schrieb Borja Marcos: > >> On 11 Aug 2016, at 11:10, Julien Cigar wrote: >> >> As I said in a previous post I tested the zfs send/receive approach (with >> zrep) and it works (more or less) perfectly.. so I concur in all what you >> said, especially about off-site replicate and synchronous replication. >> >> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment, >> I'm in the early tests, haven't done any heavy writes yet, but ATM it >> works as expected, I havent' managed to corrupt the zpool. > > I must be too old school, but I don’t quite like the idea of using an essentially unreliable transport > (Ethernet) for low-level filesystem operations. > > In case something went wrong, that approach could risk corrupting a pool. Although, frankly, > ZFS is extremely resilient. 
One of mine even survived a SAS HBA problem that caused some > silent corruption. try dual split import :D i mean, zpool -f import on 2 machines hooked up to the same disk chassis. kaboom, really ugly kaboom. thats what is very likely to happen sooner or later especially when it comes to homegrown automatism solutions. even the commercial parts where much more time/work goes into such solutions fail in a regular manner > > The advantage of ZFS send/receive of datasets is, however, that you can consider it > essentially atomic. A transport corruption should not cause trouble (apart from a failed > "zfs receive") and with snapshot retention you can even roll back. You can’t roll back > zpool replications :) > > ZFS receive does a lot of sanity checks as well. As long as your zfs receive doesn’t involve a rollback > to the latest snapshot, it won’t destroy anything by mistake. Just make sure that your replica datasets > aren’t mounted and zfs receive won’t complain. > > > Cheers, > > > > > Borja. > > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Wed Aug 17 08:53:27 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4F88EBBA6EC for ; Wed, 17 Aug 2016 08:53:27 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from cu01176b.smtpx.saremail.com (cu01176b.smtpx.saremail.com [195.16.151.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0D1031016 for ; Wed, 17 Aug 2016 08:53:26 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from [172.16.8.36] (izaro.sarenet.es [192.148.167.11]) by proxypop01.sare.net (Postfix) with ESMTPSA id C69B99DCA35; Wed, 17 Aug 2016 10:53:17 +0200 (CEST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: HAST + ZFS + NFS + CARP From: Borja Marcos In-Reply-To: <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> Date: Wed, 17 Aug 2016 10:53:17 +0200 Cc: freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <1AE36A3B-A2BA-47D2-A872-1E7E9EFA201D@sarenet.es> References: <6035AB85-8E62-4F0A-9FA8-125B31A7A387@gmail.com> <20160703192945.GE41276@mordor.lan> <20160703214723.GF41276@mordor.lan> <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> To: juergen.gotteswinter@internetx.com X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 08:53:27 -0000 > On 17 Aug 2016, at 09:25, InterNetX - Juergen Gotteswinter = wrote: > try dual split import :D i mean, zpool -f import on 2 machines hooked = up > to the same disk chassis. >=20 > kaboom, really ugly kaboom. thats what is very likely to happen sooner > or later especially when it comes to homegrown automatism solutions. 
> even the commercial parts where much more time/work goes into such > solutions fail in a regular manner Well, don=E2=80=99t expect to father children after shooting your balls! = ;) I am not a big fan of such closely coupled solutions. There are quite some failure modes that can break such a configuration, not just a = brainless =E2=80=9Cdual split import=E2=80=9D as you say :) Misbehaving software (read, a ZFS bug) can render the pool unusable and, = no matter how many redundant servers you have connected to your chassis, you are toast. = Using incremental replication over a network is much more robust, and it offers a lot of fault = isolation. Moreover, you can place the servers in different buildings, etc. Networks even offer a more than reasonable protection from electrical = problems. Especially if you get paranoid and use fiber, in which case protection is absolute. Borja. From owner-freebsd-fs@freebsd.org Wed Aug 17 08:54:41 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 80208BBA83D for ; Wed, 17 Aug 2016 08:54:41 +0000 (UTC) (envelope-from julien@perdition.city) Received: from relay-b01.edpnet.be (relay-b01.edpnet.be [212.71.1.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "edpnet.email", Issuer "Go Daddy Secure Certificate Authority - G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2CCCC1287 for ; Wed, 17 Aug 2016 08:54:40 +0000 (UTC) (envelope-from julien@perdition.city) X-ASG-Debug-ID: 1471424054-0a7ff569f634acc30001-3nHGF7 Received: from mordor.lan (213.219.167.114.bro01.dyn.edpnet.net [213.219.167.114]) by relay-b01.edpnet.be with ESMTP id mK3LosE254GpDeyp (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 17 Aug 2016 10:54:15 +0200 (CEST) X-Barracuda-Envelope-From: julien@perdition.city X-Barracuda-Effective-Source-IP: 213.219.167.114.bro01.dyn.edpnet.net[213.219.167.114] X-Barracuda-Apparent-Source-IP: 213.219.167.114 Date: Wed, 17 Aug 2016 10:54:13 +0200 From: Julien Cigar To: InterNetX - Juergen Gotteswinter Cc: Borja Marcos , freebsd-fs@freebsd.org Subject: Re: HAST + ZFS + NFS + CARP Message-ID: <20160817085413.GE22506@mordor.lan> X-ASG-Orig-Subj: Re: HAST + ZFS + NFS + CARP References: <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="EXKGNeO8l0xGFBjy" Content-Disposition: inline In-Reply-To: <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> User-Agent: Mutt/1.6.1 (2016-04-27) X-Barracuda-Connect: 213.219.167.114.bro01.dyn.edpnet.net[213.219.167.114] X-Barracuda-Start-Time: 1471424054 X-Barracuda-Encrypted: ECDHE-RSA-AES256-GCM-SHA384 X-Barracuda-URL: https://212.71.1.221:443/cgi-mod/mark.cgi X-Barracuda-Scan-Msg-Size: 4432 X-Virus-Scanned: by bsmtpd at edpnet.be X-Barracuda-BRTS-Status: 1 X-Barracuda-Bayes: INNOCENT GLOBAL 0.5000 1.0000 0.7500 X-Barracuda-Spam-Score: 0.75 X-Barracuda-Spam-Status: No, SCORE=0.75 using global scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=6.0 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 
3.2.3.32083 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 08:54:41 -0000 --EXKGNeO8l0xGFBjy Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gotteswinter = wrote: >=20 >=20 > Am 11.08.2016 um 11:24 schrieb Borja Marcos: > >=20 > >> On 11 Aug 2016, at 11:10, Julien Cigar wrote: > >> > >> As I said in a previous post I tested the zfs send/receive approach (w= ith > >> zrep) and it works (more or less) perfectly.. so I concur in all what = you > >> said, especially about off-site replicate and synchronous replication. > >> > >> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment,= =20 > >> I'm in the early tests, haven't done any heavy writes yet, but ATM it= =20 > >> works as expected, I havent' managed to corrupt the zpool. > >=20 > > I must be too old school, but I don=E2=80=99t quite like the idea of us= ing an essentially unreliable transport > > (Ethernet) for low-level filesystem operations. > >=20 > > In case something went wrong, that approach could risk corrupting a poo= l. Although, frankly, > > ZFS is extremely resilient. One of mine even survived a SAS HBA problem= that caused some > > silent corruption. >=20 > try dual split import :D i mean, zpool -f import on 2 machines hooked up > to the same disk chassis. Yes this is the first thing on the list to avoid .. :) I'm still busy to test the whole setup here, including the=20 MASTER -> BACKUP failover script (CARP), but I think you can prevent that thanks to: - As long as ctld is running on the BACKUP the disks are locked=20 and you can't import the pool (even with -f) for ex (filer2 is the BACKUP): https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f - The shared pool should not be mounted at boot, and you should ensure that the failover script is not executed during boot time too: this is to handle the case wherein both machines turn off and/or re-ignite at the same time. Indeed, the CARP interface can "flip" it's status if both machines are powered on at the same time, for ex: https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and you will have a split-brain scenario - Sometimes you'll need to reboot the MASTER for some $reasons (freebsd-update, etc) and the MASTER -> BACKUP switch should not happen, this can be handled with a trigger file or something like that - I've still have to check if the order is OK, but I think that as long as you shutdown the replication interface and that you adapt the advskew (including the config file) of the CARP interface before the=20 zpool import -f in the failover script you can be relatively confident=20 that nothing will be written on the iSCSI targets - A zpool scrub should be run at regular intervals This is my MASTER -> BACKUP CARP script ATM https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 Julien >=20 > kaboom, really ugly kaboom. thats what is very likely to happen sooner > or later especially when it comes to homegrown automatism solutions. 
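For readers following along, here is a minimal sketch of the promotion order described in the list above (shut the replication link, release the disks, take the VIP, only then force the import). The interface names igb0/igb1, vhid 1, the advskew value and the pool name "tank" are placeholders, and a real script would still need the trigger-file and split-brain checks discussed in this thread:

    #!/bin/sh
    # BACKUP -> MASTER promotion, roughly in the order described above.
    # All names below are placeholders for this sketch.
    set -e

    # 1. Shut down the replication interface so the old MASTER can no
    #    longer reach this node's iSCSI exports.
    ifconfig igb1 down

    # 2. Stop exporting the local disks (releases the ctld locks).
    service ctld onestop

    # 3. Take over the CARP VIP by advertising with a lower advskew.
    ifconfig igb0 vhid 1 advskew 0

    # 4. Only now force-import the shared pool on this node.
    zpool import -f tank

    # 5. Start whatever sits on top of the pool (NFS, the ctld config for
    #    the new role, ...) and persist the advskew change in the config.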
> even the commercial parts where much more time/work goes into such > solutions fail in a regular manner >=20 > >=20 > > The advantage of ZFS send/receive of datasets is, however, that you can= consider it > > essentially atomic. A transport corruption should not cause trouble (ap= art from a failed > > "zfs receive") and with snapshot retention you can even roll back. You = can=E2=80=99t roll back > > zpool replications :) > >=20 > > ZFS receive does a lot of sanity checks as well. As long as your zfs re= ceive doesn=E2=80=99t involve a rollback > > to the latest snapshot, it won=E2=80=99t destroy anything by mistake. J= ust make sure that your replica datasets > > aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain. > >=20 > >=20 > > Cheers, > >=20 > >=20 > >=20 > >=20 > > Borja. > >=20 > >=20 > >=20 > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > >=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" --=20 Julien Cigar Belgian Biodiversity Platform (http://www.biodiversity.be) PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0 No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced. --EXKGNeO8l0xGFBjy Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCgAGBQJXtCYyAAoJELK7NxCiBCPAjjgQAOF0zl3cvzfi6jXRoSS141wK lWv3WeLLzjnzdq7k45i1LKRypyC8RRP4AlqCTcKIO/gbVWcKqTXb4VwTymyGhXvW 3dOYOcu38NIwzWZ95dEDT1dqCwKCvtlPzG+VJJ93Kr2jbCeoMxmZTZIgWGibjU46 ES7ozWvj9tMLWrg5blqiTVgsmR1OCEBhiahJvWPHHhOJmm8LAAh/HciT8tLM1Dd1 6skOIawLuGVKnGth12O9TpakuqBds8Ru3jry+1+EeERP6xDZRtJh0IUT2I57gJ2X H8kyB4e4Dg9pVwtvLj7QLZcq7vK821pRrmvKkWo5OIQt8qPRjy2UxXoUbft1nPpK RrMpo0J1Zb0riZoCLaVBkPSXNor9DXqwN2ExfxCq9WUBBYClBLdgxn1EAW0dmVwv LearQLK4BdlCJrIJIQI2hpMiu0qAIfBuNlCsbifZQzbtjEPwk9s1MNDihMhydshc PvSlqNIh1LkfQ4ka7FiYvGzaLfWTi7ZYYVl+SL4UvMX8YmvCdOGOUBf5bOjZkjRI +0SHWic0JDM7R4chYGmTL9WFSFuBnqtNoQyy97c8bimqM2oV4pF7pEN1GfxR9w8Y 2pQ2ghSC40lhCTOUv8tGS3XKzkBp5J4BUSpu7fhhMSI52WJzIvNOwkTLmbnCoEku hMfj6gWoa0TEYf6tj3di =355Z -----END PGP SIGNATURE----- --EXKGNeO8l0xGFBjy-- From owner-freebsd-fs@freebsd.org Wed Aug 17 09:03:03 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5827BBBC081 for ; Wed, 17 Aug 2016 09:03:03 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 050BF1B18 for ; Wed, 17 Aug 2016 09:03:02 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id D9EDB4C4C89E; Wed, 17 Aug 2016 11:02:59 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id gqFxgZu8qn7T; Wed, 17 Aug 2016 11:02:57 +0200 (CEST) Received: from 
[192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id E05084C4C89D; Wed, 17 Aug 2016 11:02:57 +0200 (CEST) Reply-To: juergen.gotteswinter@internetx.com Subject: Re: HAST + ZFS + NFS + CARP References: <6035AB85-8E62-4F0A-9FA8-125B31A7A387@gmail.com> <20160703192945.GE41276@mordor.lan> <20160703214723.GF41276@mordor.lan> <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <1AE36A3B-A2BA-47D2-A872-1E7E9EFA201D@sarenet.es> To: Borja Marcos Cc: freebsd-fs@freebsd.org From: InterNetX - Juergen Gotteswinter Organization: InterNetX GmbH Message-ID: Date: Wed, 17 Aug 2016 11:02:57 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <1AE36A3B-A2BA-47D2-A872-1E7E9EFA201D@sarenet.es> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 09:03:03 -0000 Am 17.08.2016 um 10:53 schrieb Borja Marcos: > >> On 17 Aug 2016, at 09:25, InterNetX - Juergen Gotteswinter wrote: >> try dual split import :D i mean, zpool -f import on 2 machines hooked up >> to the same disk chassis. >> >> kaboom, really ugly kaboom. thats what is very likely to happen sooner >> or later especially when it comes to homegrown automatism solutions. >> even the commercial parts where much more time/work goes into such >> solutions fail in a regular manner > > Well, don’t expect to father children after shooting your balls! ;) > > I am not a big fan of such closely coupled solutions. There are quite > some failure modes that can break such a configuration, not just a brainless > “dual split import” as you say :) > > Misbehaving software (read, a ZFS bug) can render the pool unusable and, no matter how many > redundant servers you have connected to your chassis, you are toast. Using incremental replication > over a network is much more robust, and it offers a lot of fault isolation. Moreover, you can place the > servers in different buildings, etc. in my case it was caused by rsf-1 cluster software > > Networks even offer a more than reasonable protection from electrical problems. Especially if you get > paranoid and use fiber, in which case protection is absolute. > > > > Borja. 
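Whatever drives the failover, one cheap belt-and-braces check before any forced import is to look at the advisory output of a plain "zpool import" first and bail out if a pool still looks active on another host. This is a rough sketch only; "tank" is a placeholder, the grep matches the usual wording of the status line, and it is a sanity check rather than real fencing:

    #!/bin/sh
    # Refuse to force-import if the import listing says a pool was last
    # touched by another system.  Advisory only -- not a fencing mechanism.
    pool=tank
    if zpool import 2>/dev/null | grep -q 'last accessed by another system'; then
        echo "refusing to import ${pool}: a pool is reported active elsewhere" >&2
        exit 1
    fi
    zpool import -f "$pool"

Note that the listing is pool-wide, so on a box that can see several foreign pools the grep above is stricter than strictly necessary.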
> From owner-freebsd-fs@freebsd.org Wed Aug 17 09:05:51 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7E904BBC258 for ; Wed, 17 Aug 2016 09:05:51 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) by mx1.freebsd.org (Postfix) with ESMTP id 121D21CE0 for ; Wed, 17 Aug 2016 09:05:50 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 0AFB945FC0FB; Wed, 17 Aug 2016 11:05:49 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5tJoAb6RKY8e; Wed, 17 Aug 2016 11:05:46 +0200 (CEST) Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id 9E15A4C4C89E; Wed, 17 Aug 2016 11:05:46 +0200 (CEST) Reply-To: juergen.gotteswinter@internetx.com Subject: Re: HAST + ZFS + NFS + CARP References: <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> To: Julien Cigar Cc: Borja Marcos , freebsd-fs@freebsd.org From: InterNetX - Juergen Gotteswinter Organization: InterNetX GmbH Message-ID: <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> Date: Wed, 17 Aug 2016 11:05:46 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160817085413.GE22506@mordor.lan> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 09:05:51 -0000 Am 17.08.2016 um 10:54 schrieb Julien Cigar: > On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gotteswinter wrote: >> >> >> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>> >>>> On 11 Aug 2016, at 11:10, Julien Cigar wrote: >>>> >>>> As I said in a previous post I tested the zfs send/receive approach (with >>>> zrep) and it works (more or less) perfectly.. so I concur in all what you >>>> said, especially about off-site replicate and synchronous replication. >>>> >>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment, >>>> I'm in the early tests, haven't done any heavy writes yet, but ATM it >>>> works as expected, I havent' managed to corrupt the zpool. >>> >>> I must be too old school, but I don’t quite like the idea of using an essentially unreliable transport >>> (Ethernet) for low-level filesystem operations. >>> >>> In case something went wrong, that approach could risk corrupting a pool. Although, frankly, >>> ZFS is extremely resilient. One of mine even survived a SAS HBA problem that caused some >>> silent corruption. 
>> >> try dual split import :D i mean, zpool -f import on 2 machines hooked up >> to the same disk chassis. > > Yes this is the first thing on the list to avoid .. :) > > I'm still busy to test the whole setup here, including the > MASTER -> BACKUP failover script (CARP), but I think you can prevent > that thanks to: > > - As long as ctld is running on the BACKUP the disks are locked > and you can't import the pool (even with -f) for ex (filer2 is the > BACKUP): > https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > > - The shared pool should not be mounted at boot, and you should ensure > that the failover script is not executed during boot time too: this is > to handle the case wherein both machines turn off and/or re-ignite at > the same time. Indeed, the CARP interface can "flip" it's status if both > machines are powered on at the same time, for ex: > https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and > you will have a split-brain scenario > > - Sometimes you'll need to reboot the MASTER for some $reasons > (freebsd-update, etc) and the MASTER -> BACKUP switch should not > happen, this can be handled with a trigger file or something like that > > - I've still have to check if the order is OK, but I think that as long > as you shutdown the replication interface and that you adapt the > advskew (including the config file) of the CARP interface before the > zpool import -f in the failover script you can be relatively confident > that nothing will be written on the iSCSI targets > > - A zpool scrub should be run at regular intervals > > This is my MASTER -> BACKUP CARP script ATM > https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > > Julien > 100€ question without detailed looking at that script. yes from a first view its super simple, but: why are solutions like rsf-1 such more powerful / featurerich. Theres a reason for, which is that they try to cover every possible situation (which makes more than sense for this). That script works for sure, within very limited cases imho >> >> kaboom, really ugly kaboom. thats what is very likely to happen sooner >> or later especially when it comes to homegrown automatism solutions. >> even the commercial parts where much more time/work goes into such >> solutions fail in a regular manner >> >>> >>> The advantage of ZFS send/receive of datasets is, however, that you can consider it >>> essentially atomic. A transport corruption should not cause trouble (apart from a failed >>> "zfs receive") and with snapshot retention you can even roll back. You can’t roll back >>> zpool replications :) >>> >>> ZFS receive does a lot of sanity checks as well. As long as your zfs receive doesn’t involve a rollback >>> to the latest snapshot, it won’t destroy anything by mistake. Just make sure that your replica datasets >>> aren’t mounted and zfs receive won’t complain. >>> >>> >>> Cheers, >>> >>> >>> >>> >>> Borja. 
>>> >>> >>> >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Wed Aug 17 09:11:11 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 62487BBC574 for ; Wed, 17 Aug 2016 09:11:11 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-wm0-x22c.google.com (mail-wm0-x22c.google.com [IPv6:2a00:1450:400c:c09::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E75451440 for ; Wed, 17 Aug 2016 09:11:10 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: by mail-wm0-x22c.google.com with SMTP id o80so219354795wme.1 for ; Wed, 17 Aug 2016 02:11:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=2/mQLGUGA0xLtkVNsiIpOFEvVb4XC3yo3Cor9uoHpHg=; b=Fj7dlse4etxfhovQJY5YSDwV658pe8wE/ykwyCPRly37kj41Jz9LX/UChUjMV7bast 3ISzag94nXCxqkkTEqUyxqidxIUbOON26Bc9Yylilzb39VK24va3cfx70W3yS0vQ76zV naNiRZZEbY5xhtFrAKgZ/rPWfAWzuYo3M7SKxUGzDO+UU++wzkSlDK9OOVQsyGxdqh4y p4kExMjsUuAssuv8U1oofjLt9PfQEC4o0xcPzbD2Mei1/ze82Cm+w0T0SScb8zttdsvs 6GvFWupEYJ/M+nZM3bssBDRktoOBCZEW6yzB01v0J553IjufAhD8aPFBNVSaE61IPGif DPYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=2/mQLGUGA0xLtkVNsiIpOFEvVb4XC3yo3Cor9uoHpHg=; b=I5ntNkNPzHNgWwxDjzZsq3di3Kd46YCnrAy5WuFRklGzram7prwGqkeWWIkOzUYvzO aRgCfOrBdavdOZ0tRoq7lE4kvXmdTssXwTqVs4HCnL7b/NHgiCAlMY51F0+AWit3Z1B5 KzeXAEKLIpF15KlxV+kSI5ExfID2TYrk44UEtWH/rCDu96yPjzV3K+GYUijDBXwWwQa7 LEV+1TTh0M2+BEqzAK5P+N4qQy9GgJUBCPNlQTSx30dd7wVBJaeGTxKTg4LHqv7E1UqC TPffA5ZGzZ4mNnWxneFKU6se3AL/j81dfiIoh2lmAogiHYEGZ61VW/YaLW2XiOEKcxeJ 2Jqw== X-Gm-Message-State: AEkoouubrunXDFO1SOPGMa8bgUHSWkEjilpJoqtXSM+7x6CM/bMGv+nfc5ZUiEG5Le4PfBq5ILUM1UmICEQe7Q== X-Received: by 10.194.175.106 with SMTP id bz10mr42852112wjc.42.1471425069456; Wed, 17 Aug 2016 02:11:09 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.54.202 with HTTP; Wed, 17 Aug 2016 02:11:08 -0700 (PDT) In-Reply-To: <1AE36A3B-A2BA-47D2-A872-1E7E9EFA201D@sarenet.es> References: <6035AB85-8E62-4F0A-9FA8-125B31A7A387@gmail.com> <20160703192945.GE41276@mordor.lan> <20160703214723.GF41276@mordor.lan> <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <1AE36A3B-A2BA-47D2-A872-1E7E9EFA201D@sarenet.es> From: krad Date: Wed, 17 Aug 2016 10:11:08 +0100 Message-ID: Subject: Re: HAST + ZFS + NFS + CARP To: Borja Marcos Cc: juergen.gotteswinter@internetx.com, FreeBSD FS Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 09:11:11 -0000 I totally agree here i would used some batch replication in general. Yes it doesnt provide the ha you require, but then if you need that maybe a different approach like a distributed file system is a better solution. Even then though I would still have my standard replication to a box not part of the distributed filesystem via rsync or something, just for ass covering. Admittedly this gets problematic when the datasets have large deltas and/or objects. On 17 August 2016 at 09:53, Borja Marcos wrote: > > > On 17 Aug 2016, at 09:25, InterNetX - Juergen Gotteswinter < > juergen.gotteswinter@internetx.com> wrote: > > try dual split import :D i mean, zpool -f import on 2 machines hooked u= p > > to the same disk chassis. > > > > kaboom, really ugly kaboom. thats what is very likely to happen sooner > > or later especially when it comes to homegrown automatism solutions. > > even the commercial parts where much more time/work goes into such > > solutions fail in a regular manner > > Well, don=E2=80=99t expect to father children after shooting your balls! = ;) > > I am not a big fan of such closely coupled solutions. There are quite > some failure modes that can break such a configuration, not just a > brainless > =E2=80=9Cdual split import=E2=80=9D as you say :) > > Misbehaving software (read, a ZFS bug) can render the pool unusable and, > no matter how many > redundant servers you have connected to your chassis, you are toast. Usin= g > incremental replication > over a network is much more robust, and it offers a lot of fault > isolation. Moreover, you can place the > servers in different buildings, etc. > > Networks even offer a more than reasonable protection from electrical > problems. Especially if you get > paranoid and use fiber, in which case protection is absolute. > > > > Borja. 
> > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Wed Aug 17 09:15:52 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E02C2BBC8EB for ; Wed, 17 Aug 2016 09:15:52 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from cu1176c.smtpx.saremail.com (cu1176c.smtpx.saremail.com [195.16.148.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9F6EC1A88 for ; Wed, 17 Aug 2016 09:15:52 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from [172.16.8.36] (izaro.sarenet.es [192.148.167.11]) by proxypop02.sare.net (Postfix) with ESMTPSA id 35D0B9DC642; Wed, 17 Aug 2016 11:15:49 +0200 (CEST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: HAST + ZFS + NFS + CARP From: Borja Marcos In-Reply-To: Date: Wed, 17 Aug 2016 11:15:48 +0200 Cc: juergen.gotteswinter@internetx.com, FreeBSD FS Content-Transfer-Encoding: quoted-printable Message-Id: <7EECBD48-5980-4387-8AAE-91D89F576DA1@sarenet.es> References: <6035AB85-8E62-4F0A-9FA8-125B31A7A387@gmail.com> <20160703192945.GE41276@mordor.lan> <20160703214723.GF41276@mordor.lan> <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <1AE36A3B-A2BA-47D2-A872-1E7E9EFA201D@sarenet.es> To: krad X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 09:15:53 -0000 > On 17 Aug 2016, at 11:11, krad wrote: >=20 > I totally agree here i would used some batch replication in general. = Yes it doesnt provide the ha you require, but then if you need that = maybe a different approach like a distributed file system is a better = solution. Even then though I would still have my standard replication to = a box not part of the distributed filesystem via rsync or something, = just for ass covering. Admittedly this gets problematic when the = datasets have large deltas and/or objects. If your deltas are large you need a network with enough bandwidth to = support it anyway. And rsync can be a nightmare depending on the number of files you keep and their sizes. That=E2=80=99s an = advantage of ZFS. In simple terms, an incremental send just copies a = portion of a transaction log together with its associated data blocks. The = number of files does not hurt performance so much as it does with rsync, which can be unusable. And if you have real time requirements for replication (databases) using = the built-in mechanisms in your DBMS will be generally more robust. Borja. 
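To make that concrete, the batch replication being discussed boils down to something like the sketch below: snapshot, send only the delta since the previous snapshot, and keep the replica unmounted and readonly so a receive can never clobber live data. The dataset names, the ssh target and the snapshot labels are placeholders, and the initial full send is assumed to have been done already:

    #!/bin/sh
    # Incremental replication of tank/data to backuphost (names are placeholders).
    # Assumes yesterday's @repl-YYYYMMDD snapshot already exists on both sides.
    set -e
    today=$(date +%Y%m%d)
    prev=$(date -v-1d +%Y%m%d)

    zfs snapshot -r "tank/data@repl-${today}"
    zfs send -R -i "tank/data@repl-${prev}" "tank/data@repl-${today}" | \
        ssh backuphost zfs receive -du backuppool

    # Keep the replica datasets out of harm's way, as suggested above.
    ssh backuphost zfs set readonly=on backuppool/data

Unlike rsync, the cost of the incremental send scales with the size of the delta rather than the number of files, which is exactly the point made above.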
From owner-freebsd-fs@freebsd.org Wed Aug 17 10:05:23 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 05C67BBDC85 for ; Wed, 17 Aug 2016 10:05:23 +0000 (UTC) (envelope-from julien@perdition.city) Received: from relay-b02.edpnet.be (relay-b02.edpnet.be [212.71.1.222]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "edpnet.email", Issuer "Go Daddy Secure Certificate Authority - G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A051015BA for ; Wed, 17 Aug 2016 10:05:22 +0000 (UTC) (envelope-from julien@perdition.city) X-ASG-Debug-ID: 1471427542-0a7b8d2a6d1db9eb0001-3nHGF7 Received: from mordor.lan (213.219.167.114.bro01.dyn.edpnet.net [213.219.167.114]) by relay-b02.edpnet.be with ESMTP id rmxDiDGxUMqjvQIQ (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 17 Aug 2016 11:52:24 +0200 (CEST) X-Barracuda-Envelope-From: julien@perdition.city X-Barracuda-Effective-Source-IP: 213.219.167.114.bro01.dyn.edpnet.net[213.219.167.114] X-Barracuda-Apparent-Source-IP: 213.219.167.114 Date: Wed, 17 Aug 2016 11:52:22 +0200 From: Julien Cigar To: InterNetX - Juergen Gotteswinter Cc: Borja Marcos , freebsd-fs@freebsd.org Subject: Re: HAST + ZFS + NFS + CARP Message-ID: <20160817095222.GG22506@mordor.lan> X-ASG-Orig-Subj: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="I3tAPq1Rm2pUxvsp" Content-Disposition: inline In-Reply-To: <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> User-Agent: Mutt/1.6.1 (2016-04-27) X-Barracuda-Connect: 213.219.167.114.bro01.dyn.edpnet.net[213.219.167.114] X-Barracuda-Start-Time: 1471427543 X-Barracuda-Encrypted: ECDHE-RSA-AES256-GCM-SHA384 X-Barracuda-URL: https://212.71.1.222:443/cgi-mod/mark.cgi X-Barracuda-Scan-Msg-Size: 5396 X-Virus-Scanned: by bsmtpd at edpnet.be X-Barracuda-BRTS-Status: 1 X-Barracuda-Bayes: INNOCENT GLOBAL 0.5000 1.0000 0.0100 X-Barracuda-Spam-Score: 0.01 X-Barracuda-Spam-Status: No, SCORE=0.01 using global scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=6.0 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.32085 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 10:05:23 -0000 --I3tAPq1Rm2pUxvsp Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen Gotteswinter = wrote: >=20 >=20 > Am 17.08.2016 um 10:54 schrieb Julien Cigar: > > On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gotteswin= ter wrote: > >> > >> > >> Am 11.08.2016 um 11:24 schrieb Borja Marcos: > >>> > >>>> On 11 Aug 2016, at 11:10, Julien Cigar wrote: > >>>> > >>>> As 
I said in a previous post I tested the zfs send/receive approach = (with > >>>> zrep) and it works (more or less) perfectly.. so I concur in all wha= t you > >>>> said, especially about off-site replicate and synchronous replicatio= n. > >>>> > >>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment= ,=20 > >>>> I'm in the early tests, haven't done any heavy writes yet, but ATM i= t=20 > >>>> works as expected, I havent' managed to corrupt the zpool. > >>> > >>> I must be too old school, but I don=E2=80=99t quite like the idea of = using an essentially unreliable transport > >>> (Ethernet) for low-level filesystem operations. > >>> > >>> In case something went wrong, that approach could risk corrupting a p= ool. Although, frankly, > >>> ZFS is extremely resilient. One of mine even survived a SAS HBA probl= em that caused some > >>> silent corruption. > >> > >> try dual split import :D i mean, zpool -f import on 2 machines hooked = up > >> to the same disk chassis. > >=20 > > Yes this is the first thing on the list to avoid .. :) > >=20 > > I'm still busy to test the whole setup here, including the=20 > > MASTER -> BACKUP failover script (CARP), but I think you can prevent > > that thanks to: > >=20 > > - As long as ctld is running on the BACKUP the disks are locked=20 > > and you can't import the pool (even with -f) for ex (filer2 is the > > BACKUP): > > https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > >=20 > > - The shared pool should not be mounted at boot, and you should ensure > > that the failover script is not executed during boot time too: this is > > to handle the case wherein both machines turn off and/or re-ignite at > > the same time. Indeed, the CARP interface can "flip" it's status if both > > machines are powered on at the same time, for ex: > > https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and > > you will have a split-brain scenario > >=20 > > - Sometimes you'll need to reboot the MASTER for some $reasons > > (freebsd-update, etc) and the MASTER -> BACKUP switch should not > > happen, this can be handled with a trigger file or something like that > >=20 > > - I've still have to check if the order is OK, but I think that as long > > as you shutdown the replication interface and that you adapt the > > advskew (including the config file) of the CARP interface before the=20 > > zpool import -f in the failover script you can be relatively confident= =20 > > that nothing will be written on the iSCSI targets > >=20 > > - A zpool scrub should be run at regular intervals > >=20 > > This is my MASTER -> BACKUP CARP script ATM > > https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > >=20 > > Julien > >=20 >=20 > 100=E2=82=AC question without detailed looking at that script. yes from a= first > view its super simple, but: why are solutions like rsf-1 such more > powerful / featurerich. Theres a reason for, which is that they try to > cover every possible situation (which makes more than sense for this). I've never used "rsf-1" so I can't say much more about it, but I have=20 no doubts about it's ability to handle "complex situations", where=20 multiple nodes / networks are involved. >=20 > That script works for sure, within very limited cases imho >=20 > >> > >> kaboom, really ugly kaboom. thats what is very likely to happen sooner > >> or later especially when it comes to homegrown automatism solutions. 
> >> even the commercial parts where much more time/work goes into such > >> solutions fail in a regular manner > >> > >>> > >>> The advantage of ZFS send/receive of datasets is, however, that you c= an consider it > >>> essentially atomic. A transport corruption should not cause trouble (= apart from a failed > >>> "zfs receive") and with snapshot retention you can even roll back. Yo= u can=E2=80=99t roll back > >>> zpool replications :) > >>> > >>> ZFS receive does a lot of sanity checks as well. As long as your zfs = receive doesn=E2=80=99t involve a rollback > >>> to the latest snapshot, it won=E2=80=99t destroy anything by mistake.= Just make sure that your replica datasets > >>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain. > >>> > >>> > >>> Cheers, > >>> > >>> > >>> > >>> > >>> Borja. > >>> > >>> > >>> > >>> _______________________________________________ > >>> freebsd-fs@freebsd.org mailing list > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > >>> > >> _______________________________________________ > >> freebsd-fs@freebsd.org mailing list > >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > >=20 --=20 Julien Cigar Belgian Biodiversity Platform (http://www.biodiversity.be) PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0 No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced. --I3tAPq1Rm2pUxvsp Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCgAGBQJXtDPSAAoJELK7NxCiBCPAQWQP/RBRHxh6kwjEjfVRPQd3y9ky omHqCV+ej068aB0J0D44wXdFKYWrIPNX28Mfg5muaIWZvRmwUH2zLKNgxLFKpzNS y8XyY0SktMzsBYZVHicu6US/l+5+BTfNes2HTdB0592etvtPuSW/E6xZCwwe4mga XZmc4vNByAViWqnH6+B7cQTviLx3K8ZQU2JRZMrrkLKOqjoOH5K6xrc4rq67jU0z j9t2kQ90X8cdMEMdWuz8o4NCZtM3T70sjswHPvd/8GwBKdsVlJlQuhQNECIPYsGz bvh4t37HK3SkL2k91JgPysWdqNxoUuF8Q4wg91Vn+0riWvdVxyJpWODu+y1qLXk9 eUNYU/bWAXz2iPuKw41JwclvQfFhG5+ND1Q9WyqR3I5QMxZub5T/64mgRNu2wTZ+ bXeKgjq6bhM55L2GzHyl5LGZOkxWK+HTpgBuPATE27Ya0Ass3EEB86aXBsylkMqD dnNfht3QAv1xKsXzteoaiJ2t0Hcyzu2vqdScE9oJY8/k8aiHl9JXMoCo932MogYU mZGkydJrT2BxqvAbSo83e+fg+IwVLsiKU1zFATTztT9fIXlYmlAjMaoC4h9yrYBb pMo5X8ThyY8wduglq7V+zikRWWBohRn/jInDMKRWsExzAQvFAFyWyHafxOxMk8E2 bwPvxdjqwagH4b1S7a5D =bUas -----END PGP SIGNATURE----- --I3tAPq1Rm2pUxvsp-- From owner-freebsd-fs@freebsd.org Wed Aug 17 10:55:36 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5CCEEBBCEB0 for ; Wed, 17 Aug 2016 10:55:36 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4C6AD1959 for ; Wed, 17 Aug 2016 10:55:36 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7HAtZNv002911 for ; Wed, 17 Aug 2016 10:55:36 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211013] Write error to UFS filesystem with softupdates panics machine Date: Wed, 17 Aug 2016 10:55:35 
+0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-BETA1 X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: bdrewery@FreeBSD.org X-Bugzilla-Status: In Progress X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 10:55:36 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211013 Bryan Drewery changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bdrewery@FreeBSD.org --- Comment #6 from Bryan Drewery --- Depending on how long you wanted to let this be tested in head, it may very well make the timeline to MFC into releng/11.0. Even if an unrare case, it seems worth it to me to merge this if it fits into the timeline. How long were you wanting to let this bake in head? --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 11:33:45 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 06FFEBBC125 for ; Wed, 17 Aug 2016 11:33:45 +0000 (UTC) (envelope-from julien@perdition.city) Received: from relay-b01.edpnet.be (relay-b01.edpnet.be [212.71.1.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "edpnet.email", Issuer "Go Daddy Secure Certificate Authority - G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id ADFD314AE for ; Wed, 17 Aug 2016 11:33:44 +0000 (UTC) (envelope-from julien@perdition.city) X-ASG-Debug-ID: 1471433619-0a7ff52c9e23b260001-3nHGF7 Received: from mordor.lan (213.219.167.114.bro01.dyn.edpnet.net [213.219.167.114]) by relay-b01.edpnet.be with ESMTP id Ml8DFxGODwBFIhWx (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 17 Aug 2016 13:33:41 +0200 (CEST) X-Barracuda-Envelope-From: julien@perdition.city X-Barracuda-Effective-Source-IP: 213.219.167.114.bro01.dyn.edpnet.net[213.219.167.114] X-Barracuda-Apparent-Source-IP: 213.219.167.114 Date: Wed, 17 Aug 2016 13:33:39 +0200 From: Julien Cigar To: InterNetX - Juergen Gotteswinter Cc: freebsd-fs@freebsd.org Subject: Re: HAST + ZFS + NFS + CARP Message-ID: <20160817113339.GH22506@mordor.lan> X-ASG-Orig-Subj: Re: HAST + ZFS + NFS + CARP References: <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="qVHblb/y9DPlgkHs" Content-Disposition: inline In-Reply-To: <20160817095222.GG22506@mordor.lan> User-Agent: Mutt/1.6.1 (2016-04-27) 
X-Barracuda-Connect: 213.219.167.114.bro01.dyn.edpnet.net[213.219.167.114] X-Barracuda-Start-Time: 1471433620 X-Barracuda-Encrypted: ECDHE-RSA-AES256-GCM-SHA384 X-Barracuda-URL: https://212.71.1.221:443/cgi-mod/mark.cgi X-Barracuda-Scan-Msg-Size: 6138 X-Virus-Scanned: by bsmtpd at edpnet.be X-Barracuda-BRTS-Status: 1 X-Barracuda-Bayes: INNOCENT GLOBAL 0.5000 1.0000 0.0100 X-Barracuda-Spam-Score: 0.01 X-Barracuda-Spam-Status: No, SCORE=0.01 using global scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=6.0 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.32086 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 11:33:45 -0000 --qVHblb/y9DPlgkHs Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Aug 17, 2016 at 11:52:22AM +0200, Julien Cigar wrote: > On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen Gotteswinte= r wrote: > >=20 > >=20 > > Am 17.08.2016 um 10:54 schrieb Julien Cigar: > > > On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gottesw= inter wrote: > > >> > > >> > > >> Am 11.08.2016 um 11:24 schrieb Borja Marcos: > > >>> > > >>>> On 11 Aug 2016, at 11:10, Julien Cigar wro= te: > > >>>> > > >>>> As I said in a previous post I tested the zfs send/receive approac= h (with > > >>>> zrep) and it works (more or less) perfectly.. so I concur in all w= hat you > > >>>> said, especially about off-site replicate and synchronous replicat= ion. > > >>>> > > >>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the mome= nt,=20 > > >>>> I'm in the early tests, haven't done any heavy writes yet, but ATM= it=20 > > >>>> works as expected, I havent' managed to corrupt the zpool. > > >>> > > >>> I must be too old school, but I don=E2=80=99t quite like the idea o= f using an essentially unreliable transport > > >>> (Ethernet) for low-level filesystem operations. > > >>> > > >>> In case something went wrong, that approach could risk corrupting a= pool. Although, frankly, > > >>> ZFS is extremely resilient. One of mine even survived a SAS HBA pro= blem that caused some > > >>> silent corruption. > > >> > > >> try dual split import :D i mean, zpool -f import on 2 machines hooke= d up > > >> to the same disk chassis. > > >=20 > > > Yes this is the first thing on the list to avoid .. :) > > >=20 > > > I'm still busy to test the whole setup here, including the=20 > > > MASTER -> BACKUP failover script (CARP), but I think you can prevent > > > that thanks to: > > >=20 > > > - As long as ctld is running on the BACKUP the disks are locked=20 > > > and you can't import the pool (even with -f) for ex (filer2 is the > > > BACKUP): > > > https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > > >=20 > > > - The shared pool should not be mounted at boot, and you should ensure > > > that the failover script is not executed during boot time too: this is > > > to handle the case wherein both machines turn off and/or re-ignite at > > > the same time. 
Indeed, the CARP interface can "flip" it's status if b= oth > > > machines are powered on at the same time, for ex: > > > https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and > > > you will have a split-brain scenario > > >=20 > > > - Sometimes you'll need to reboot the MASTER for some $reasons > > > (freebsd-update, etc) and the MASTER -> BACKUP switch should not > > > happen, this can be handled with a trigger file or something like that > > >=20 > > > - I've still have to check if the order is OK, but I think that as lo= ng > > > as you shutdown the replication interface and that you adapt the > > > advskew (including the config file) of the CARP interface before the= =20 > > > zpool import -f in the failover script you can be relatively confiden= t=20 > > > that nothing will be written on the iSCSI targets > > >=20 > > > - A zpool scrub should be run at regular intervals > > >=20 > > > This is my MASTER -> BACKUP CARP script ATM > > > https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > > >=20 > > > Julien > > >=20 > >=20 > > 100=E2=82=AC question without detailed looking at that script. yes from= a first > > view its super simple, but: why are solutions like rsf-1 such more > > powerful / featurerich. Theres a reason for, which is that they try to > > cover every possible situation (which makes more than sense for this). >=20 > I've never used "rsf-1" so I can't say much more about it, but I have=20 > no doubts about it's ability to handle "complex situations", where=20 > multiple nodes / networks are involved. BTW for simple cases (two nodes, same network, one active node, ...) we could use both: ZFS + iSCSI + CARP on the two nodes, and=20 zfs send|zfs receive on a third one >=20 > >=20 > > That script works for sure, within very limited cases imho > >=20 > > >> > > >> kaboom, really ugly kaboom. thats what is very likely to happen soon= er > > >> or later especially when it comes to homegrown automatism solutions. > > >> even the commercial parts where much more time/work goes into such > > >> solutions fail in a regular manner > > >> > > >>> > > >>> The advantage of ZFS send/receive of datasets is, however, that you= can consider it > > >>> essentially atomic. A transport corruption should not cause trouble= (apart from a failed > > >>> "zfs receive") and with snapshot retention you can even roll back. = You can=E2=80=99t roll back > > >>> zpool replications :) > > >>> > > >>> ZFS receive does a lot of sanity checks as well. As long as your zf= s receive doesn=E2=80=99t involve a rollback > > >>> to the latest snapshot, it won=E2=80=99t destroy anything by mistak= e. Just make sure that your replica datasets > > >>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain. > > >>> > > >>> > > >>> Cheers, > > >>> > > >>> > > >>> > > >>> > > >>> Borja. 
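A minimal sketch of the failover step described in the checklist above (the real scripts are the linked gists; the pool name "tank", vhid 1, the interface names and the trigger-file path below are assumed placeholders, not values taken from this thread):

    #!/bin/sh
    # failover sketch: run on the BACKUP when it is promoted to MASTER
    POOL=tank                       # assumed pool name
    CARP_IF=em0                     # interface carrying the CARP vhid (assumed)
    VHID=1                          # assumed vhid
    REPL_IF=em1                     # dedicated iSCSI/replication interface (assumed)
    TRIGGER=/var/run/no_failover    # touch this before a planned MASTER reboot

    # planned maintenance on the MASTER: do not fail over
    [ -e "${TRIGGER}" ] && exit 0

    # stop talking to the old MASTER before touching the pool
    ifconfig ${REPL_IF} down

    # make the promotion sticky: lower advskew and force MASTER state
    ifconfig ${CARP_IF} vhid ${VHID} advskew 0 state master

    # the shared pool is kept out of zpool.cache (zpool set cachefile=none)
    # so it is never imported at boot; import it explicitly now
    zpool import -f -o cachefile=none ${POOL}

The config-file advskew would still have to be updated separately, as noted above, so that the new role survives a reboot.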
> > >>> > > >>> > > >>> > > >>> _______________________________________________ > > >>> freebsd-fs@freebsd.org mailing list > > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.or= g" > > >>> > > >> _______________________________________________ > > >> freebsd-fs@freebsd.org mailing list > > >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > >=20 >=20 > --=20 > Julien Cigar > Belgian Biodiversity Platform (http://www.biodiversity.be) > PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0 > No trees were killed in the creation of this message. > However, many electrons were terribly inconvenienced. --=20 Julien Cigar Belgian Biodiversity Platform (http://www.biodiversity.be) PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0 No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced. --qVHblb/y9DPlgkHs Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCgAGBQJXtEuQAAoJELK7NxCiBCPA09UP/0Z7hUd/IzKJJRZ038i0Js1c 4tWlU8vuN3wP3ASg1hX4+UZzGnT8U5IvW0jIWKj5BN4e4fS5MnMnz1CoM31eUJdM /CZUOw6RMX/uRnKCQwgkGyvtOr3kSXmbJ6lFu9dBNbtj4YQosqR9GsaJOl4zJoXT XQN7gbzVefFlO0FXeu9OtJwv1GYb0oFNcvOqqujM+nXrNfW0Y9jQF85QSZZmDnz9 LJrjy2JhZPQmqiM8QGSytl/XMKNbdlKijm5dLmBSUNMoSnPFW24zf7ORMzBgk25v M/h2tnsg/pKY1iNDJAlbQ/Qa+4VSWw4sjdIiVyLjUUD6x9GbEJ48m0Bx9tIZJzH6 LzX0Q6cNtmluvPSQt2UGEqVGgdogSCkP8HNbaeYeRw38P172Muc5yZ535ej0Z8CJ /pPxruN/yIZPCS0FLIFJyt8O7J/lNnKOzt7K5YDXPadLfXe23EatKAI3EerjY2vc JdtTah2GKPp16Qag1sSK2wpdRIxXJUbuz5kRk6ZdgC/RmdsT63Q8h9X6RM7lx7W1 hW6Wlk/ApEpx2BRlUNWWKhfRdKvyDqQ0DW6tRQCDsXPk8usaaerUByUPjIEsAPD8 s8gh4CyJp5hbK47uGMthURRCmE6xzAbyefGy7TLVmDohgW6JAfAnB1JvcWbHjpMz 39snWvMT/9HH0UojgpqX =Urre -----END PGP SIGNATURE----- --qVHblb/y9DPlgkHs-- From owner-freebsd-fs@freebsd.org Wed Aug 17 13:33:59 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3D3B7BBD1FF for ; Wed, 17 Aug 2016 13:33:59 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-wm0-x22d.google.com (mail-wm0-x22d.google.com [IPv6:2a00:1450:400c:c09::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B36A2168B for ; Wed, 17 Aug 2016 13:33:58 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: by mail-wm0-x22d.google.com with SMTP id q128so199330985wma.1 for ; Wed, 17 Aug 2016 06:33:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=1pQ9J0u2kWdyDKgpDBmQZlA7DwF9tH9t6h79mTx4eb0=; b=jFh9FT/lawwdWOLWFE7m2Vqv4oeAgj2gKYn3IIrTn+v3XnP1fFkA8WnbdNfbzGUAS2 fbrrrEUmshl+GZQ8KtFqBWOx8jP17+n/T2v9t0vD4fJTRBVcEbSSZbU/uz3XANKN2znL CFyjBXKoKi2lWmVOCV+BhTb5IlyC2X2eh/9UQgv1LbOwHp2FEDCnHc6SGiXw0/kEOvRa JEX5f+wuP1t9dwuY67Qpgtsn/qq6pHdsCIUn95TnxmLRi6By9N/rfWvY+UqR64ofNaV6 GZ3CV3OG619PsIEKPYj5OeeqaPTIe0C/+1FDDSEghiUXuqZj+8ITnwboi8QeoLJRSCEC Xkkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; 
bh=1pQ9J0u2kWdyDKgpDBmQZlA7DwF9tH9t6h79mTx4eb0=; b=A9EQzV/S0XUZq+fqV5LIEHe3iRfX5YASinHbpX59Bp/jhz8ljNDCSKXeMx1/7ahpNl SZeDWOwfLkdeeZpalkO1b/jL0JPw+FsBfTayfIthIFle+j6OYYBdFymsNVMZlnedh7ri budfha7DWbXO/igSuNlti5/EzV0bMdCRUK10jZuPe/PNvG7VaTgYsRDd/MHZKzYJXtcr pLabTbDiagb/PCDEuvOSU23iWmJuwwppVThoyKQTEpNkZOmfebHQW7WCp6ejszKEniIq 6Yz8WaPQrywi6trnvvqMWYIp+7inQbIVcM5wcf8lOkV+/3BzZnFGipxiNEc1Ki+dMtsx gNXA== X-Gm-Message-State: AEkooutVFoX6aS9XVRwkrAdVdIRXg007jldAUEsYh4sLqaXDoQq2lhOhsnnr0HrgDAc7tEayDUJajKB3fAKm4g== X-Received: by 10.194.175.106 with SMTP id bz10mr44195283wjc.42.1471440837173; Wed, 17 Aug 2016 06:33:57 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.54.202 with HTTP; Wed, 17 Aug 2016 06:33:56 -0700 (PDT) In-Reply-To: <20160817113339.GH22506@mordor.lan> References: <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <20160817113339.GH22506@mordor.lan> From: krad Date: Wed, 17 Aug 2016 14:33:56 +0100 Message-ID: Subject: Re: HAST + ZFS + NFS + CARP To: Julien Cigar Cc: InterNetX - Juergen Gotteswinter , FreeBSD FS Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 13:33:59 -0000 What are peoples experiences on running something like moosfs on top of zfs? It looks really compelling on certain levels, but i'm not sure about the reality in a production network yet. On 17 August 2016 at 12:33, Julien Cigar wrote: > On Wed, Aug 17, 2016 at 11:52:22AM +0200, Julien Cigar wrote: > > On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen > Gotteswinter wrote: > > > > > > > > > Am 17.08.2016 um 10:54 schrieb Julien Cigar: > > > > On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen > Gotteswinter wrote: > > > >> > > > >> > > > >> Am 11.08.2016 um 11:24 schrieb Borja Marcos: > > > >>> > > > >>>> On 11 Aug 2016, at 11:10, Julien Cigar > wrote: > > > >>>> > > > >>>> As I said in a previous post I tested the zfs send/receive > approach (with > > > >>>> zrep) and it works (more or less) perfectly.. so I concur in all > what you > > > >>>> said, especially about off-site replicate and synchronous > replication. > > > >>>> > > > >>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the > moment, > > > >>>> I'm in the early tests, haven't done any heavy writes yet, but > ATM it > > > >>>> works as expected, I havent' managed to corrupt the zpool. > > > >>> > > > >>> I must be too old school, but I don=E2=80=99t quite like the idea= of using > an essentially unreliable transport > > > >>> (Ethernet) for low-level filesystem operations. > > > >>> > > > >>> In case something went wrong, that approach could risk corrupting > a pool. Although, frankly, > > > >>> ZFS is extremely resilient. One of mine even survived a SAS HBA > problem that caused some > > > >>> silent corruption. > > > >> > > > >> try dual split import :D i mean, zpool -f import on 2 machines > hooked up > > > >> to the same disk chassis. > > > > > > > > Yes this is the first thing on the list to avoid .. 
:) > > > > > > > > I'm still busy to test the whole setup here, including the > > > > MASTER -> BACKUP failover script (CARP), but I think you can preven= t > > > > that thanks to: > > > > > > > > - As long as ctld is running on the BACKUP the disks are locked > > > > and you can't import the pool (even with -f) for ex (filer2 is the > > > > BACKUP): > > > > https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > > > > > > > > - The shared pool should not be mounted at boot, and you should > ensure > > > > that the failover script is not executed during boot time too: this > is > > > > to handle the case wherein both machines turn off and/or re-ignite = at > > > > the same time. Indeed, the CARP interface can "flip" it's status if > both > > > > machines are powered on at the same time, for ex: > > > > https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf > and > > > > you will have a split-brain scenario > > > > > > > > - Sometimes you'll need to reboot the MASTER for some $reasons > > > > (freebsd-update, etc) and the MASTER -> BACKUP switch should not > > > > happen, this can be handled with a trigger file or something like > that > > > > > > > > - I've still have to check if the order is OK, but I think that as > long > > > > as you shutdown the replication interface and that you adapt the > > > > advskew (including the config file) of the CARP interface before th= e > > > > zpool import -f in the failover script you can be relatively > confident > > > > that nothing will be written on the iSCSI targets > > > > > > > > - A zpool scrub should be run at regular intervals > > > > > > > > This is my MASTER -> BACKUP CARP script ATM > > > > https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > > > > > > > > Julien > > > > > > > > > > 100=E2=82=AC question without detailed looking at that script. yes fr= om a first > > > view its super simple, but: why are solutions like rsf-1 such more > > > powerful / featurerich. Theres a reason for, which is that they try t= o > > > cover every possible situation (which makes more than sense for this)= . > > > > I've never used "rsf-1" so I can't say much more about it, but I have > > no doubts about it's ability to handle "complex situations", where > > multiple nodes / networks are involved. > > BTW for simple cases (two nodes, same network, one active node, ...) we > could use both: ZFS + iSCSI + CARP on the two nodes, and > zfs send|zfs receive on a third one > > > > > > > > > That script works for sure, within very limited cases imho > > > > > > >> > > > >> kaboom, really ugly kaboom. thats what is very likely to happen > sooner > > > >> or later especially when it comes to homegrown automatism solution= s. > > > >> even the commercial parts where much more time/work goes into such > > > >> solutions fail in a regular manner > > > >> > > > >>> > > > >>> The advantage of ZFS send/receive of datasets is, however, that > you can consider it > > > >>> essentially atomic. A transport corruption should not cause > trouble (apart from a failed > > > >>> "zfs receive") and with snapshot retention you can even roll back= . > You can=E2=80=99t roll back > > > >>> zpool replications :) > > > >>> > > > >>> ZFS receive does a lot of sanity checks as well. As long as your > zfs receive doesn=E2=80=99t involve a rollback > > > >>> to the latest snapshot, it won=E2=80=99t destroy anything by mist= ake. Just > make sure that your replica datasets > > > >>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain. 
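On the receive side, the "replica datasets aren't mounted" advice quoted just above can be enforced up front rather than remembered; a sketch, with dataset, host and snapshot names as placeholders:

    # create the replica container unmounted and read-only
    zfs create -o canmount=noauto -o readonly=on backup/replica

    # apply streams with -u so nothing is mounted after the receive
    ssh filer1 zfs send -i @prev tank/data@now | zfs receive -u -d backup/replica

With readonly=on the replica can never diverge locally, so the receive never needs the -F rollback that the quoted warning is about.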
> > > >>> > > > >>> > > > >>> Cheers, > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> Borja. > > > >>> > > > >>> > > > >>> > > > >>> _______________________________________________ > > > >>> freebsd-fs@freebsd.org mailing list > > > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@ > freebsd.org" > > > >>> > > > >> _______________________________________________ > > > >> freebsd-fs@freebsd.org mailing list > > > >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@ > freebsd.org" > > > > > > > > -- > > Julien Cigar > > Belgian Biodiversity Platform (http://www.biodiversity.be) > > PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0 > > No trees were killed in the creation of this message. > > However, many electrons were terribly inconvenienced. > > > > -- > Julien Cigar > Belgian Biodiversity Platform (http://www.biodiversity.be) > PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0 > No trees were killed in the creation of this message. > However, many electrons were terribly inconvenienced. > From owner-freebsd-fs@freebsd.org Wed Aug 17 15:38:06 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B8C69BBC661 for ; Wed, 17 Aug 2016 15:38:06 +0000 (UTC) (envelope-from joe@getsomewhere.net) Received: from prak.gameowls.com (prak.gameowls.com [IPv6:2001:19f0:5c00:950b:5400:ff:fe14:46b7]) by mx1.freebsd.org (Postfix) with ESMTP id 84A3D15EE for ; Wed, 17 Aug 2016 15:38:06 +0000 (UTC) (envelope-from joe@getsomewhere.net) Received: from [IPv6:2001:470:c412:beef:135:c8df:2d0e:4ea6] (unknown [IPv6:2001:470:c412:beef:135:c8df:2d0e:4ea6]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by prak.gameowls.com (Postfix) with ESMTPSA id B6C001863E; Wed, 17 Aug 2016 10:37:58 -0500 (CDT) From: Joe Love Message-Id: Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: MooseFS on FreeBSD (was: HAST + ZFS + NFS + CARP) Date: Wed, 17 Aug 2016 10:37:57 -0500 References: <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <20160817113339.GH22506@mordor.lan> To: krad , FreeBSD FS In-Reply-To: X-Mailer: Apple Mail (2.3124) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 15:38:06 -0000 On Aug 17, 2016, at 8:33 AM, krad wrote: >=20 > What are peoples experiences on running something like moosfs on top = of > zfs? It looks really compelling on certain levels, but i'm not sure = about > the reality in a production network yet. >=20 I did some experimenting with MooseFS on a test cluster (using ZFS as = the local storage on the nodes). 
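A chunkserver node backed by ZFS along those lines might be set up roughly as below; every dataset name, path and rc knob here is an assumption for illustration, not something taken from the test cluster being described:

    # a dedicated dataset for the chunkserver (names are placeholders)
    zfs create -o mountpoint=/var/mfs/chunk -o compression=lz4 tank/mfschunk
    chown mfs:mfs /var/mfs/chunk        # assumes the port creates an "mfs" user

    # mfshdd.cfg lists the directories the chunkserver stores chunks in;
    # the path assumes the port installs its configs under /usr/local/etc/mfs
    echo /var/mfs/chunk >> /usr/local/etc/mfs/mfshdd.cfg

    # rc script name assumed from the MooseFS chunkserver port
    sysrc mfschunkserver_enable=YES
    service mfschunkserver start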
That was nearly a year ago, when I = decided it wasn=E2=80=99t a good fit as a storage backend to vmware = (primarily because of the overhead involved with traversing from vmware = over nfs to moosefs). They did a bunch of tweaking back then as I kept prodding a bit for = better operations on FreeBSD, but they ultimately ran into a stumbling = block with I/O caching in FreeBSD. Here is their analysis of what they ran into, and the solution they came = up with for higher throughput: https://sourceforge.net/p/moosefs/mailman/message/34483159/ = I=E2=80=99d love to get around to testing MooseFS as a backing store for = bhyve, but I currently lack the power & network ports to connect up my = test nodes again. -Joe From owner-freebsd-fs@freebsd.org Wed Aug 17 15:50:48 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 085AEBBCD64 for ; Wed, 17 Aug 2016 15:50:48 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id EC144143F for ; Wed, 17 Aug 2016 15:50:47 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7HFolt7078797 for ; Wed, 17 Aug 2016 15:50:47 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211491] System hangs after "Uptime" on reboot with ZFS Date: Wed, 17 Aug 2016 15:50:47 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-BETA3 X-Bugzilla-Keywords: needs-qa X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: vangyzen@FreeBSD.org X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: mfc-stable10? mfc-stable11? X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 15:50:48 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211491 --- Comment #18 from Eric van Gyzen --- I can still reproduce this on head at r304162. 
--=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 16:13:29 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C667ABBD6D8 for ; Wed, 17 Aug 2016 16:13:29 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B62D117E5 for ; Wed, 17 Aug 2016 16:13:29 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7HGDT5b067181 for ; Wed, 17 Aug 2016 16:13:29 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211013] Write error to UFS filesystem with softupdates panics machine Date: Wed, 17 Aug 2016 16:13:29 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-BETA1 X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: mckusick@FreeBSD.org X-Bugzilla-Status: In Progress X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 16:13:29 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211013 --- Comment #7 from Kirk McKusick --- I would like a week or two in head just to be sure that it does not break a= ny existing code usage. Note that 304230 has to be merged at the same time as this one (304239) since this one uses the new LIST_CONCAT added to queue.h = in 304230. 
--=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 16:18:44 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 852BABBD79E for ; Wed, 17 Aug 2016 16:18:44 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: from mail-it0-x230.google.com (mail-it0-x230.google.com [IPv6:2607:f8b0:4001:c0b::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5157619A0 for ; Wed, 17 Aug 2016 16:18:44 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: by mail-it0-x230.google.com with SMTP id e63so3988181ith.1 for ; Wed, 17 Aug 2016 09:18:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kateley-com.20150623.gappssmtp.com; s=20150623; h=reply-to:subject:references:to:from:organization:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=+AZDDlwiY4qKm3640B1HdOHHyaLkjajh2cy6kanlXJ0=; b=LnlukCi7ufpHeIBYKvfIpo0LfADSfRMid0gkDoyRQk5cvrvpXvpFX1EtZ7ZM8UAupH +LBUOceNnSZZ9cXD+PgHYJ74VNBqAvCxR4eAdnMZ9k0MwZ8BQ0yXgyLx2JKahLnTWGUs dQYmIMUWgd3C4lKtALB5Cmrp0+KR5CvfKlH4rrMXuLIPcmm6LEmB+4PtGzJEVpIKxeIU 9RuS3HaGkZa5koskaj6JiIzr1yRNvkFqKf1qUBpPgtbINYl5atwk+7JjCK/OGV35Xfef CBnY1uEvqkKzDFu4Z485aDWYWA8Uho1VsrX0XvBuCY2eqsFKe5mA1doWEdD4RUdl9GY8 X+JA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:reply-to:subject:references:to:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-transfer-encoding; bh=+AZDDlwiY4qKm3640B1HdOHHyaLkjajh2cy6kanlXJ0=; b=EgFlbtuhg6Rttxose3EGZUM1wiu6r4t+N0woTxnp2irauuFgnIFWKz29XNnX8gZd0X KBySFV8YzziBXkVYRYj3n2IV46vnJqvWpRMxz4J85bXXpd3UYE31nbP1Nybv4zidK+S0 aqvOrfVtepH/Wp+eVg9p30ul8H4zZa56q9RuhL3yHJPBAQDkvVsR3BKccl7kC0wDcTrk +Zu+NnnhbPeUkk6bDNACXebBRpXYYfi415PFjwL/bxv1KcBepeOwmStals/FjxC+uHIC rggSFgR0y3LDbk0UipNGd9AoJr2BVJGe5SzfkgWFR9hQ4Gqlsdv59Yh1c4Su677+8ewB 8ivg== X-Gm-Message-State: AEkoouvY+XjHGHLT0pIJEJfFVVUE+AOHTx1DXEdhJXk4fe8sA3dO5bHa9z949aN3A+Y88w== X-Received: by 10.36.53.83 with SMTP id k80mr28919210ita.59.1471450723555; Wed, 17 Aug 2016 09:18:43 -0700 (PDT) Received: from Kateleyco-iMac.local (c-50-188-36-30.hsd1.mn.comcast.net. 
[50.188.36.30]) by smtp.googlemail.com with ESMTPSA id o16sm265032itg.15.2016.08.17.09.18.42 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Aug 2016 09:18:42 -0700 (PDT) Reply-To: linda@kateley.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> To: freebsd-fs@freebsd.org From: Linda Kateley Organization: Kateley Company Message-ID: <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> Date: Wed, 17 Aug 2016 11:18:42 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160817095222.GG22506@mordor.lan> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 16:18:44 -0000 The question I always ask, as an architect, is "can you lose 1 minute worth of data?" If you can, then batched replication is perfect. If you can't.. then HA. Every place I have positioned it, rsf-1 has worked extremely well. If i remember right, it works at the dmu. I would suggest try it. They have been trying to have a full freebsd solution, I have several customers running it well. linda On 8/17/16 4:52 AM, Julien Cigar wrote: > On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen Gotteswinter wrote: >> >> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gotteswinter wrote: >>>> >>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>> On 11 Aug 2016, at 11:10, Julien Cigar wrote: >>>>>> >>>>>> As I said in a previous post I tested the zfs send/receive approach (with >>>>>> zrep) and it works (more or less) perfectly.. so I concur in all what you >>>>>> said, especially about off-site replicate and synchronous replication. >>>>>> >>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment, >>>>>> I'm in the early tests, haven't done any heavy writes yet, but ATM it >>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>> I must be too old school, but I don’t quite like the idea of using an essentially unreliable transport >>>>> (Ethernet) for low-level filesystem operations. >>>>> >>>>> In case something went wrong, that approach could risk corrupting a pool. Although, frankly, >>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA problem that caused some >>>>> silent corruption. >>>> try dual split import :D i mean, zpool -f import on 2 machines hooked up >>>> to the same disk chassis. >>> Yes this is the first thing on the list to avoid .. 
:) >>> >>> I'm still busy to test the whole setup here, including the >>> MASTER -> BACKUP failover script (CARP), but I think you can prevent >>> that thanks to: >>> >>> - As long as ctld is running on the BACKUP the disks are locked >>> and you can't import the pool (even with -f) for ex (filer2 is the >>> BACKUP): >>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>> >>> - The shared pool should not be mounted at boot, and you should ensure >>> that the failover script is not executed during boot time too: this is >>> to handle the case wherein both machines turn off and/or re-ignite at >>> the same time. Indeed, the CARP interface can "flip" it's status if both >>> machines are powered on at the same time, for ex: >>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and >>> you will have a split-brain scenario >>> >>> - Sometimes you'll need to reboot the MASTER for some $reasons >>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>> happen, this can be handled with a trigger file or something like that >>> >>> - I've still have to check if the order is OK, but I think that as long >>> as you shutdown the replication interface and that you adapt the >>> advskew (including the config file) of the CARP interface before the >>> zpool import -f in the failover script you can be relatively confident >>> that nothing will be written on the iSCSI targets >>> >>> - A zpool scrub should be run at regular intervals >>> >>> This is my MASTER -> BACKUP CARP script ATM >>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>> >>> Julien >>> >> 100€ question without detailed looking at that script. yes from a first >> view its super simple, but: why are solutions like rsf-1 such more >> powerful / featurerich. Theres a reason for, which is that they try to >> cover every possible situation (which makes more than sense for this). > I've never used "rsf-1" so I can't say much more about it, but I have > no doubts about it's ability to handle "complex situations", where > multiple nodes / networks are involved. > >> That script works for sure, within very limited cases imho >> >>>> kaboom, really ugly kaboom. thats what is very likely to happen sooner >>>> or later especially when it comes to homegrown automatism solutions. >>>> even the commercial parts where much more time/work goes into such >>>> solutions fail in a regular manner >>>> >>>>> The advantage of ZFS send/receive of datasets is, however, that you can consider it >>>>> essentially atomic. A transport corruption should not cause trouble (apart from a failed >>>>> "zfs receive") and with snapshot retention you can even roll back. You can’t roll back >>>>> zpool replications :) >>>>> >>>>> ZFS receive does a lot of sanity checks as well. As long as your zfs receive doesn’t involve a rollback >>>>> to the latest snapshot, it won’t destroy anything by mistake. Just make sure that your replica datasets >>>>> aren’t mounted and zfs receive won’t complain. >>>>> >>>>> >>>>> Cheers, >>>>> >>>>> >>>>> >>>>> >>>>> Borja. 
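For the "zpool scrub should be run at regular intervals" item in the checklist quoted above, the stock periodic(8) machinery is usually enough; knob names should be checked against /etc/defaults/periodic.conf, and the pool name is assumed:

    # /etc/periodic.conf
    daily_scrub_zfs_enable="YES"
    daily_scrub_zfs_pools="tank"            # leave empty to scrub every imported pool
    daily_scrub_zfs_default_threshold="7"   # minimum days between scrubs

A plain cron entry such as "0 3 * * 0 root zpool scrub tank" in /etc/crontab does the job as well if more control over timing is wanted.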
>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> freebsd-fs@freebsd.org mailing list >>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>>>> >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Wed Aug 17 16:29:49 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 61C42BBDC16 for ; Wed, 17 Aug 2016 16:29:49 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5165911D2 for ; Wed, 17 Aug 2016 16:29:49 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7HGTnDB096239 for ; Wed, 17 Aug 2016 16:29:49 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211939] ZFS does not correctly import cache and spares by label Date: Wed, 17 Aug 2016 16:29:49 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.3-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: linimon@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 16:29:49 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211939 Mark Linimon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-bugs@FreeBSD.org |freebsd-fs@FreeBSD.org --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 16:55:32 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4A0F1BBD5C1 for ; Wed, 17 Aug 2016 16:55:32 +0000 (UTC) (envelope-from bsdunix44@gmail.com) Received: from mail-pf0-x22f.google.com (mail-pf0-x22f.google.com [IPv6:2607:f8b0:400e:c00::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 134711583 for ; Wed, 17 Aug 2016 16:55:32 +0000 (UTC) (envelope-from bsdunix44@gmail.com) Received: by mail-pf0-x22f.google.com with SMTP id 
x72so39167264pfd.2 for ; Wed, 17 Aug 2016 09:55:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=references:in-reply-to:mime-version:content-transfer-encoding :message-id:cc:from:subject:date:to; bh=xE4l3qDTiVBBpnuU2Ob3pRGs/CtQZQjzm8vKAY1p06U=; b=z4OTC7Ly3fCMDuOjEHpc9Cig/nogUrIXGgi3hQ41Lyptz1M6e54h3LT7md/vRPsfkP a68UZky+rNWcPEFmyXzettoZwmgSSrOe/0U9wQi/B2UWo5cYF9rI870ynSPxbF/2urmP e0eyAcizx58wfw/qdW049pB3Wn5B2ilC3pcxI6uqgl6Ftd03ELlDIrKLhtDD+CVE2La7 ZaByEor91+XNvlxYf2wBvj7N7iyFONJKvCpSGP+C1ojGV0dNFAsHEAzkRkkodSSuYrpQ 3WLskiVc6jV9Mtc+1R2gz6uVP3LVIgkq+HsXzJHH7J8YAHygqgGJhdG9AvNnKidbHhlB GByw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:references:in-reply-to:mime-version :content-transfer-encoding:message-id:cc:from:subject:date:to; bh=xE4l3qDTiVBBpnuU2Ob3pRGs/CtQZQjzm8vKAY1p06U=; b=hKgujPjfDZ0Qd665Ire/ugtEoD6HiCYUxNyNT/KQjFUaEK9MP5cn7YtG2x3XgPx8D3 Dli3P1+zk4W895BUdiKG8lXd0PeSVYR/RLuSCC3tfO1ZHnul7bhjyOFs9PsTjeq4E1pu 3Qx9grrci1IhlGKZLSQafuqg5U9f94lBE23xjS6jL2r+e1eWec360Pdi8mGvfE1qjEK3 LZV8biCM/Ll2eO5H7Q8LiMihZ+XEacwUDTWN5agNiGqnCV5nzxAdWQ6kGkEn5se6/4me dmBAGge7zFTheBkuJHDsAERw7VRi6E8rbhFYtKerADWWhAyTMX+BmmqzcwNCEs0RJOzw NT1A== X-Gm-Message-State: AEkoousl3AcukQv28quFXrYiJzK3oRSy8W7JbXT9P227jJ6m3xtxVbZiC5Q4sXk/XAD9zA== X-Received: by 10.98.147.14 with SMTP id b14mr24930241pfe.103.1471452931348; Wed, 17 Aug 2016 09:55:31 -0700 (PDT) Received: from [192.168.0.10] (cpe-70-118-225-173.kc.res.rr.com. [70.118.225.173]) by smtp.gmail.com with ESMTPSA id n10sm11441pap.16.2016.08.17.09.55.29 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 17 Aug 2016 09:55:30 -0700 (PDT) References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> In-Reply-To: <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> Mime-Version: 1.0 (1.0) Message-Id: <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> Cc: freebsd-fs@freebsd.org X-Mailer: iPhone Mail (14A5341a) From: Chris Watson Subject: Re: HAST + ZFS + NFS + CARP Date: Wed, 17 Aug 2016 11:55:27 -0500 To: linda@kateley.com Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 16:55:32 -0000 Of course, if you are willing to accept some amount of data loss that opens u= p a lot more options. :) Some may find that acceptable though. Like turning off fsync with PostgreSQL= to get much higher throughput. As little no as you are made *very* aware of= the risks.=20 It's good to have input in this thread from one with more experience with RS= F-1 than the rest of us. You confirm what others have that said about RSF-1,= that it's stable and works well. What were you deploying it on? Chris Sent from my iPhone 5 > On Aug 17, 2016, at 11:18 AM, Linda Kateley wrote: >=20 > The question I always ask, as an architect, is "can you lose 1 minute wort= h of data?" 
If you can, then batched replication is perfect. If you can't.. t= hen HA. Every place I have positioned it, rsf-1 has worked extremely well. I= f i remember right, it works at the dmu. I would suggest try it. They have b= een trying to have a full freebsd solution, I have several customers running= it well. >=20 > linda >=20 >=20 >> On 8/17/16 4:52 AM, Julien Cigar wrote: >>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen Gotteswint= er wrote: >>>=20 >>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gotteswi= nter wrote: >>>>>=20 >>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar wrote= : >>>>>>>=20 >>>>>>> As I said in a previous post I tested the zfs send/receive approach (= with >>>>>>> zrep) and it works (more or less) perfectly.. so I concur in all wha= t you >>>>>>> said, especially about off-site replicate and synchronous replicatio= n. >>>>>>>=20 >>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment= , >>>>>>> I'm in the early tests, haven't done any heavy writes yet, but ATM i= t >>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>> I must be too old school, but I don=E2=80=99t quite like the idea of u= sing an essentially unreliable transport >>>>>> (Ethernet) for low-level filesystem operations. >>>>>>=20 >>>>>> In case something went wrong, that approach could risk corrupting a p= ool. Although, frankly, >>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA probl= em that caused some >>>>>> silent corruption. >>>>> try dual split import :D i mean, zpool -f import on 2 machines hooked u= p >>>>> to the same disk chassis. >>>> Yes this is the first thing on the list to avoid .. :) >>>>=20 >>>> I'm still busy to test the whole setup here, including the >>>> MASTER -> BACKUP failover script (CARP), but I think you can prevent >>>> that thanks to: >>>>=20 >>>> - As long as ctld is running on the BACKUP the disks are locked >>>> and you can't import the pool (even with -f) for ex (filer2 is the >>>> BACKUP): >>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>>=20 >>>> - The shared pool should not be mounted at boot, and you should ensure >>>> that the failover script is not executed during boot time too: this is >>>> to handle the case wherein both machines turn off and/or re-ignite at >>>> the same time. 
Indeed, the CARP interface can "flip" it's status if bot= h >>>> machines are powered on at the same time, for ex: >>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and >>>> you will have a split-brain scenario >>>>=20 >>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>> happen, this can be handled with a trigger file or something like that >>>>=20 >>>> - I've still have to check if the order is OK, but I think that as long= >>>> as you shutdown the replication interface and that you adapt the >>>> advskew (including the config file) of the CARP interface before the >>>> zpool import -f in the failover script you can be relatively confident >>>> that nothing will be written on the iSCSI targets >>>>=20 >>>> - A zpool scrub should be run at regular intervals >>>>=20 >>>> This is my MASTER -> BACKUP CARP script ATM >>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>>=20 >>>> Julien >>>>=20 >>> 100=E2=82=AC question without detailed looking at that script. yes from a= first >>> view its super simple, but: why are solutions like rsf-1 such more >>> powerful / featurerich. Theres a reason for, which is that they try to >>> cover every possible situation (which makes more than sense for this). >> I've never used "rsf-1" so I can't say much more about it, but I have >> no doubts about it's ability to handle "complex situations", where >> multiple nodes / networks are involved. >>=20 >>> That script works for sure, within very limited cases imho >>>=20 >>>>> kaboom, really ugly kaboom. thats what is very likely to happen sooner= >>>>> or later especially when it comes to homegrown automatism solutions. >>>>> even the commercial parts where much more time/work goes into such >>>>> solutions fail in a regular manner >>>>>=20 >>>>>> The advantage of ZFS send/receive of datasets is, however, that you c= an consider it >>>>>> essentially atomic. A transport corruption should not cause trouble (= apart from a failed >>>>>> "zfs receive") and with snapshot retention you can even roll back. Yo= u can=E2=80=99t roll back >>>>>> zpool replications :) >>>>>>=20 >>>>>> ZFS receive does a lot of sanity checks as well. As long as your zfs r= eceive doesn=E2=80=99t involve a rollback >>>>>> to the latest snapshot, it won=E2=80=99t destroy anything by mistake.= Just make sure that your replica datasets >>>>>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain. >>>>>>=20 >>>>>>=20 >>>>>> Cheers, >>>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>> Borja. 
>>>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>> _______________________________________________ >>>>>> freebsd-fs@freebsd.org mailing list >>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"= >>>>>>=20 >>>>> _______________________________________________ >>>>> freebsd-fs@freebsd.org mailing list >>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Wed Aug 17 18:03:22 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5D116BBDB71 for ; Wed, 17 Aug 2016 18:03:22 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: from mail-io0-x235.google.com (mail-io0-x235.google.com [IPv6:2607:f8b0:4001:c06::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 238BD1E0C for ; Wed, 17 Aug 2016 18:03:22 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: by mail-io0-x235.google.com with SMTP id b62so142309336iod.3 for ; Wed, 17 Aug 2016 11:03:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kateley-com.20150623.gappssmtp.com; s=20150623; h=reply-to:subject:references:to:cc:from:organization:message-id:date :user-agent:mime-version:in-reply-to; bh=W3CZ6snaEsSDrvxIVpZVNCrQBRMZ/J6cyLqa9EV80C8=; b=m4odc/p8dKvuk9pI2qKktnCxG0TXSMmJ9J4HDk9woDch65sNz3aorvH8mYlw/nQn4j dRGpZ/Sz6gQDfnkCwFnpsU5xGy+6NDETgP5TMikl8GMTfgs7Y1vh1tBdozE+d4ytMdHy /ig8gabKXaf7TPEgLjmcCSmurPzEPyCKpKcgknYktUvb7tP9mrhgTLqjUBrOO5C9H9Q5 JE9W5zD9jJx8Obc1Co1aqtdZnUzv7z2yyyMAYg08m1JSyPfbUY19aHJvoa6veP40pUlZ VOW5oX3IRN8AwZhov/uyaBttWILOZ4qPRQarlgw11QCa/Zk6p7YCXX40me8kL5iffOKS cThw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:reply-to:subject:references:to:cc:from :organization:message-id:date:user-agent:mime-version:in-reply-to; bh=W3CZ6snaEsSDrvxIVpZVNCrQBRMZ/J6cyLqa9EV80C8=; b=Ishima+vkpdLgxxt4yajXhpNZyheJT9WhlTVm97a75crfu1hg279S1/gEFiAigW+NZ n9DReiIiqGrX0Bo4R9lIyUjCVFnKaNcpytkaYSt7DIecpjk2xF4k4UevAZZfyD0+OO4J QYjslJE52i0i+mzCgVUp+LXcbsgjyAHg1TqajeEUhAoYewiNch39ia5aMSTK50AaSfHU n+zzl1qcH/qbfGuaAgHesI5TZoMNkZDmmCsZ/y5v1cOPlzSbkkBv9IiE5Hzl+SsbTxDe NG96sT4C4GmGN5V/EFkkHlTlQClttHGMYVfj3nVsUNGklXBoc64jYs9/q10Ui8Cv9xrh bqZA== X-Gm-Message-State: AEkoous5PE/JZw9Dj7MXD2Raoa76xi1bRt1RwCbMY3V6+rPdgu/oosucVd8rpQLVmzBSxw== X-Received: by 10.107.152.201 with SMTP id a192mr54177775ioe.24.1471457001313; Wed, 17 Aug 2016 11:03:21 -0700 (PDT) Received: from Kateleyco-iMac.local (c-50-188-36-30.hsd1.mn.comcast.net. 
[50.188.36.30]) by smtp.googlemail.com with ESMTPSA id v195sm438837itc.8.2016.08.17.11.03.19 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Aug 2016 11:03:20 -0700 (PDT) Reply-To: linda@kateley.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> To: Chris Watson , linda@kateley.com Cc: freebsd-fs@freebsd.org From: Linda Kateley Organization: Kateley Company Message-ID: <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> Date: Wed, 17 Aug 2016 13:03:19 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 18:03:22 -0000 I just do consulting so I don't always get to see the end of the project. Although we are starting to do more ongoing support so we can see the progress.. I have worked with some of the guys from high-availability.com for maybe 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work beautifully with omni/illumos. The one customer I have running it in prod is an isp in south america running openstack and zfs on freebsd as iscsi. Big boxes, 90+ drives per frame. If someone would like try it, i have some contacts there. Ping me offlist. You do risk losing data if you batch zfs send. It is very hard to run that real time. You have to take the snap then send the snap. Most people run in cron, even if it's not in cron, you would want one to finish before you started the next. If you lose the sending host before the receive is complete you won't have a full copy. With zfs though you will probably still have the data on the sending host, however long it takes to bring it back up. RSF-1 runs in the zfs stack and send the writes to the second system. It's kind of pricey, but actually much less expensive than commercial alternatives. Anytime you run anything sync it adds latency but makes things safer.. There is also a cool tool I like, called zerto for vmware that sits in the hypervisor and sends a sync copy of a write locally and then an async remotely. It's pretty cool. Although I haven't run it myself, have a bunch of customers running it. I believe it works with proxmox too. Most people I run into (these days) don't mind losing 5 or even 30 minutes of data. Small shops. They usually have a copy somewhere else. Or the cost of 5-30 minutes isn't that great. I used work as a datacenter architect for sun/oracle with only fortune 500. There losing 1 sec could put large companies out of business. I worked with banks and exchanges. They couldn't ever lose a single transaction. 
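A sketch of the cron-driven snap-then-send loop described above, with lockf(1) keeping one run from starting before the previous one finishes; dataset, host and lock-file names are assumptions, and a real deployment would rather use zrep or similar, which also tracks which snapshot the other side actually has:

    #!/bin/sh
    # replicate.sh -- batched zfs send/receive, run from cron, e.g.:
    #   */5 * * * * root lockf -t 0 /var/run/replicate.lock /root/replicate.sh
    SRC=tank/data        # assumed source dataset
    DST=backup/data      # assumed target dataset on the standby (parent must exist)
    REMOTE=filer2        # assumed standby host

    NOW=$(date -u +%Y%m%d%H%M%S)
    # newest existing snapshot of SRC, if any (assumed to exist on the remote too)
    PREV=$(zfs list -H -t snapshot -o name -s creation -d 1 ${SRC} | tail -1 | cut -d@ -f2)

    zfs snapshot ${SRC}@${NOW}

    if [ -n "${PREV}" ]; then
        # no -F on the receive: if the replica has diverged, fail loudly
        # instead of rolling it back
        zfs send -i @${PREV} ${SRC}@${NOW} | ssh ${REMOTE} zfs receive -u ${DST}
    else
        zfs send ${SRC}@${NOW} | ssh ${REMOTE} zfs receive -u ${DST}
    fi

If the sending host dies mid-transfer the remote simply keeps its previous snapshot, which is exactly the window of loss being discussed here.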
Most people nowadays do the replication/availability in the application though and don't care about underlying hardware, especially disk. On 8/17/16 11:55 AM, Chris Watson wrote: > Of course, if you are willing to accept some amount of data loss that > opens up a lot more options. :) > > Some may find that acceptable though. Like turning off fsync with > PostgreSQL to get much higher throughput. As little no as you are made > *very* aware of the risks. > > It's good to have input in this thread from one with more experience > with RSF-1 than the rest of us. You confirm what others have that said > about RSF-1, that it's stable and works well. What were you deploying > it on? > > Chris > > Sent from my iPhone 5 > > On Aug 17, 2016, at 11:18 AM, Linda Kateley > wrote: > >> The question I always ask, as an architect, is "can you lose 1 minute >> worth of data?" If you can, then batched replication is perfect. If >> you can't.. then HA. Every place I have positioned it, rsf-1 has >> worked extremely well. If i remember right, it works at the dmu. I >> would suggest try it. They have been trying to have a full freebsd >> solution, I have several customers running it well. >> >> linda >> >> >> On 8/17/16 4:52 AM, Julien Cigar wrote: >>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen >>> Gotteswinter wrote: >>>> >>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen >>>>> Gotteswinter wrote: >>>>>> >>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>> > wrote: >>>>>>>> >>>>>>>> As I said in a previous post I tested the zfs send/receive >>>>>>>> approach (with >>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in >>>>>>>> all what you >>>>>>>> said, especially about off-site replicate and synchronous >>>>>>>> replication. >>>>>>>> >>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the >>>>>>>> moment, >>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but >>>>>>>> ATM it >>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>> I must be too old school, but I don’t quite like the idea of >>>>>>> using an essentially unreliable transport >>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>> >>>>>>> In case something went wrong, that approach could risk >>>>>>> corrupting a pool. Although, frankly, >>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA >>>>>>> problem that caused some >>>>>>> silent corruption. >>>>>> try dual split import :D i mean, zpool -f import on 2 machines >>>>>> hooked up >>>>>> to the same disk chassis. >>>>> Yes this is the first thing on the list to avoid .. :) >>>>> >>>>> I'm still busy to test the whole setup here, including the >>>>> MASTER -> BACKUP failover script (CARP), but I think you can prevent >>>>> that thanks to: >>>>> >>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>> and you can't import the pool (even with -f) for ex (filer2 is the >>>>> BACKUP): >>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>>> >>>>> - The shared pool should not be mounted at boot, and you should ensure >>>>> that the failover script is not executed during boot time too: this is >>>>> to handle the case wherein both machines turn off and/or re-ignite at >>>>> the same time. 
Indeed, the CARP interface can "flip" it's status >>>>> if both >>>>> machines are powered on at the same time, for ex: >>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and >>>>> you will have a split-brain scenario >>>>> >>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>> happen, this can be handled with a trigger file or something like that >>>>> >>>>> - I've still have to check if the order is OK, but I think that as >>>>> long >>>>> as you shutdown the replication interface and that you adapt the >>>>> advskew (including the config file) of the CARP interface before the >>>>> zpool import -f in the failover script you can be relatively confident >>>>> that nothing will be written on the iSCSI targets >>>>> >>>>> - A zpool scrub should be run at regular intervals >>>>> >>>>> This is my MASTER -> BACKUP CARP script ATM >>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>>> >>>>> Julien >>>>> >>>> 100€ question without detailed looking at that script. yes from a first >>>> view its super simple, but: why are solutions like rsf-1 such more >>>> powerful / featurerich. Theres a reason for, which is that they try to >>>> cover every possible situation (which makes more than sense for this). >>> I've never used "rsf-1" so I can't say much more about it, but I have >>> no doubts about it's ability to handle "complex situations", where >>> multiple nodes / networks are involved. >>> >>>> That script works for sure, within very limited cases imho >>>> >>>>>> kaboom, really ugly kaboom. thats what is very likely to happen >>>>>> sooner >>>>>> or later especially when it comes to homegrown automatism solutions. >>>>>> even the commercial parts where much more time/work goes into such >>>>>> solutions fail in a regular manner >>>>>> >>>>>>> The advantage of ZFS send/receive of datasets is, however, that >>>>>>> you can consider it >>>>>>> essentially atomic. A transport corruption should not cause >>>>>>> trouble (apart from a failed >>>>>>> "zfs receive") and with snapshot retention you can even roll >>>>>>> back. You can’t roll back >>>>>>> zpool replications :) >>>>>>> >>>>>>> ZFS receive does a lot of sanity checks as well. As long as your >>>>>>> zfs receive doesn’t involve a rollback >>>>>>> to the latest snapshot, it won’t destroy anything by mistake. >>>>>>> Just make sure that your replica datasets >>>>>>> aren’t mounted and zfs receive won’t complain. >>>>>>> >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Borja. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>> To unsubscribe, send any mail to >>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>>>>> " >>>>>>> >>>>>> _______________________________________________ >>>>>> freebsd-fs@freebsd.org mailing list >>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>> To unsubscribe, send any mail to >>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>>>> " >> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org >> " From owner-freebsd-fs@freebsd.org Wed Aug 17 21:14:38 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D109CBBDD43 for ; Wed, 17 Aug 2016 21:14:38 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: from mail-wm0-x22d.google.com (mail-wm0-x22d.google.com [IPv6:2a00:1450:400c:c09::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5E3DC1BFD for ; Wed, 17 Aug 2016 21:14:38 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: by mail-wm0-x22d.google.com with SMTP id q128so332750wma.1 for ; Wed, 17 Aug 2016 14:14:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=iRwMJh7xj9RZqVd/43NQql03DdsVtfxdT39eMpReWfQ=; b=S0mt3nR4YsAdeBuBq2W6ZdhRmCCQb9ooZ3IEx7DTMs0Fwa6GB4lAY6GpakaFPsLLxw ZVaJE7zx60eThmKeTjzVQwVL5E4XK5nsPm5DOrz5FaqRjYZZkoDvi/RnCfEz8cqprWgN HeXPdJhBmORZkyZkrOm4OxJJLNh5Mo7q2ImvepBI7DE/JzmTxsNX+F2GQfqeoEsvF/k2 1yGSyHUEow7Cn+mkZ2Qpg5a2soLvVARIXoMBm56jbYpvY6o2RZVDHluhkrvbxG5HKoyR jrQ1b7OZdUXllXyi6Fb/Zqq1V6HUV2LdOGyIXgeZ41k6+HFoq6i/F2agG5GRWe0+Jg5a 33Pw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=iRwMJh7xj9RZqVd/43NQql03DdsVtfxdT39eMpReWfQ=; b=RyLmWGaJCDV0qtu5VonIxVqDDCRgje6/IgolOQeieJ9kxRGTTLaksGCO0lL2DnVjM9 DOFbWgp9t+xu7DG++CW2G499eVjBq7YIcc6BTosNYM9IecNjGooU5tRDjXwpJIcZRhJx uj5HbgPCj3RLoWghYBcl5zm5lD1c+aDzIDGXLb3Xn0JtW3i5J6AXzvi2/ZM51mOnGwGE P1Q3KaJBSK570bwtcBcRctJbr22YbMxwHjrPR4Kyf+0wGO9jyOeedzWIN9MSmHjDda0R OPYZkZjQsf72N1hMfhsZ1Gg9pbN7sT4N/mffv9sJTvDKdLfVEA8/2UEz0yujRPB35OdX RlrQ== X-Gm-Message-State: AEkoouvkPPofkyRyWZDEJ1yJYPcRnnYA5sZ3j9s3nvJ1mHn4PNY8Fei8ONPHAu3OUdBpKA== X-Received: by 10.194.221.134 with SMTP id qe6mr45663864wjc.165.1471468476428; Wed, 17 Aug 2016 14:14:36 -0700 (PDT) Received: from macbook-air-de-benjamin.home (LFbn-1-7077-85.w90-116.abo.wanadoo.fr. 
[90.116.246.85]) by smtp.gmail.com with ESMTPSA id a9sm240189wjf.16.2016.08.17.14.14.35 for (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 17 Aug 2016 14:14:35 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: HAST + ZFS + NFS + CARP From: Ben RUBSON In-Reply-To: <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> Date: Wed, 17 Aug 2016 23:14:34 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <02F2828E-AB88-4F25-AB73-5EF041BAD36E@gmail.com> References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> To: freebsd-fs@freebsd.org X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 21:14:38 -0000 > On 17 Aug 2016, at 20:03, Linda Kateley wrote: > > RSF-1 runs in the zfs stack and send the writes to the second system. Linda, do you have any link to documentation about this RSF-1 operation mode ? According to what I read about RSF-1, storage is shared between nodes, and RSF-1 manages the failover, we do not have 2 different storages. (so I don't really understand how writes are sent to the "second system") In addition, RSF-1 does not seem to help with long-distance replication to a different storage. But I may be wrong ? This is where ZFS send/receive helps. Or even a nicer solution I proposed a few weeks ago : https://www.illumos.org/issues/7166 (but a lot of work to achieve). Ben > On 8/17/16 11:55 AM, Chris Watson wrote: >> Of course, if you are willing to accept some amount of data loss that opens up a lot more options. :) >> >> Some may find that acceptable though. Like turning off fsync with PostgreSQL to get much higher throughput. As little no as you are made *very* aware of the risks. >> >> It's good to have input in this thread from one with more experience with RSF-1 than the rest of us. You confirm what others have that said about RSF-1, that it's stable and works well. What were you deploying it on? >> >> Chris >> >> Sent from my iPhone 5 >> >> On Aug 17, 2016, at 11:18 AM, Linda Kateley > wrote: >> >>> The question I always ask, as an architect, is "can you lose 1 minute worth of data?" If you can, then batched replication is perfect. If you can't.. then HA. Every place I have positioned it, rsf-1 has worked extremely well. If i remember right, it works at the dmu. I would suggest try it. They have been trying to have a full freebsd solution, I have several customers running it well.
>>>=20 >>> linda >>>=20 >>>=20 >>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen = Gotteswinter wrote: >>>>>=20 >>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen = Gotteswinter wrote: >>>>>>>=20 >>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar > wrote: >>>>>>>>>=20 >>>>>>>>> As I said in a previous post I tested the zfs send/receive = approach (with >>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in = all what you >>>>>>>>> said, especially about off-site replicate and synchronous = replication. >>>>>>>>>=20 >>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the = moment, >>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but = ATM it >>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>> I must be too old school, but I don=E2=80=99t quite like the = idea of using an essentially unreliable transport >>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>>=20 >>>>>>>> In case something went wrong, that approach could risk = corrupting a pool. Although, frankly, >>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA = problem that caused some >>>>>>>> silent corruption. >>>>>>> try dual split import :D i mean, zpool -f import on 2 machines = hooked up >>>>>>> to the same disk chassis. >>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>>=20 >>>>>> I'm still busy to test the whole setup here, including the >>>>>> MASTER -> BACKUP failover script (CARP), but I think you can = prevent >>>>>> that thanks to: >>>>>>=20 >>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>> and you can't import the pool (even with -f) for ex (filer2 is = the >>>>>> BACKUP): >>>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>>>>=20 >>>>>> - The shared pool should not be mounted at boot, and you should = ensure >>>>>> that the failover script is not executed during boot time too: = this is >>>>>> to handle the case wherein both machines turn off and/or = re-ignite at >>>>>> the same time. Indeed, the CARP interface can "flip" it's status = if both >>>>>> machines are powered on at the same time, for ex: >>>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf = and >>>>>> you will have a split-brain scenario >>>>>>=20 >>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>>> happen, this can be handled with a trigger file or something like = that >>>>>>=20 >>>>>> - I've still have to check if the order is OK, but I think that = as long >>>>>> as you shutdown the replication interface and that you adapt the >>>>>> advskew (including the config file) of the CARP interface before = the >>>>>> zpool import -f in the failover script you can be relatively = confident >>>>>> that nothing will be written on the iSCSI targets >>>>>>=20 >>>>>> - A zpool scrub should be run at regular intervals >>>>>>=20 >>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>>>>=20 >>>>>> Julien >>>>>>=20 >>>>> 100=E2=82=AC question without detailed looking at that script. yes = from a first >>>>> view its super simple, but: why are solutions like rsf-1 such more >>>>> powerful / featurerich. 
Theres a reason for, which is that they = try to >>>>> cover every possible situation (which makes more than sense for = this). >>>> I've never used "rsf-1" so I can't say much more about it, but I = have >>>> no doubts about it's ability to handle "complex situations", where >>>> multiple nodes / networks are involved. >>>>=20 >>>>> That script works for sure, within very limited cases imho >>>>>=20 >>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen = sooner >>>>>>> or later especially when it comes to homegrown automatism = solutions. >>>>>>> even the commercial parts where much more time/work goes into = such >>>>>>> solutions fail in a regular manner >>>>>>>=20 >>>>>>>> The advantage of ZFS send/receive of datasets is, however, that = you can consider it >>>>>>>> essentially atomic. A transport corruption should not cause = trouble (apart from a failed >>>>>>>> "zfs receive") and with snapshot retention you can even roll = back. You can=E2=80=99t roll back >>>>>>>> zpool replications :) >>>>>>>>=20 >>>>>>>> ZFS receive does a lot of sanity checks as well. As long as = your zfs receive doesn=E2=80=99t involve a rollback >>>>>>>> to the latest snapshot, it won=E2=80=99t destroy anything by = mistake. Just make sure that your replica datasets >>>>>>>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain. >>>>>>>>=20 >>>>>>>>=20 >>>>>>>> Cheers, >>>>>>>>=20 >>>>>>>>=20 >>>>>>>>=20 >>>>>>>>=20 >>>>>>>> Borja. >>>>>>>>=20 >>>>>>>>=20 >>>>>>>>=20 >>>>>>>> _______________________________________________ >>>>>>>> freebsd-fs@freebsd.org mailing = list >>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>> To unsubscribe, send any mail to = "freebsd-fs-unsubscribe@freebsd.org = " >>>>>>>>=20 >>>>>>> _______________________________________________ >>>>>>> freebsd-fs@freebsd.org mailing = list >>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>> To unsubscribe, send any mail to = "freebsd-fs-unsubscribe@freebsd.org = " >>>=20 >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org = " >=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Thu Aug 18 07:32:32 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 42D7ABBE15A for ; Thu, 18 Aug 2016 07:32:32 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C02C51A25 for ; Thu, 18 Aug 2016 07:32:31 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 359D145FC0FB; Thu, 18 Aug 2016 09:32:23 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fV-FbX44V-04; Thu, 18 Aug 2016 09:32:16 +0200 (CEST) 
Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id E59964C4C688; Thu, 18 Aug 2016 09:32:16 +0200 (CEST) Reply-To: juergen.gotteswinter@internetx.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> To: linda@kateley.com, Chris Watson Cc: freebsd-fs@freebsd.org From: InterNetX - Juergen Gotteswinter Organization: InterNetX GmbH Message-ID: <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> Date: Thu, 18 Aug 2016 09:32:14 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 07:32:32 -0000 Am 17.08.2016 um 20:03 schrieb Linda Kateley: > I just do consulting so I don't always get to see the end of the > project. Although we are starting to do more ongoing support so we can > see the progress.. > > I have worked with some of the guys from high-availability.com for maybe > 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work > beautifully with omni/illumos. The one customer I have running it in > prod is an isp in south america running openstack and zfs on freebsd as > iscsi. Big boxes, 90+ drives per frame. If someone would like try it, i > have some contacts there. Ping me offlist. no offense, but it sounds a bit like marketing. here: running a nexenta ha setup for several years now, with one catastrophic failure due to split brain > > You do risk losing data if you batch zfs send. It is very hard to run > that real time. depends on how much data changes aka delta size > You have to take the snap then send the snap. Most > people run in cron, even if it's not in cron, you would want one to > finish before you started the next. that's the reason why lock files were invented; tools like zrep handle that themselves via additional zfs properties or, if one does not trust a single layer
-- snip --
#!/bin/sh
# poor man's lock: skip this run if a previous replication is still going
if [ ! -f /var/run/replic ] ; then
    touch /var/run/replic
    /blah/path/zrep sync all >> /var/log/zfsrepli.log
    rm -f /var/run/replic
fi
-- snip --
something like this, simple > If you lose the sending host before > the receive is complete you won't have a full copy. if rsf fails and you end up in split brain you lose way more. been there, seen that. > With zfs though you > will probably still have the data on the sending host, however long it > takes to bring it back up. RSF-1 runs in the zfs stack and send the > writes to the second system. It's kind of pricey, but actually much less > expensive than commercial alternatives. > > Anytime you run anything sync it adds latency but makes things safer..
not surprising, it all depends on the usecase > There is also a cool tool I like, called zerto for vmware that sits in > the hypervisor and sends a sync copy of a write locally and then an > async remotely. It's pretty cool. Although I haven't run it myself, have > a bunch of customers running it. I believe it works with proxmox too. > > Most people I run into (these days) don't mind losing 5 or even 30 > minutes of data. Small shops. you talk about minutes, what delta size are we talking here about? why not using zrep in a loop for example They usually have a copy somewhere else. > Or the cost of 5-30 minutes isn't that great. I used work as a > datacenter architect for sun/oracle with only fortune 500. There losing > 1 sec could put large companies out of business. I worked with banks and > exchanges. again, usecase. i bet 99% on this list are not operating fortune 500 bank filers They couldn't ever lose a single transaction. Most people > nowadays do the replication/availability in the application though and > don't care about underlying hardware, especially disk. > > > On 8/17/16 11:55 AM, Chris Watson wrote: >> Of course, if you are willing to accept some amount of data loss that >> opens up a lot more options. :) >> >> Some may find that acceptable though. Like turning off fsync with >> PostgreSQL to get much higher throughput. As little no as you are made >> *very* aware of the risks. >> >> It's good to have input in this thread from one with more experience >> with RSF-1 than the rest of us. You confirm what others have that said >> about RSF-1, that it's stable and works well. What were you deploying >> it on? >> >> Chris >> >> Sent from my iPhone 5 >> >> On Aug 17, 2016, at 11:18 AM, Linda Kateley > > wrote: >> >>> The question I always ask, as an architect, is "can you lose 1 minute >>> worth of data?" If you can, then batched replication is perfect. If >>> you can't.. then HA. Every place I have positioned it, rsf-1 has >>> worked extremely well. If i remember right, it works at the dmu. I >>> would suggest try it. They have been trying to have a full freebsd >>> solution, I have several customers running it well. >>> >>> linda >>> >>> >>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen >>>> Gotteswinter wrote: >>>>> >>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen >>>>>> Gotteswinter wrote: >>>>>>> >>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>> > wrote: >>>>>>>>> >>>>>>>>> As I said in a previous post I tested the zfs send/receive >>>>>>>>> approach (with >>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in >>>>>>>>> all what you >>>>>>>>> said, especially about off-site replicate and synchronous >>>>>>>>> replication. >>>>>>>>> >>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the >>>>>>>>> moment, >>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but >>>>>>>>> ATM it >>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>> I must be too old school, but I don’t quite like the idea of >>>>>>>> using an essentially unreliable transport >>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>> >>>>>>>> In case something went wrong, that approach could risk >>>>>>>> corrupting a pool. Although, frankly, >>>>>>>> ZFS is extremely resilient. 
One of mine even survived a SAS HBA >>>>>>>> problem that caused some >>>>>>>> silent corruption. >>>>>>> try dual split import :D i mean, zpool -f import on 2 machines >>>>>>> hooked up >>>>>>> to the same disk chassis. >>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>> >>>>>> I'm still busy to test the whole setup here, including the >>>>>> MASTER -> BACKUP failover script (CARP), but I think you can prevent >>>>>> that thanks to: >>>>>> >>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>> and you can't import the pool (even with -f) for ex (filer2 is the >>>>>> BACKUP): >>>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>>>> >>>>>> - The shared pool should not be mounted at boot, and you should >>>>>> ensure >>>>>> that the failover script is not executed during boot time too: >>>>>> this is >>>>>> to handle the case wherein both machines turn off and/or re-ignite at >>>>>> the same time. Indeed, the CARP interface can "flip" it's status >>>>>> if both >>>>>> machines are powered on at the same time, for ex: >>>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and >>>>>> you will have a split-brain scenario >>>>>> >>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>>> happen, this can be handled with a trigger file or something like >>>>>> that >>>>>> >>>>>> - I've still have to check if the order is OK, but I think that as >>>>>> long >>>>>> as you shutdown the replication interface and that you adapt the >>>>>> advskew (including the config file) of the CARP interface before the >>>>>> zpool import -f in the failover script you can be relatively >>>>>> confident >>>>>> that nothing will be written on the iSCSI targets >>>>>> >>>>>> - A zpool scrub should be run at regular intervals >>>>>> >>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>>>> >>>>>> Julien >>>>>> >>>>> 100€ question without detailed looking at that script. yes from a >>>>> first >>>>> view its super simple, but: why are solutions like rsf-1 such more >>>>> powerful / featurerich. Theres a reason for, which is that they try to >>>>> cover every possible situation (which makes more than sense for this). >>>> I've never used "rsf-1" so I can't say much more about it, but I have >>>> no doubts about it's ability to handle "complex situations", where >>>> multiple nodes / networks are involved. >>>> >>>>> That script works for sure, within very limited cases imho >>>>> >>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen >>>>>>> sooner >>>>>>> or later especially when it comes to homegrown automatism solutions. >>>>>>> even the commercial parts where much more time/work goes into such >>>>>>> solutions fail in a regular manner >>>>>>> >>>>>>>> The advantage of ZFS send/receive of datasets is, however, that >>>>>>>> you can consider it >>>>>>>> essentially atomic. A transport corruption should not cause >>>>>>>> trouble (apart from a failed >>>>>>>> "zfs receive") and with snapshot retention you can even roll >>>>>>>> back. You can’t roll back >>>>>>>> zpool replications :) >>>>>>>> >>>>>>>> ZFS receive does a lot of sanity checks as well. As long as your >>>>>>>> zfs receive doesn’t involve a rollback >>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. 
>>>>>>>> Just make sure that your replica datasets >>>>>>>> aren’t mounted and zfs receive won’t complain. >>>>>>>> >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Borja. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>> To unsubscribe, send any mail to >>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>>>>>> " >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>> To unsubscribe, send any mail to >>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>>>>> " >>> >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org >>> " > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Thu Aug 18 07:34:17 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7BE46BBE1F3 for ; Thu, 18 Aug 2016 07:34:17 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) by mx1.freebsd.org (Postfix) with ESMTP id 2796C1B1C for ; Thu, 18 Aug 2016 07:34:16 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 12A5D4C4C804; Thu, 18 Aug 2016 09:34:15 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id uYYVsmXkrWdz; Thu, 18 Aug 2016 09:34:12 +0200 (CEST) Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id 2E51A4C4C688; Thu, 18 Aug 2016 09:34:12 +0200 (CEST) Reply-To: juergen.gotteswinter@internetx.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <02F2828E-AB88-4F25-AB73-5EF041BAD36E@gmail.com> To: Ben RUBSON , freebsd-fs@freebsd.org From: InterNetX - Juergen Gotteswinter Organization: InterNetX GmbH Message-ID: Date: Thu, 18 Aug 2016 09:34:10 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <02F2828E-AB88-4F25-AB73-5EF041BAD36E@gmail.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: 
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 07:34:17 -0000 Am 17.08.2016 um 23:14 schrieb Ben RUBSON: > >> On 17 Aug 2016, at 20:03, Linda Kateley wrote: >> >> RSF-1 runs in the zfs stack and send the writes to the second system. > > Linda, do you have any link to a documentation about this RSF-1 operation mode ? > > According to what I red about RSF-1, storage is shared between nodes, and RSF-1 manages the failover, we do not have 2 different storages. > (so I don't really understand how writes are sent to the "second system") yes this is how i know rsf-1, too. external cross cabled sas jbods hooked up to two headnodes. it all works (or fails) if activating / disabling of sas channels works like expected. > > In addition, RSF-1 does not seem to help with long-distance replication to a different storage. i think theres something called metro replication, but did not dig further into that. might be part of nexenta > But I may be wrong ? > This is where ZFS send/receive helps. > Or even a nicer solution I proposed a few weeks ago : https://www.illumos.org/issues/7166 (but a lot of work to achieve). > > Ben > >> On 8/17/16 11:55 AM, Chris Watson wrote: >>> Of course, if you are willing to accept some amount of data loss that opens up a lot more options. :) >>> >>> Some may find that acceptable though. Like turning off fsync with PostgreSQL to get much higher throughput. As little no as you are made *very* aware of the risks. >>> >>> It's good to have input in this thread from one with more experience with RSF-1 than the rest of us. You confirm what others have that said about RSF-1, that it's stable and works well. What were you deploying it on? >>> >>> Chris >>> >>> Sent from my iPhone 5 >>> >>> On Aug 17, 2016, at 11:18 AM, Linda Kateley > wrote: >>> >>>> The question I always ask, as an architect, is "can you lose 1 minute worth of data?" If you can, then batched replication is perfect. If you can't.. then HA. Every place I have positioned it, rsf-1 has worked extremely well. If i remember right, it works at the dmu. I would suggest try it. They have been trying to have a full freebsd solution, I have several customers running it well. >>>> >>>> linda >>>> >>>> >>>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen Gotteswinter wrote: >>>>>> >>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gotteswinter wrote: >>>>>>>> >>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar > wrote: >>>>>>>>>> >>>>>>>>>> As I said in a previous post I tested the zfs send/receive approach (with >>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in all what you >>>>>>>>>> said, especially about off-site replicate and synchronous replication. >>>>>>>>>> >>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment, >>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but ATM it >>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>>> I must be too old school, but I don’t quite like the idea of using an essentially unreliable transport >>>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>>> >>>>>>>>> In case something went wrong, that approach could risk corrupting a pool. 
Although, frankly, >>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA problem that caused some >>>>>>>>> silent corruption. >>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines hooked up >>>>>>>> to the same disk chassis. >>>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>>> >>>>>>> I'm still busy to test the whole setup here, including the >>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can prevent >>>>>>> that thanks to: >>>>>>> >>>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>>> and you can't import the pool (even with -f) for ex (filer2 is the >>>>>>> BACKUP): >>>>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>>>>> >>>>>>> - The shared pool should not be mounted at boot, and you should ensure >>>>>>> that the failover script is not executed during boot time too: this is >>>>>>> to handle the case wherein both machines turn off and/or re-ignite at >>>>>>> the same time. Indeed, the CARP interface can "flip" it's status if both >>>>>>> machines are powered on at the same time, for ex: >>>>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and >>>>>>> you will have a split-brain scenario >>>>>>> >>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>>>> happen, this can be handled with a trigger file or something like that >>>>>>> >>>>>>> - I've still have to check if the order is OK, but I think that as long >>>>>>> as you shutdown the replication interface and that you adapt the >>>>>>> advskew (including the config file) of the CARP interface before the >>>>>>> zpool import -f in the failover script you can be relatively confident >>>>>>> that nothing will be written on the iSCSI targets >>>>>>> >>>>>>> - A zpool scrub should be run at regular intervals >>>>>>> >>>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>>>>> >>>>>>> Julien >>>>>>> >>>>>> 100€ question without detailed looking at that script. yes from a first >>>>>> view its super simple, but: why are solutions like rsf-1 such more >>>>>> powerful / featurerich. Theres a reason for, which is that they try to >>>>>> cover every possible situation (which makes more than sense for this). >>>>> I've never used "rsf-1" so I can't say much more about it, but I have >>>>> no doubts about it's ability to handle "complex situations", where >>>>> multiple nodes / networks are involved. >>>>> >>>>>> That script works for sure, within very limited cases imho >>>>>> >>>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen sooner >>>>>>>> or later especially when it comes to homegrown automatism solutions. >>>>>>>> even the commercial parts where much more time/work goes into such >>>>>>>> solutions fail in a regular manner >>>>>>>> >>>>>>>>> The advantage of ZFS send/receive of datasets is, however, that you can consider it >>>>>>>>> essentially atomic. A transport corruption should not cause trouble (apart from a failed >>>>>>>>> "zfs receive") and with snapshot retention you can even roll back. You can’t roll back >>>>>>>>> zpool replications :) >>>>>>>>> >>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as your zfs receive doesn’t involve a rollback >>>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. 
Just make sure that your replica datasets >>>>>>>>> aren’t mounted and zfs receive won’t complain. >>>>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Borja. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org " >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org " >>>> >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org " >> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Thu Aug 18 07:36:28 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AB55DBBE275 for ; Thu, 18 Aug 2016 07:36:28 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-wm0-x232.google.com (mail-wm0-x232.google.com [IPv6:2a00:1450:400c:c09::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 15E3A1C24 for ; Thu, 18 Aug 2016 07:36:28 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: by mail-wm0-x232.google.com with SMTP id f65so227964671wmi.0 for ; Thu, 18 Aug 2016 00:36:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=NcF8jKkpWLdAdzRFFppe2SYWjpmHImGWrvZiVxsktUc=; b=a3ZYPJG8FFMs5RwYItLWeRGIrh/E/VEoB/k3gapwvsnpn8a/OFgD7ENh+W5M27HJeZ b/fTIJukHY8RRrRIr5VrH2aSuivNG0pxcWhNz030HbiTVz9SmEwfd6a8u+saJpVxfgVq RfvxwtzoAsIODMZbNVPqOSGNCL6mU36BN+Z6ux1Jw51BJC1qQeEpuqEvuHLUT5C/Pee/ RUSsK4OtRHPxtswhqEmBIegBt4QInX1mAPGKvEoTN60voRbRT/kZkVAu43sJ8e0uTNHa NU7emyxpjhlHa3GwJvMYTin6bhlEWdcNZPn+UkKeluiaKYfl2T11qbmZZq5ZMcgZN1Ou IQSA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=NcF8jKkpWLdAdzRFFppe2SYWjpmHImGWrvZiVxsktUc=; b=lWced+U/egwcJoSht+CHRN2eVawX66XKpEQE1FaLWu7XLSsDCY2A72habXx/BNpDDT Fp0eKr5UsMqgdvqdH8V/jkbjfqZ/Sg6ChSZEFFBdSPdWP/Gckg0Dl8QLKtjm851GYweX iQD4W02f9gyzsrJd5NZxvYOjFI+u7e16IQpz+U+ntqnSoXH4SU+ma+qj4EyRgvCkyYAe IC7Nfl0m1AiOhEYShaJI5Ig5+z5nUi3HRmfbxyKtVOQHtFuWLOVzvMfTy+gQDC9+ngEd 7vQFrYLIv+WZqJp4w6yr5R/vv+LNCBj6/WxxbHCicMfRBE2z8uXqm6IN3z4ejmm+CId7 YuWQ== X-Gm-Message-State: AEkooutrj5TjLikhaF8aG7M1kcwLc/O7Z1JFFcVDvrBrcpy+R0QKly5gTk3gtddy0WORkJvjPcXE0ENMsFf/TA== X-Received: by 10.28.139.144 with SMTP id 
n138mr1116835wmd.71.1471505785461; Thu, 18 Aug 2016 00:36:25 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.54.202 with HTTP; Thu, 18 Aug 2016 00:36:24 -0700 (PDT) In-Reply-To: <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> From: krad Date: Thu, 18 Aug 2016 08:36:24 +0100 Message-ID: Subject: Re: HAST + ZFS + NFS + CARP To: InterNetX - Juergen Gotteswinter Cc: linda@kateley.com, Chris Watson , FreeBSD FS Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 07:36:28 -0000 I didnt think touch was atomic, mkdir is though On 18 August 2016 at 08:32, InterNetX - Juergen Gotteswinter < juergen.gotteswinter@internetx.com> wrote: > > > Am 17.08.2016 um 20:03 schrieb Linda Kateley: > > I just do consulting so I don't always get to see the end of the > > project. Although we are starting to do more ongoing support so we can > > see the progress.. > > > > I have worked with some of the guys from high-availability.com for mayb= e > > 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work > > beautifully with omni/illumos. The one customer I have running it in > > prod is an isp in south america running openstack and zfs on freebsd as > > iscsi. Big boxes, 90+ drives per frame. If someone would like try it, = i > > have some contacts there. Ping me offlist. > > no offense, but it sounds a bit like marketing. > > here: running nexenta ha setup since several years with one catastrophic > failure due to split brain > > > > > You do risk losing data if you batch zfs send. It is very hard to run > > that real time. > > depends on how much data changes aka delta size > > > You have to take the snap then send the snap. Most > > people run in cron, even if it's not in cron, you would want one to > > finish before you started the next. > > thats the reason why lock files where invented, tools like zrep handle > that themself via additional zfs properties > > or, if one does not trust a single layer > > -- snip -- > #!/bin/sh > if [ ! -f /var/run/replic ] ; then > touch /var/run/replic > /blah/path/zrep sync all >> /var/log/zfsrepli.log > rm -f /var/run/replic > fi > -- snip -- > > something like this, simple > > If you lose the sending host before > > the receive is complete you won't have a full copy. > > if rsf fails, and you end up in split brain you loose way more. been > there, seen that. > > With zfs though you > > will probably still have the data on the sending host, however long it > > takes to bring it back up. RSF-1 runs in the zfs stack and send the > > writes to the second system. It's kind of pricey, but actually much les= s > > expensive than commercial alternatives. 
> > > > Anytime you run anything sync it adds latency but makes things safer.. > > not surprising, it all depends on the usecase > > > There is also a cool tool I like, called zerto for vmware that sits in > > the hypervisor and sends a sync copy of a write locally and then an > > async remotely. It's pretty cool. Although I haven't run it myself, hav= e > > a bunch of customers running it. I believe it works with proxmox too. > > > > Most people I run into (these days) don't mind losing 5 or even 30 > > minutes of data. Small shops. > > you talk about minutes, what delta size are we talking here about? why > not using zrep in a loop for example > > They usually have a copy somewhere else. > > Or the cost of 5-30 minutes isn't that great. I used work as a > > datacenter architect for sun/oracle with only fortune 500. There losing > > 1 sec could put large companies out of business. I worked with banks an= d > > exchanges. > > again, usecase. i bet 99% on this list are not operating fortune 500 > bank filers > > They couldn't ever lose a single transaction. Most people > > nowadays do the replication/availability in the application though and > > don't care about underlying hardware, especially disk. > > > > > > On 8/17/16 11:55 AM, Chris Watson wrote: > >> Of course, if you are willing to accept some amount of data loss that > >> opens up a lot more options. :) > >> > >> Some may find that acceptable though. Like turning off fsync with > >> PostgreSQL to get much higher throughput. As little no as you are made > >> *very* aware of the risks. > >> > >> It's good to have input in this thread from one with more experience > >> with RSF-1 than the rest of us. You confirm what others have that said > >> about RSF-1, that it's stable and works well. What were you deploying > >> it on? > >> > >> Chris > >> > >> Sent from my iPhone 5 > >> > >> On Aug 17, 2016, at 11:18 AM, Linda Kateley >> > wrote: > >> > >>> The question I always ask, as an architect, is "can you lose 1 minute > >>> worth of data?" If you can, then batched replication is perfect. If > >>> you can't.. then HA. Every place I have positioned it, rsf-1 has > >>> worked extremely well. If i remember right, it works at the dmu. I > >>> would suggest try it. They have been trying to have a full freebsd > >>> solution, I have several customers running it well. > >>> > >>> linda > >>> > >>> > >>> On 8/17/16 4:52 AM, Julien Cigar wrote: > >>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen > >>>> Gotteswinter wrote: > >>>>> > >>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: > >>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen > >>>>>> Gotteswinter wrote: > >>>>>>> > >>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: > >>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>>> > wrote: > >>>>>>>>> > >>>>>>>>> As I said in a previous post I tested the zfs send/receive > >>>>>>>>> approach (with > >>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in > >>>>>>>>> all what you > >>>>>>>>> said, especially about off-site replicate and synchronous > >>>>>>>>> replication. > >>>>>>>>> > >>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the > >>>>>>>>> moment, > >>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but > >>>>>>>>> ATM it > >>>>>>>>> works as expected, I havent' managed to corrupt the zpool. 
> >>>>>>>> I must be too old school, but I don=E2=80=99t quite like the ide= a of > >>>>>>>> using an essentially unreliable transport > >>>>>>>> (Ethernet) for low-level filesystem operations. > >>>>>>>> > >>>>>>>> In case something went wrong, that approach could risk > >>>>>>>> corrupting a pool. Although, frankly, > >>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA > >>>>>>>> problem that caused some > >>>>>>>> silent corruption. > >>>>>>> try dual split import :D i mean, zpool -f import on 2 machines > >>>>>>> hooked up > >>>>>>> to the same disk chassis. > >>>>>> Yes this is the first thing on the list to avoid .. :) > >>>>>> > >>>>>> I'm still busy to test the whole setup here, including the > >>>>>> MASTER -> BACKUP failover script (CARP), but I think you can preve= nt > >>>>>> that thanks to: > >>>>>> > >>>>>> - As long as ctld is running on the BACKUP the disks are locked > >>>>>> and you can't import the pool (even with -f) for ex (filer2 is the > >>>>>> BACKUP): > >>>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > >>>>>> > >>>>>> - The shared pool should not be mounted at boot, and you should > >>>>>> ensure > >>>>>> that the failover script is not executed during boot time too: > >>>>>> this is > >>>>>> to handle the case wherein both machines turn off and/or re-ignite > at > >>>>>> the same time. Indeed, the CARP interface can "flip" it's status > >>>>>> if both > >>>>>> machines are powered on at the same time, for ex: > >>>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf > and > >>>>>> you will have a split-brain scenario > >>>>>> > >>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons > >>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not > >>>>>> happen, this can be handled with a trigger file or something like > >>>>>> that > >>>>>> > >>>>>> - I've still have to check if the order is OK, but I think that as > >>>>>> long > >>>>>> as you shutdown the replication interface and that you adapt the > >>>>>> advskew (including the config file) of the CARP interface before t= he > >>>>>> zpool import -f in the failover script you can be relatively > >>>>>> confident > >>>>>> that nothing will be written on the iSCSI targets > >>>>>> > >>>>>> - A zpool scrub should be run at regular intervals > >>>>>> > >>>>>> This is my MASTER -> BACKUP CARP script ATM > >>>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > >>>>>> > >>>>>> Julien > >>>>>> > >>>>> 100=E2=82=AC question without detailed looking at that script. yes = from a > >>>>> first > >>>>> view its super simple, but: why are solutions like rsf-1 such more > >>>>> powerful / featurerich. Theres a reason for, which is that they try > to > >>>>> cover every possible situation (which makes more than sense for > this). > >>>> I've never used "rsf-1" so I can't say much more about it, but I hav= e > >>>> no doubts about it's ability to handle "complex situations", where > >>>> multiple nodes / networks are involved. > >>>> > >>>>> That script works for sure, within very limited cases imho > >>>>> > >>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen > >>>>>>> sooner > >>>>>>> or later especially when it comes to homegrown automatism > solutions. 
> >>>>>>> even the commercial parts where much more time/work goes into suc= h > >>>>>>> solutions fail in a regular manner > >>>>>>> > >>>>>>>> The advantage of ZFS send/receive of datasets is, however, that > >>>>>>>> you can consider it > >>>>>>>> essentially atomic. A transport corruption should not cause > >>>>>>>> trouble (apart from a failed > >>>>>>>> "zfs receive") and with snapshot retention you can even roll > >>>>>>>> back. You can=E2=80=99t roll back > >>>>>>>> zpool replications :) > >>>>>>>> > >>>>>>>> ZFS receive does a lot of sanity checks as well. As long as your > >>>>>>>> zfs receive doesn=E2=80=99t involve a rollback > >>>>>>>> to the latest snapshot, it won=E2=80=99t destroy anything by mis= take. > >>>>>>>> Just make sure that your replica datasets > >>>>>>>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain. > >>>>>>>> > >>>>>>>> > >>>>>>>> Cheers, > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Borja. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> freebsd-fs@freebsd.org mailing > list > >>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>>>>>>> To unsubscribe, send any mail to > >>>>>>>> "freebsd-fs-unsubscribe@freebsd.org > >>>>>>>> " > >>>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> freebsd-fs@freebsd.org mailing > list > >>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>>>>>> To unsubscribe, send any mail to > >>>>>>> "freebsd-fs-unsubscribe@freebsd.org > >>>>>>> " > >>> > >>> _______________________________________________ > >>> freebsd-fs@freebsd.org mailing list > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org > >>> " > > > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Thu Aug 18 07:38:28 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 86807BBE2E9 for ; Thu, 18 Aug 2016 07:38:28 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 085D11CE8 for ; Thu, 18 Aug 2016 07:38:27 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 70F6849FC2B9; Thu, 18 Aug 2016 09:38:25 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id gp8C3KOyuofX; Thu, 18 Aug 2016 09:38:18 +0200 (CEST) Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id DBA524C4C688; Thu, 18 
Aug 2016 09:38:18 +0200 (CEST) Reply-To: juergen.gotteswinter@internetx.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> To: krad Cc: linda@kateley.com, Chris Watson , FreeBSD FS From: InterNetX - Juergen Gotteswinter Organization: InterNetX GmbH Message-ID: <409301a7-ce03-aaa3-c4dc-fa9f9ba66e01@internetx.com> Date: Thu, 18 Aug 2016 09:38:16 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 07:38:28 -0000 uhm, dont really investigated if it is or not. add a "sync" after that? or replace it? but anyway, thanks for the hint. will dig into this! Am 18.08.2016 um 09:36 schrieb krad: > I didnt think touch was atomic, mkdir is though > > On 18 August 2016 at 08:32, InterNetX - Juergen Gotteswinter > > wrote: > > > > Am 17.08.2016 um 20:03 schrieb Linda Kateley: > > I just do consulting so I don't always get to see the end of the > > project. Although we are starting to do more ongoing support so we can > > see the progress.. > > > > I have worked with some of the guys from high-availability.com for maybe > > 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work > > beautifully with omni/illumos. The one customer I have running it in > > prod is an isp in south america running openstack and zfs on freebsd as > > iscsi. Big boxes, 90+ drives per frame. If someone would like try it, i > > have some contacts there. Ping me offlist. > > no offense, but it sounds a bit like marketing. > > here: running nexenta ha setup since several years with one catastrophic > failure due to split brain > > > > > You do risk losing data if you batch zfs send. It is very hard to run > > that real time. > > depends on how much data changes aka delta size > > > You have to take the snap then send the snap. Most > > people run in cron, even if it's not in cron, you would want one to > > finish before you started the next. > > thats the reason why lock files where invented, tools like zrep handle > that themself via additional zfs properties > > or, if one does not trust a single layer > > -- snip -- > #!/bin/sh > if [ ! -f /var/run/replic ] ; then > touch /var/run/replic > /blah/path/zrep sync all >> /var/log/zfsrepli.log > rm -f /var/run/replic > fi > -- snip -- > > something like this, simple > > If you lose the sending host before > > the receive is complete you won't have a full copy. > > if rsf fails, and you end up in split brain you loose way more. been > there, seen that. > > With zfs though you > > will probably still have the data on the sending host, however long it > > takes to bring it back up. 
RSF-1 runs in the zfs stack and send the > > writes to the second system. It's kind of pricey, but actually much less > > expensive than commercial alternatives. > > > > Anytime you run anything sync it adds latency but makes things safer.. > > not surprising, it all depends on the usecase > > > There is also a cool tool I like, called zerto for vmware that sits in > > the hypervisor and sends a sync copy of a write locally and then an > > async remotely. It's pretty cool. Although I haven't run it myself, have > > a bunch of customers running it. I believe it works with proxmox too. > > > > Most people I run into (these days) don't mind losing 5 or even 30 > > minutes of data. Small shops. > > you talk about minutes, what delta size are we talking here about? why > not using zrep in a loop for example > > They usually have a copy somewhere else. > > Or the cost of 5-30 minutes isn't that great. I used work as a > > datacenter architect for sun/oracle with only fortune 500. There losing > > 1 sec could put large companies out of business. I worked with banks and > > exchanges. > > again, usecase. i bet 99% on this list are not operating fortune 500 > bank filers > > They couldn't ever lose a single transaction. Most people > > nowadays do the replication/availability in the application though and > > don't care about underlying hardware, especially disk. > > > > > > On 8/17/16 11:55 AM, Chris Watson wrote: > >> Of course, if you are willing to accept some amount of data loss that > >> opens up a lot more options. :) > >> > >> Some may find that acceptable though. Like turning off fsync with > >> PostgreSQL to get much higher throughput. As little no as you are > made > >> *very* aware of the risks. > >> > >> It's good to have input in this thread from one with more experience > >> with RSF-1 than the rest of us. You confirm what others have that > said > >> about RSF-1, that it's stable and works well. What were you deploying > >> it on? > >> > >> Chris > >> > >> Sent from my iPhone 5 > >> > >> On Aug 17, 2016, at 11:18 AM, Linda Kateley > >> >> wrote: > >> > >>> The question I always ask, as an architect, is "can you lose 1 > minute > >>> worth of data?" If you can, then batched replication is perfect. If > >>> you can't.. then HA. Every place I have positioned it, rsf-1 has > >>> worked extremely well. If i remember right, it works at the dmu. I > >>> would suggest try it. They have been trying to have a full freebsd > >>> solution, I have several customers running it well. > >>> > >>> linda > >>> > >>> > >>> On 8/17/16 4:52 AM, Julien Cigar wrote: > >>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen > >>>> Gotteswinter wrote: > >>>>> > >>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: > >>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen > >>>>>> Gotteswinter wrote: > >>>>>>> > >>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: > >>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>>> >> wrote: > >>>>>>>>> > >>>>>>>>> As I said in a previous post I tested the zfs send/receive > >>>>>>>>> approach (with > >>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in > >>>>>>>>> all what you > >>>>>>>>> said, especially about off-site replicate and synchronous > >>>>>>>>> replication. 
> >>>>>>>>> > >>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the > >>>>>>>>> moment, > >>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but > >>>>>>>>> ATM it > >>>>>>>>> works as expected, I havent' managed to corrupt the zpool. > >>>>>>>> I must be too old school, but I don’t quite like the idea of > >>>>>>>> using an essentially unreliable transport > >>>>>>>> (Ethernet) for low-level filesystem operations. > >>>>>>>> > >>>>>>>> In case something went wrong, that approach could risk > >>>>>>>> corrupting a pool. Although, frankly, > >>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA > >>>>>>>> problem that caused some > >>>>>>>> silent corruption. > >>>>>>> try dual split import :D i mean, zpool -f import on 2 machines > >>>>>>> hooked up > >>>>>>> to the same disk chassis. > >>>>>> Yes this is the first thing on the list to avoid .. :) > >>>>>> > >>>>>> I'm still busy to test the whole setup here, including the > >>>>>> MASTER -> BACKUP failover script (CARP), but I think you can > prevent > >>>>>> that thanks to: > >>>>>> > >>>>>> - As long as ctld is running on the BACKUP the disks are locked > >>>>>> and you can't import the pool (even with -f) for ex (filer2 > is the > >>>>>> BACKUP): > >>>>>> > https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > > >>>>>> > >>>>>> - The shared pool should not be mounted at boot, and you should > >>>>>> ensure > >>>>>> that the failover script is not executed during boot time too: > >>>>>> this is > >>>>>> to handle the case wherein both machines turn off and/or > re-ignite at > >>>>>> the same time. Indeed, the CARP interface can "flip" it's status > >>>>>> if both > >>>>>> machines are powered on at the same time, for ex: > >>>>>> > https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf > and > >>>>>> you will have a split-brain scenario > >>>>>> > >>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons > >>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not > >>>>>> happen, this can be handled with a trigger file or something like > >>>>>> that > >>>>>> > >>>>>> - I've still have to check if the order is OK, but I think > that as > >>>>>> long > >>>>>> as you shutdown the replication interface and that you adapt the > >>>>>> advskew (including the config file) of the CARP interface > before the > >>>>>> zpool import -f in the failover script you can be relatively > >>>>>> confident > >>>>>> that nothing will be written on the iSCSI targets > >>>>>> > >>>>>> - A zpool scrub should be run at regular intervals > >>>>>> > >>>>>> This is my MASTER -> BACKUP CARP script ATM > >>>>>> > https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > > >>>>>> > >>>>>> Julien > >>>>>> > >>>>> 100€ question without detailed looking at that script. yes from a > >>>>> first > >>>>> view its super simple, but: why are solutions like rsf-1 such more > >>>>> powerful / featurerich. Theres a reason for, which is that > they try to > >>>>> cover every possible situation (which makes more than sense > for this). > >>>> I've never used "rsf-1" so I can't say much more about it, but > I have > >>>> no doubts about it's ability to handle "complex situations", where > >>>> multiple nodes / networks are involved. > >>>> > >>>>> That script works for sure, within very limited cases imho > >>>>> > >>>>>>> kaboom, really ugly kaboom. 
thats what is very likely to happen > >>>>>>> sooner > >>>>>>> or later especially when it comes to homegrown automatism > solutions. > >>>>>>> even the commercial parts where much more time/work goes > into such > >>>>>>> solutions fail in a regular manner > >>>>>>> > >>>>>>>> The advantage of ZFS send/receive of datasets is, however, that > >>>>>>>> you can consider it > >>>>>>>> essentially atomic. A transport corruption should not cause > >>>>>>>> trouble (apart from a failed > >>>>>>>> "zfs receive") and with snapshot retention you can even roll > >>>>>>>> back. You can’t roll back > >>>>>>>> zpool replications :) > >>>>>>>> > >>>>>>>> ZFS receive does a lot of sanity checks as well. As long as > your > >>>>>>>> zfs receive doesn’t involve a rollback > >>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. > >>>>>>>> Just make sure that your replica datasets > >>>>>>>> aren’t mounted and zfs receive won’t complain. > >>>>>>>> > >>>>>>>> > >>>>>>>> Cheers, > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Borja. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> freebsd-fs@freebsd.org > > > mailing list > >>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > >>>>>>>> To unsubscribe, send any mail to > >>>>>>>> "freebsd-fs-unsubscribe@freebsd.org > > >>>>>>>> >" > >>>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> freebsd-fs@freebsd.org > > > mailing list > >>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > >>>>>>> To unsubscribe, send any mail to > >>>>>>> "freebsd-fs-unsubscribe@freebsd.org > > >>>>>>> >" > >>> > >>> _______________________________________________ > >>> freebsd-fs@freebsd.org > > > mailing list > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > >>> To unsubscribe, send any mail to > "freebsd-fs-unsubscribe@freebsd.org > > >>> >" > > > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > To unsubscribe, send any mail to > "freebsd-fs-unsubscribe@freebsd.org > " > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org > " > > From owner-freebsd-fs@freebsd.org Thu Aug 18 07:40:55 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3A361BBE374 for ; Thu, 18 Aug 2016 07:40:55 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: from mail-wm0-x234.google.com (mail-wm0-x234.google.com [IPv6:2a00:1450:400c:c09::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B61BC1DF8 for ; Thu, 18 Aug 2016 07:40:54 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: by mail-wm0-x234.google.com with SMTP id f65so228085641wmi.0 for ; Thu, 18 Aug 2016 00:40:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=LZ9hgXyuFbwJYTi8xbl0jQXPmRYJBO/VHIlzpHKBLTo=; b=I2uh/TPpePQ+UKWE9AHHeLMgwhk+porBbOPM0IuUc4WdV71l9Sbqr1giaEoa93GY+C 
Wt8GU9/5ldAPMf3cxXV4PysATfAjcRStENhjFrDYyTRmcUoHcmk+bZSW/Jv4WN7w4XQx PpQUZTMdIpK5IjufUQsmix8cBvuXtyDLOXayGQR8gGSIKW0p4vAZzfSfvpJNXx6afD9y r1NIECKuXQXW1Zyu0SIyIOJUpbkoF1MHZjZ7ReJvhzX3Inkh7D+g0W0WR6iSHXSzT4Ut FJfw4u1xhPzF7xVXzS9nF7CoCfe5VfjMonrS6HtWsEgtC3B2WmMuiMWBR+4Oe56B8ilg gXgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=LZ9hgXyuFbwJYTi8xbl0jQXPmRYJBO/VHIlzpHKBLTo=; b=I8fBDbKxurWDOSwyfMttenEGh1R4djiu7o+db1zOu4S3chwPqmnYNlEqXOcMuEOojZ jq+luo6uIOloby34PnDLK0blyvHaDWAKCk3kGUzVGNyQJWfRe7L4a566OkWbqSx1QAFu xJGy7EMGcL3HNZXwblFDePlCiWVMuAp2j3NGG15EH5Q4pUx3K0tbJA29CPb7XskIK3EQ ZeOWGTZGHFLN+P4EUM+25ChpgdgY0ObPLg8unssWb/rZz57KfAot8TtSVit8KPs3oLat P1UG+Sx8ncqBbKfmKXYCMPAktPzcEJINYLYKtbf5/uMaihne3prwG9y4KT7oFhfQ7EIi O4kA== X-Gm-Message-State: AEkooutERVixPeZVl47pS/1GBNfmZiYZVfRnINe9iZZ8ADMXWrbJ7YzJR/ASuP8WjVqgxA== X-Received: by 10.194.77.97 with SMTP id r1mr817033wjw.83.1471506052538; Thu, 18 Aug 2016 00:40:52 -0700 (PDT) Received: from macbook-air-de-benjamin-1.home (LFbn-1-7077-85.w90-116.abo.wanadoo.fr. [90.116.246.85]) by smtp.gmail.com with ESMTPSA id i80sm1276096wmf.11.2016.08.18.00.40.51 for (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Thu, 18 Aug 2016 00:40:51 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: HAST + ZFS + NFS + CARP From: Ben RUBSON In-Reply-To: <409301a7-ce03-aaa3-c4dc-fa9f9ba66e01@internetx.com> Date: Thu, 18 Aug 2016 09:40:50 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> <409301a7-ce03-aaa3-c4dc-fa9f9ba66e01@internetx.com> To: FreeBSD FS X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 07:40:55 -0000 Yep this is better : if mkdir then do_your_job rm -rf fi > On 18 Aug 2016, at 09:38, InterNetX - Juergen Gotteswinter = wrote: >=20 > uhm, dont really investigated if it is or not. add a "sync" after = that? > or replace it? >=20 > but anyway, thanks for the hint. will dig into this! >=20 > Am 18.08.2016 um 09:36 schrieb krad: >> I didnt think touch was atomic, mkdir is though >>=20 >> On 18 August 2016 at 08:32, InterNetX - Juergen Gotteswinter >> > > wrote: >>=20 >>=20 >>=20 >> Am 17.08.2016 um 20:03 schrieb Linda Kateley: >>> I just do consulting so I don't always get to see the end of the >>> project. Although we are starting to do more ongoing support so we = can >>> see the progress.. >>>=20 >>> I have worked with some of the guys from high-availability.com = for maybe >>> 20 years. RSF-1 is the cluster that is bundled with nexenta. Does = work >>> beautifully with omni/illumos. 
The one customer I have running it in >>> prod is an isp in south america running openstack and zfs on freebsd = as >>> iscsi. Big boxes, 90+ drives per frame. If someone would like try = it, i >>> have some contacts there. Ping me offlist. >>=20 >> no offense, but it sounds a bit like marketing. >>=20 >> here: running nexenta ha setup since several years with one = catastrophic >> failure due to split brain >>=20 >>>=20 >>> You do risk losing data if you batch zfs send. It is very hard to = run >>> that real time. >>=20 >> depends on how much data changes aka delta size >>=20 >>=20 >> You have to take the snap then send the snap. Most >>> people run in cron, even if it's not in cron, you would want one to >>> finish before you started the next. >>=20 >> thats the reason why lock files where invented, tools like zrep = handle >> that themself via additional zfs properties >>=20 >> or, if one does not trust a single layer >>=20 >> -- snip -- >> #!/bin/sh >> if [ ! -f /var/run/replic ] ; then >> touch /var/run/replic >> /blah/path/zrep sync all >> /var/log/zfsrepli.log >> rm -f /var/run/replic >> fi >> -- snip -- >>=20 >> something like this, simple >>=20 >> If you lose the sending host before >>> the receive is complete you won't have a full copy. >>=20 >> if rsf fails, and you end up in split brain you loose way more. = been >> there, seen that. >>=20 >> With zfs though you >>> will probably still have the data on the sending host, however long = it >>> takes to bring it back up. RSF-1 runs in the zfs stack and send the >>> writes to the second system. It's kind of pricey, but actually much = less >>> expensive than commercial alternatives. >>>=20 >>> Anytime you run anything sync it adds latency but makes things = safer.. >>=20 >> not surprising, it all depends on the usecase >>=20 >>> There is also a cool tool I like, called zerto for vmware that sits = in >>> the hypervisor and sends a sync copy of a write locally and then an >>> async remotely. It's pretty cool. Although I haven't run it myself, = have >>> a bunch of customers running it. I believe it works with proxmox = too. >>>=20 >>> Most people I run into (these days) don't mind losing 5 or even 30 >>> minutes of data. Small shops. >>=20 >> you talk about minutes, what delta size are we talking here about? = why >> not using zrep in a loop for example >>=20 >> They usually have a copy somewhere else. >>> Or the cost of 5-30 minutes isn't that great. I used work as a >>> datacenter architect for sun/oracle with only fortune 500. There = losing >>> 1 sec could put large companies out of business. I worked with banks = and >>> exchanges. >>=20 >> again, usecase. i bet 99% on this list are not operating fortune = 500 >> bank filers >>=20 >> They couldn't ever lose a single transaction. Most people >>> nowadays do the replication/availability in the application though = and >>> don't care about underlying hardware, especially disk. >>>=20 >>>=20 >>> On 8/17/16 11:55 AM, Chris Watson wrote: >>>> Of course, if you are willing to accept some amount of data loss = that >>>> opens up a lot more options. :) >>>>=20 >>>> Some may find that acceptable though. Like turning off fsync with >>>> PostgreSQL to get much higher throughput. As little no as you are >> made >>>> *very* aware of the risks. >>>>=20 >>>> It's good to have input in this thread from one with more = experience >>>> with RSF-1 than the rest of us. You confirm what others have that >> said >>>> about RSF-1, that it's stable and works well. What were you = deploying >>>> it on? 
>>>>=20 >>>> Chris >>>>=20 >>>> Sent from my iPhone 5 >>>>=20 >>>> On Aug 17, 2016, at 11:18 AM, Linda Kateley > >>>> >> wrote: >>>>=20 >>>>> The question I always ask, as an architect, is "can you lose 1 >> minute >>>>> worth of data?" If you can, then batched replication is perfect. = If >>>>> you can't.. then HA. Every place I have positioned it, rsf-1 has >>>>> worked extremely well. If i remember right, it works at the dmu. I >>>>> would suggest try it. They have been trying to have a full freebsd >>>>> solution, I have several customers running it well. >>>>>=20 >>>>> linda >>>>>=20 >>>>>=20 >>>>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen >>>>>> Gotteswinter wrote: >>>>>>>=20 >>>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen >>>>>>>> Gotteswinter wrote: >>>>>>>>>=20 >>>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar = >>>>>>>>>> > >> wrote: >>>>>>>>>>>=20 >>>>>>>>>>> As I said in a previous post I tested the zfs send/receive >>>>>>>>>>> approach (with >>>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in >>>>>>>>>>> all what you >>>>>>>>>>> said, especially about off-site replicate and synchronous >>>>>>>>>>> replication. >>>>>>>>>>>=20 >>>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at = the >>>>>>>>>>> moment, >>>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, = but >>>>>>>>>>> ATM it >>>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>>>> I must be too old school, but I don=E2=80=99t quite like the = idea of >>>>>>>>>> using an essentially unreliable transport >>>>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>>>>=20 >>>>>>>>>> In case something went wrong, that approach could risk >>>>>>>>>> corrupting a pool. Although, frankly, >>>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS = HBA >>>>>>>>>> problem that caused some >>>>>>>>>> silent corruption. >>>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines >>>>>>>>> hooked up >>>>>>>>> to the same disk chassis. >>>>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>>>>=20 >>>>>>>> I'm still busy to test the whole setup here, including the >>>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can >> prevent >>>>>>>> that thanks to: >>>>>>>>=20 >>>>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>>>> and you can't import the pool (even with -f) for ex (filer2 >> is the >>>>>>>> BACKUP): >>>>>>>>=20 >> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >> = >>>>>>>>=20 >>>>>>>> - The shared pool should not be mounted at boot, and you should >>>>>>>> ensure >>>>>>>> that the failover script is not executed during boot time too: >>>>>>>> this is >>>>>>>> to handle the case wherein both machines turn off and/or >> re-ignite at >>>>>>>> the same time. 
Indeed, the CARP interface can "flip" it's = status >>>>>>>> if both >>>>>>>> machines are powered on at the same time, for ex: >>>>>>>>=20 >> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf >> = and >>>>>>>> you will have a split-brain scenario >>>>>>>>=20 >>>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should = not >>>>>>>> happen, this can be handled with a trigger file or something = like >>>>>>>> that >>>>>>>>=20 >>>>>>>> - I've still have to check if the order is OK, but I think >> that as >>>>>>>> long >>>>>>>> as you shutdown the replication interface and that you adapt = the >>>>>>>> advskew (including the config file) of the CARP interface >> before the >>>>>>>> zpool import -f in the failover script you can be relatively >>>>>>>> confident >>>>>>>> that nothing will be written on the iSCSI targets >>>>>>>>=20 >>>>>>>> - A zpool scrub should be run at regular intervals >>>>>>>>=20 >>>>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>>>>=20 >> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >> = >>>>>>>>=20 >>>>>>>> Julien >>>>>>>>=20 >>>>>>> 100=E2=82=AC question without detailed looking at that script. = yes from a >>>>>>> first >>>>>>> view its super simple, but: why are solutions like rsf-1 such = more >>>>>>> powerful / featurerich. Theres a reason for, which is that >> they try to >>>>>>> cover every possible situation (which makes more than sense >> for this). >>>>>> I've never used "rsf-1" so I can't say much more about it, but >> I have >>>>>> no doubts about it's ability to handle "complex situations", = where >>>>>> multiple nodes / networks are involved. >>>>>>=20 >>>>>>> That script works for sure, within very limited cases imho >>>>>>>=20 >>>>>>>>> kaboom, really ugly kaboom. thats what is very likely to = happen >>>>>>>>> sooner >>>>>>>>> or later especially when it comes to homegrown automatism >> solutions. >>>>>>>>> even the commercial parts where much more time/work goes >> into such >>>>>>>>> solutions fail in a regular manner >>>>>>>>>=20 >>>>>>>>>> The advantage of ZFS send/receive of datasets is, however, = that >>>>>>>>>> you can consider it >>>>>>>>>> essentially atomic. A transport corruption should not cause >>>>>>>>>> trouble (apart from a failed >>>>>>>>>> "zfs receive") and with snapshot retention you can even roll >>>>>>>>>> back. You can=E2=80=99t roll back >>>>>>>>>> zpool replications :) >>>>>>>>>>=20 >>>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as >> your >>>>>>>>>> zfs receive doesn=E2=80=99t involve a rollback >>>>>>>>>> to the latest snapshot, it won=E2=80=99t destroy anything by = mistake. >>>>>>>>>> Just make sure that your replica datasets >>>>>>>>>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t = complain. >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>> Cheers, >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>> Borja. 
>>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>> _______________________________________________ >>>>>>>>>> freebsd-fs@freebsd.org >> > >> mailing list >>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> >>>>>>>>>> To unsubscribe, send any mail to >>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >> >>>>>>>>>> > >" >>>>>>>>>>=20 >>>>>>>>> _______________________________________________ >>>>>>>>> freebsd-fs@freebsd.org >> > >> mailing list >>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> >>>>>>>>> To unsubscribe, send any mail to >>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >> >>>>>>>>> > >" >>>>>=20 >>>>> _______________________________________________ >>>>> freebsd-fs@freebsd.org >> > >> mailing list >>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> >>>>> To unsubscribe, send any mail to >> "freebsd-fs-unsubscribe@freebsd.org >> >>>>> > >" >>>=20 >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> >>> To unsubscribe, send any mail to >> "freebsd-fs-unsubscribe@freebsd.org >> " >> _______________________________________________ >> freebsd-fs@freebsd.org mailing = list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> >> To unsubscribe, send any mail to = "freebsd-fs-unsubscribe@freebsd.org >> " >>=20 >>=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Thu Aug 18 08:02:55 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7529DBBEE45 for ; Thu, 18 Aug 2016 08:02:55 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) by mx1.freebsd.org (Postfix) with ESMTP id 8328D11BB for ; Thu, 18 Aug 2016 08:02:54 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 5815E45FC0FB; Thu, 18 Aug 2016 10:02:52 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id MT7WSVCplP01; Thu, 18 Aug 2016 10:02:47 +0200 (CEST) Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id 8736B4C4C698; Thu, 18 Aug 2016 10:02:47 +0200 (CEST) Reply-To: juergen.gotteswinter@internetx.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> 
<409301a7-ce03-aaa3-c4dc-fa9f9ba66e01@internetx.com> To: Ben RUBSON , FreeBSD FS From: InterNetX - Juergen Gotteswinter Organization: InterNetX GmbH Message-ID: <69234c7d-cda9-2d56-b5e0-bb5e3961cc19@internetx.com> Date: Thu, 18 Aug 2016 10:02:45 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 08:02:55 -0000 new day, new things learned :) thanks! but like said, zrep does its on locking in zfs properties. so even this is fine while true; do zrep sync all; done see http://www.bolthole.com/solaris/zrep/ the properties look like this tank/vmail redundant_metadata all default tank/vmail zrep:savecount 5 local tank/vmail zrep:lock-time 20160620101703 local tank/vmail zrep:master yes local tank/vmail zrep:src-fs tank/vmail local tank/vmail zrep:dest-host stor1 local tank/vmail zrep:src-host stor2 local tank/vmail zrep:dest-fs tank/vmail local tank/vmail zrep:lock-pid 10887 local it also takes care of the replication partner, the replicated datasets are read only until you tell zrep "go go go, become master" Simple usage summary: zrep (init|-i) ZFS/fs remotehost remoteZFSpool/fs zrep (sync|-S) [-q seconds] ZFS/fs zrep (sync|-S) [-q seconds] all zrep (sync|-S) ZFS/fs@snapshot -- temporary retroactive sync zrep (status|-s) [-v] [(-a|ZFS/fs)] zrep refresh ZFS/fs -- pull version of sync zrep (list|-l) [-Lv] zrep (expire|-e) [-L] (ZFS/fs ...)|(all)|() zrep (changeconfig|-C) [-f] ZFS/fs remotehost remoteZFSpool/fs zrep (changeconfig|-C) [-f] [-d] ZFS/fs srchost srcZFSpool/fs zrep failover [-L] ZFS/fs zrep takeover [-L] ZFS/fs zrep failover pool/ds -> master sets pool read only, connects to slave, sets pool on slave rw should be easy to combine with carp/devd, but this is the land of vodoo automagic again which i dont trust that much. Am 18.08.2016 um 09:40 schrieb Ben RUBSON: > Yep this is better : > > if mkdir > then > do_your_job > rm -rf > fi > > > >> On 18 Aug 2016, at 09:38, InterNetX - Juergen Gotteswinter wrote: >> >> uhm, dont really investigated if it is or not. add a "sync" after that? >> or replace it? >> >> but anyway, thanks for the hint. will dig into this! >> >> Am 18.08.2016 um 09:36 schrieb krad: >>> I didnt think touch was atomic, mkdir is though >>> >>> On 18 August 2016 at 08:32, InterNetX - Juergen Gotteswinter >>> >> > wrote: >>> >>> >>> >>> Am 17.08.2016 um 20:03 schrieb Linda Kateley: >>>> I just do consulting so I don't always get to see the end of the >>>> project. Although we are starting to do more ongoing support so we can >>>> see the progress.. >>>> >>>> I have worked with some of the guys from high-availability.com for maybe >>>> 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work >>>> beautifully with omni/illumos. The one customer I have running it in >>>> prod is an isp in south america running openstack and zfs on freebsd as >>>> iscsi. Big boxes, 90+ drives per frame. If someone would like try it, i >>>> have some contacts there. Ping me offlist. >>> >>> no offense, but it sounds a bit like marketing. >>> >>> here: running nexenta ha setup since several years with one catastrophic >>> failure due to split brain >>> >>>> >>>> You do risk losing data if you batch zfs send. 
It is very hard to run >>>> that real time. >>> >>> depends on how much data changes aka delta size >>> >>> >>> You have to take the snap then send the snap. Most >>>> people run in cron, even if it's not in cron, you would want one to >>>> finish before you started the next. >>> >>> thats the reason why lock files where invented, tools like zrep handle >>> that themself via additional zfs properties >>> >>> or, if one does not trust a single layer >>> >>> -- snip -- >>> #!/bin/sh >>> if [ ! -f /var/run/replic ] ; then >>> touch /var/run/replic >>> /blah/path/zrep sync all >> /var/log/zfsrepli.log >>> rm -f /var/run/replic >>> fi >>> -- snip -- >>> >>> something like this, simple >>> >>> If you lose the sending host before >>>> the receive is complete you won't have a full copy. >>> >>> if rsf fails, and you end up in split brain you loose way more. been >>> there, seen that. >>> >>> With zfs though you >>>> will probably still have the data on the sending host, however long it >>>> takes to bring it back up. RSF-1 runs in the zfs stack and send the >>>> writes to the second system. It's kind of pricey, but actually much less >>>> expensive than commercial alternatives. >>>> >>>> Anytime you run anything sync it adds latency but makes things safer.. >>> >>> not surprising, it all depends on the usecase >>> >>>> There is also a cool tool I like, called zerto for vmware that sits in >>>> the hypervisor and sends a sync copy of a write locally and then an >>>> async remotely. It's pretty cool. Although I haven't run it myself, have >>>> a bunch of customers running it. I believe it works with proxmox too. >>>> >>>> Most people I run into (these days) don't mind losing 5 or even 30 >>>> minutes of data. Small shops. >>> >>> you talk about minutes, what delta size are we talking here about? why >>> not using zrep in a loop for example >>> >>> They usually have a copy somewhere else. >>>> Or the cost of 5-30 minutes isn't that great. I used work as a >>>> datacenter architect for sun/oracle with only fortune 500. There losing >>>> 1 sec could put large companies out of business. I worked with banks and >>>> exchanges. >>> >>> again, usecase. i bet 99% on this list are not operating fortune 500 >>> bank filers >>> >>> They couldn't ever lose a single transaction. Most people >>>> nowadays do the replication/availability in the application though and >>>> don't care about underlying hardware, especially disk. >>>> >>>> >>>> On 8/17/16 11:55 AM, Chris Watson wrote: >>>>> Of course, if you are willing to accept some amount of data loss that >>>>> opens up a lot more options. :) >>>>> >>>>> Some may find that acceptable though. Like turning off fsync with >>>>> PostgreSQL to get much higher throughput. As little no as you are >>> made >>>>> *very* aware of the risks. >>>>> >>>>> It's good to have input in this thread from one with more experience >>>>> with RSF-1 than the rest of us. You confirm what others have that >>> said >>>>> about RSF-1, that it's stable and works well. What were you deploying >>>>> it on? >>>>> >>>>> Chris >>>>> >>>>> Sent from my iPhone 5 >>>>> >>>>> On Aug 17, 2016, at 11:18 AM, Linda Kateley >> >>>>> >> wrote: >>>>> >>>>>> The question I always ask, as an architect, is "can you lose 1 >>> minute >>>>>> worth of data?" If you can, then batched replication is perfect. If >>>>>> you can't.. then HA. Every place I have positioned it, rsf-1 has >>>>>> worked extremely well. If i remember right, it works at the dmu. I >>>>>> would suggest try it. 
They have been trying to have a full freebsd >>>>>> solution, I have several customers running it well. >>>>>> >>>>>> linda >>>>>> >>>>>> >>>>>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen >>>>>>> Gotteswinter wrote: >>>>>>>> >>>>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen >>>>>>>>> Gotteswinter wrote: >>>>>>>>>> >>>>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>>>>> >> >> wrote: >>>>>>>>>>>> >>>>>>>>>>>> As I said in a previous post I tested the zfs send/receive >>>>>>>>>>>> approach (with >>>>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in >>>>>>>>>>>> all what you >>>>>>>>>>>> said, especially about off-site replicate and synchronous >>>>>>>>>>>> replication. >>>>>>>>>>>> >>>>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the >>>>>>>>>>>> moment, >>>>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but >>>>>>>>>>>> ATM it >>>>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>>>>> I must be too old school, but I don’t quite like the idea of >>>>>>>>>>> using an essentially unreliable transport >>>>>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>>>>> >>>>>>>>>>> In case something went wrong, that approach could risk >>>>>>>>>>> corrupting a pool. Although, frankly, >>>>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA >>>>>>>>>>> problem that caused some >>>>>>>>>>> silent corruption. >>>>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines >>>>>>>>>> hooked up >>>>>>>>>> to the same disk chassis. >>>>>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>>>>> >>>>>>>>> I'm still busy to test the whole setup here, including the >>>>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can >>> prevent >>>>>>>>> that thanks to: >>>>>>>>> >>>>>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>>>>> and you can't import the pool (even with -f) for ex (filer2 >>> is the >>>>>>>>> BACKUP): >>>>>>>>> >>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>> >>>>>>>>> >>>>>>>>> - The shared pool should not be mounted at boot, and you should >>>>>>>>> ensure >>>>>>>>> that the failover script is not executed during boot time too: >>>>>>>>> this is >>>>>>>>> to handle the case wherein both machines turn off and/or >>> re-ignite at >>>>>>>>> the same time. 
Indeed, the CARP interface can "flip" it's status >>>>>>>>> if both >>>>>>>>> machines are powered on at the same time, for ex: >>>>>>>>> >>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf >>> and >>>>>>>>> you will have a split-brain scenario >>>>>>>>> >>>>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>>>>>> happen, this can be handled with a trigger file or something like >>>>>>>>> that >>>>>>>>> >>>>>>>>> - I've still have to check if the order is OK, but I think >>> that as >>>>>>>>> long >>>>>>>>> as you shutdown the replication interface and that you adapt the >>>>>>>>> advskew (including the config file) of the CARP interface >>> before the >>>>>>>>> zpool import -f in the failover script you can be relatively >>>>>>>>> confident >>>>>>>>> that nothing will be written on the iSCSI targets >>>>>>>>> >>>>>>>>> - A zpool scrub should be run at regular intervals >>>>>>>>> >>>>>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>>>>> >>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>> >>>>>>>>> >>>>>>>>> Julien >>>>>>>>> >>>>>>>> 100€ question without detailed looking at that script. yes from a >>>>>>>> first >>>>>>>> view its super simple, but: why are solutions like rsf-1 such more >>>>>>>> powerful / featurerich. Theres a reason for, which is that >>> they try to >>>>>>>> cover every possible situation (which makes more than sense >>> for this). >>>>>>> I've never used "rsf-1" so I can't say much more about it, but >>> I have >>>>>>> no doubts about it's ability to handle "complex situations", where >>>>>>> multiple nodes / networks are involved. >>>>>>> >>>>>>>> That script works for sure, within very limited cases imho >>>>>>>> >>>>>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen >>>>>>>>>> sooner >>>>>>>>>> or later especially when it comes to homegrown automatism >>> solutions. >>>>>>>>>> even the commercial parts where much more time/work goes >>> into such >>>>>>>>>> solutions fail in a regular manner >>>>>>>>>> >>>>>>>>>>> The advantage of ZFS send/receive of datasets is, however, that >>>>>>>>>>> you can consider it >>>>>>>>>>> essentially atomic. A transport corruption should not cause >>>>>>>>>>> trouble (apart from a failed >>>>>>>>>>> "zfs receive") and with snapshot retention you can even roll >>>>>>>>>>> back. You can’t roll back >>>>>>>>>>> zpool replications :) >>>>>>>>>>> >>>>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as >>> your >>>>>>>>>>> zfs receive doesn’t involve a rollback >>>>>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. >>>>>>>>>>> Just make sure that your replica datasets >>>>>>>>>>> aren’t mounted and zfs receive won’t complain. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Borja. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> freebsd-fs@freebsd.org >>> > >>> mailing list >>>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> >>>>>>>>>>> To unsubscribe, send any mail to >>>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>> >>>>>>>>>>> >> >" >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> freebsd-fs@freebsd.org >>> > >>> mailing list >>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> >>>>>>>>>> To unsubscribe, send any mail to >>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>> >>>>>>>>>> >> >" >>>>>> >>>>>> _______________________________________________ >>>>>> freebsd-fs@freebsd.org >>> > >>> mailing list >>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> >>>>>> To unsubscribe, send any mail to >>> "freebsd-fs-unsubscribe@freebsd.org >>> >>>>>> >> >" >>>> >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> >>>> To unsubscribe, send any mail to >>> "freebsd-fs-unsubscribe@freebsd.org >>> " >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org >>> " >>> >>> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Thu Aug 18 10:38:27 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DE733BBEE1F for ; Thu, 18 Aug 2016 10:38:27 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-wm0-x22d.google.com (mail-wm0-x22d.google.com [IPv6:2a00:1450:400c:c09::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3DFA31F95 for ; Thu, 18 Aug 2016 10:38:27 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: by mail-wm0-x22d.google.com with SMTP id o80so25465911wme.1 for ; Thu, 18 Aug 2016 03:38:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=+3AjPVPLnOhyawDcIISBb8aMrf1BvLFfNZiEPdPdpP4=; b=CxbHxOaE7ZeV3bXOb3qex96WD7LPYUHbOiYEdw/hOmdCdTReviRLvHNv7v1yBg1EPg 72F+4KhaIhYhBWE84wxJc/gNiodwcEF54cK9558GQnUf4kRVzBH/WhhCz/QGtcmBlpEl +w0fAZjcTQnjqMQ5VUteKB+BExuEdq4Qt024qPZd33GAlNYPn08exo18rC4XzNYAv0N2 A+3rm0NQZLbgK1AkPpXxPnzHTFIpVwtjqeok59uQ9DXOkPXpPt1oKPqTRvblUwV9GpnB afSwE0/6m86Dy4Yhuy+7fYLn5kFTnP4NoIAL4/wJm/UgTeU9RL4FbLKpCnZQsGWzpgZe scuw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=+3AjPVPLnOhyawDcIISBb8aMrf1BvLFfNZiEPdPdpP4=; b=BvNC4q9jAAy4AmXHZme0RPMnZD2PqszmnQuU95J6S9/c51948rCie6K+s4cMleg/bU 
3GNRKVVq7y5kcDtpsEBMKRW0iqDpU2de7FV0ZgzNOJ3uOtgBu+TilHjLJJ5IHZBoW3kn K7pJd7F9/99D08Wz0PaLgJa2F25S7zc4eEkFbXJ0cduq+q2PFEq1vObzdIqlYYYIP5T2 sYT/KAZgsjwTHTj0jwJ6kox1NpD3FwjKFWhL+y+T4CGJWcMw5aRUTrPURIDKSBs6y3A0 urqS3TXTwBlm1EKVvHPq5uS2lWIL1DpfxTpjSXJqpu4YbIWQNacbD60jU8SfT52RTdAx P//w== X-Gm-Message-State: AEkoout2TAmVhEQ3/9qPQ6dF+4flAjMR0ffLr93Cyn1TbplpFgj3F6pNzJ3WVmNvetyGqWwdG379ZGVWUVEURw== X-Received: by 10.194.127.37 with SMTP id nd5mr1504881wjb.156.1471516705391; Thu, 18 Aug 2016 03:38:25 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.54.202 with HTTP; Thu, 18 Aug 2016 03:38:24 -0700 (PDT) In-Reply-To: <69234c7d-cda9-2d56-b5e0-bb5e3961cc19@internetx.com> References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> <409301a7-ce03-aaa3-c4dc-fa9f9ba66e01@internetx.com> <69234c7d-cda9-2d56-b5e0-bb5e3961cc19@internetx.com> From: krad Date: Thu, 18 Aug 2016 11:38:24 +0100 Message-ID: Subject: Re: HAST + ZFS + NFS + CARP To: InterNetX - Juergen Gotteswinter Cc: Ben RUBSON , FreeBSD FS Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 10:38:28 -0000 "new day, new things learned :)" job done for today then, it must be beer o clock? On 18 August 2016 at 09:02, InterNetX - Juergen Gotteswinter < juergen.gotteswinter@internetx.com> wrote: > new day, new things learned :) > > thanks! > > but like said, zrep does its on locking in zfs properties. 
so even this > is fine > > while true; do zrep sync all; done > > > see > > http://www.bolthole.com/solaris/zrep/ > > the properties look like this > > tank/vmail redundant_metadata all default > tank/vmail zrep:savecount 5 local > tank/vmail zrep:lock-time 20160620101703 local > tank/vmail zrep:master yes local > tank/vmail zrep:src-fs tank/vmail local > tank/vmail zrep:dest-host stor1 local > tank/vmail zrep:src-host stor2 local > tank/vmail zrep:dest-fs tank/vmail local > tank/vmail zrep:lock-pid 10887 local > > > it also takes care of the replication partner, the replicated datasets > are read only until you tell zrep "go go go, become master" > > Simple usage summary: > zrep (init|-i) ZFS/fs remotehost remoteZFSpool/fs > zrep (sync|-S) [-q seconds] ZFS/fs > zrep (sync|-S) [-q seconds] all > zrep (sync|-S) ZFS/fs@snapshot -- temporary retroactive sync > zrep (status|-s) [-v] [(-a|ZFS/fs)] > zrep refresh ZFS/fs -- pull version of sync > zrep (list|-l) [-Lv] > zrep (expire|-e) [-L] (ZFS/fs ...)|(all)|() > zrep (changeconfig|-C) [-f] ZFS/fs remotehost remoteZFSpool/fs > zrep (changeconfig|-C) [-f] [-d] ZFS/fs srchost srcZFSpool/fs > zrep failover [-L] ZFS/fs > zrep takeover [-L] ZFS/fs > > > zrep failover pool/ds -> master sets pool read only, connects to slave, > sets pool on slave rw > > should be easy to combine with carp/devd, but this is the land of vodoo > automagic again which i dont trust that much. > > > Am 18.08.2016 um 09:40 schrieb Ben RUBSON: > > Yep this is better : > > > > if mkdir > > then > > do_your_job > > rm -rf > > fi > > > > > > > >> On 18 Aug 2016, at 09:38, InterNetX - Juergen Gotteswinter < > juergen.gotteswinter@internetx.com> wrote: > >> > >> uhm, dont really investigated if it is or not. add a "sync" after that= ? > >> or replace it? > >> > >> but anyway, thanks for the hint. will dig into this! > >> > >> Am 18.08.2016 um 09:36 schrieb krad: > >>> I didnt think touch was atomic, mkdir is though > >>> > >>> On 18 August 2016 at 08:32, InterNetX - Juergen Gotteswinter > >>> >>> > wrote: > >>> > >>> > >>> > >>> Am 17.08.2016 um 20:03 schrieb Linda Kateley: > >>>> I just do consulting so I don't always get to see the end of the > >>>> project. Although we are starting to do more ongoing support so we c= an > >>>> see the progress.. > >>>> > >>>> I have worked with some of the guys from high-availability.com < > http://high-availability.com> for maybe > >>>> 20 years. RSF-1 is the cluster that is bundled with nexenta. Does wo= rk > >>>> beautifully with omni/illumos. The one customer I have running it in > >>>> prod is an isp in south america running openstack and zfs on freebsd > as > >>>> iscsi. Big boxes, 90+ drives per frame. If someone would like try > it, i > >>>> have some contacts there. Ping me offlist. > >>> > >>> no offense, but it sounds a bit like marketing. > >>> > >>> here: running nexenta ha setup since several years with one > catastrophic > >>> failure due to split brain > >>> > >>>> > >>>> You do risk losing data if you batch zfs send. It is very hard to ru= n > >>>> that real time. > >>> > >>> depends on how much data changes aka delta size > >>> > >>> > >>> You have to take the snap then send the snap. Most > >>>> people run in cron, even if it's not in cron, you would want one to > >>>> finish before you started the next. 
> >>> > >>> thats the reason why lock files where invented, tools like zrep > handle > >>> that themself via additional zfs properties > >>> > >>> or, if one does not trust a single layer > >>> > >>> -- snip -- > >>> #!/bin/sh > >>> if [ ! -f /var/run/replic ] ; then > >>> touch /var/run/replic > >>> /blah/path/zrep sync all >> /var/log/zfsrepli.log > >>> rm -f /var/run/replic > >>> fi > >>> -- snip -- > >>> > >>> something like this, simple > >>> > >>> If you lose the sending host before > >>>> the receive is complete you won't have a full copy. > >>> > >>> if rsf fails, and you end up in split brain you loose way more. be= en > >>> there, seen that. > >>> > >>> With zfs though you > >>>> will probably still have the data on the sending host, however long = it > >>>> takes to bring it back up. RSF-1 runs in the zfs stack and send the > >>>> writes to the second system. It's kind of pricey, but actually much > less > >>>> expensive than commercial alternatives. > >>>> > >>>> Anytime you run anything sync it adds latency but makes things safer= .. > >>> > >>> not surprising, it all depends on the usecase > >>> > >>>> There is also a cool tool I like, called zerto for vmware that sits = in > >>>> the hypervisor and sends a sync copy of a write locally and then an > >>>> async remotely. It's pretty cool. Although I haven't run it myself, > have > >>>> a bunch of customers running it. I believe it works with proxmox too= . > >>>> > >>>> Most people I run into (these days) don't mind losing 5 or even 30 > >>>> minutes of data. Small shops. > >>> > >>> you talk about minutes, what delta size are we talking here about? > why > >>> not using zrep in a loop for example > >>> > >>> They usually have a copy somewhere else. > >>>> Or the cost of 5-30 minutes isn't that great. I used work as a > >>>> datacenter architect for sun/oracle with only fortune 500. There > losing > >>>> 1 sec could put large companies out of business. I worked with banks > and > >>>> exchanges. > >>> > >>> again, usecase. i bet 99% on this list are not operating fortune 5= 00 > >>> bank filers > >>> > >>> They couldn't ever lose a single transaction. Most people > >>>> nowadays do the replication/availability in the application though a= nd > >>>> don't care about underlying hardware, especially disk. > >>>> > >>>> > >>>> On 8/17/16 11:55 AM, Chris Watson wrote: > >>>>> Of course, if you are willing to accept some amount of data loss th= at > >>>>> opens up a lot more options. :) > >>>>> > >>>>> Some may find that acceptable though. Like turning off fsync with > >>>>> PostgreSQL to get much higher throughput. As little no as you are > >>> made > >>>>> *very* aware of the risks. > >>>>> > >>>>> It's good to have input in this thread from one with more experienc= e > >>>>> with RSF-1 than the rest of us. You confirm what others have that > >>> said > >>>>> about RSF-1, that it's stable and works well. What were you deployi= ng > >>>>> it on? > >>>>> > >>>>> Chris > >>>>> > >>>>> Sent from my iPhone 5 > >>>>> > >>>>> On Aug 17, 2016, at 11:18 AM, Linda Kateley >>> > >>>>> >> wrote: > >>>>> > >>>>>> The question I always ask, as an architect, is "can you lose 1 > >>> minute > >>>>>> worth of data?" If you can, then batched replication is perfect. I= f > >>>>>> you can't.. then HA. Every place I have positioned it, rsf-1 has > >>>>>> worked extremely well. If i remember right, it works at the dmu. I > >>>>>> would suggest try it. 
They have been trying to have a full freebsd > >>>>>> solution, I have several customers running it well. > >>>>>> > >>>>>> linda > >>>>>> > >>>>>> > >>>>>> On 8/17/16 4:52 AM, Julien Cigar wrote: > >>>>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen > >>>>>>> Gotteswinter wrote: > >>>>>>>> > >>>>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: > >>>>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen > >>>>>>>>> Gotteswinter wrote: > >>>>>>>>>> > >>>>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: > >>>>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>>>>>> >>> >> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> As I said in a previous post I tested the zfs send/receive > >>>>>>>>>>>> approach (with > >>>>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in > >>>>>>>>>>>> all what you > >>>>>>>>>>>> said, especially about off-site replicate and synchronous > >>>>>>>>>>>> replication. > >>>>>>>>>>>> > >>>>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at th= e > >>>>>>>>>>>> moment, > >>>>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, b= ut > >>>>>>>>>>>> ATM it > >>>>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. > >>>>>>>>>>> I must be too old school, but I don=E2=80=99t quite like the = idea of > >>>>>>>>>>> using an essentially unreliable transport > >>>>>>>>>>> (Ethernet) for low-level filesystem operations. > >>>>>>>>>>> > >>>>>>>>>>> In case something went wrong, that approach could risk > >>>>>>>>>>> corrupting a pool. Although, frankly, > >>>>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS H= BA > >>>>>>>>>>> problem that caused some > >>>>>>>>>>> silent corruption. > >>>>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines > >>>>>>>>>> hooked up > >>>>>>>>>> to the same disk chassis. > >>>>>>>>> Yes this is the first thing on the list to avoid .. :) > >>>>>>>>> > >>>>>>>>> I'm still busy to test the whole setup here, including the > >>>>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can > >>> prevent > >>>>>>>>> that thanks to: > >>>>>>>>> > >>>>>>>>> - As long as ctld is running on the BACKUP the disks are locked > >>>>>>>>> and you can't import the pool (even with -f) for ex (filer2 > >>> is the > >>>>>>>>> BACKUP): > >>>>>>>>> > >>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > >>> > >>>>>>>>> > >>>>>>>>> - The shared pool should not be mounted at boot, and you should > >>>>>>>>> ensure > >>>>>>>>> that the failover script is not executed during boot time too: > >>>>>>>>> this is > >>>>>>>>> to handle the case wherein both machines turn off and/or > >>> re-ignite at > >>>>>>>>> the same time. 
Indeed, the CARP interface can "flip" it's statu= s > >>>>>>>>> if both > >>>>>>>>> machines are powered on at the same time, for ex: > >>>>>>>>> > >>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf > >>> > and > >>>>>>>>> you will have a split-brain scenario > >>>>>>>>> > >>>>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons > >>>>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should no= t > >>>>>>>>> happen, this can be handled with a trigger file or something li= ke > >>>>>>>>> that > >>>>>>>>> > >>>>>>>>> - I've still have to check if the order is OK, but I think > >>> that as > >>>>>>>>> long > >>>>>>>>> as you shutdown the replication interface and that you adapt th= e > >>>>>>>>> advskew (including the config file) of the CARP interface > >>> before the > >>>>>>>>> zpool import -f in the failover script you can be relatively > >>>>>>>>> confident > >>>>>>>>> that nothing will be written on the iSCSI targets > >>>>>>>>> > >>>>>>>>> - A zpool scrub should be run at regular intervals > >>>>>>>>> > >>>>>>>>> This is my MASTER -> BACKUP CARP script ATM > >>>>>>>>> > >>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > >>> > >>>>>>>>> > >>>>>>>>> Julien > >>>>>>>>> > >>>>>>>> 100=E2=82=AC question without detailed looking at that script. y= es from a > >>>>>>>> first > >>>>>>>> view its super simple, but: why are solutions like rsf-1 such mo= re > >>>>>>>> powerful / featurerich. Theres a reason for, which is that > >>> they try to > >>>>>>>> cover every possible situation (which makes more than sense > >>> for this). > >>>>>>> I've never used "rsf-1" so I can't say much more about it, but > >>> I have > >>>>>>> no doubts about it's ability to handle "complex situations", wher= e > >>>>>>> multiple nodes / networks are involved. > >>>>>>> > >>>>>>>> That script works for sure, within very limited cases imho > >>>>>>>> > >>>>>>>>>> kaboom, really ugly kaboom. thats what is very likely to happe= n > >>>>>>>>>> sooner > >>>>>>>>>> or later especially when it comes to homegrown automatism > >>> solutions. > >>>>>>>>>> even the commercial parts where much more time/work goes > >>> into such > >>>>>>>>>> solutions fail in a regular manner > >>>>>>>>>> > >>>>>>>>>>> The advantage of ZFS send/receive of datasets is, however, th= at > >>>>>>>>>>> you can consider it > >>>>>>>>>>> essentially atomic. A transport corruption should not cause > >>>>>>>>>>> trouble (apart from a failed > >>>>>>>>>>> "zfs receive") and with snapshot retention you can even roll > >>>>>>>>>>> back. You can=E2=80=99t roll back > >>>>>>>>>>> zpool replications :) > >>>>>>>>>>> > >>>>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as > >>> your > >>>>>>>>>>> zfs receive doesn=E2=80=99t involve a rollback > >>>>>>>>>>> to the latest snapshot, it won=E2=80=99t destroy anything by = mistake. > >>>>>>>>>>> Just make sure that your replica datasets > >>>>>>>>>>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain= . > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Cheers, > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Borja. 
> >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _______________________________________________ > >>>>>>>>>>> freebsd-fs@freebsd.org > >>> > > >>> mailing list > >>>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > >>>>>>>>>>> To unsubscribe, send any mail to > >>>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org > >>> > >>>>>>>>>>> >>> >" > >>>>>>>>>>> > >>>>>>>>>> _______________________________________________ > >>>>>>>>>> freebsd-fs@freebsd.org > >>> > > >>> mailing list > >>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > >>>>>>>>>> To unsubscribe, send any mail to > >>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org > >>> > >>>>>>>>>> >>> >" > >>>>>> > >>>>>> _______________________________________________ > >>>>>> freebsd-fs@freebsd.org > >>> > > >>> mailing list > >>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > >>>>>> To unsubscribe, send any mail to > >>> "freebsd-fs-unsubscribe@freebsd.org > >>> > >>>>>> >>> >" > >>>> > >>>> _______________________________________________ > >>>> freebsd-fs@freebsd.org mailing list > >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > >>>> To unsubscribe, send any mail to > >>> "freebsd-fs-unsubscribe@freebsd.org > >>> " > >>> _______________________________________________ > >>> freebsd-fs@freebsd.org mailing lis= t > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@ > freebsd.org > >>> " > >>> > >>> > >> _______________________________________________ > >> freebsd-fs@freebsd.org mailing list > >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Thu Aug 18 10:50:16 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 97AC9BBE52A for ; Thu, 18 Aug 2016 10:50:16 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from cu01176b.smtpx.saremail.com (cu01176b.smtpx.saremail.com [195.16.151.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4EEAC1A8C for ; Thu, 18 Aug 2016 10:50:15 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from [172.16.8.36] (izaro.sarenet.es [192.148.167.11]) by proxypop01.sare.net (Postfix) with ESMTPSA id 2AC549DD3B3; Thu, 18 Aug 2016 12:50:07 +0200 (CEST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: HAST + ZFS + NFS + CARP From: Borja Marcos In-Reply-To: <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> Date: Thu, 18 Aug 2016 12:50:06 +0200 Cc: Chris Watson , freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <354253C2-E42E-4B9C-9931-9135A5A7DFD9@sarenet.es> References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> 
<20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> To: linda@kateley.com X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 10:50:16 -0000 > On 17 Aug 2016, at 20:03, Linda Kateley wrote: >=20 > You do risk losing data if you batch zfs send. It is very hard to run = that real time. You have to take the snap then send the snap. Most = people run in cron, even if it's not in cron, you would want one to = finish before you started the next. If you lose the sending host before = the receive is complete you won't have a full copy. With zfs though you = will probably still have the data on the sending host, however long it = takes to bring it back up. RSF-1 runs in the zfs stack and send the = writes to the second system. It's kind of pricey, but actually much less = expensive than commercial alternatives. Doing somewhat critical stuff off cron is not usually a good idea. I do = ZFS replication with a custom program which makes sure of some important = stuff: - Using holds to avoid an accidental snapshot deletion to require a full = send/receive.=20 - Avoiding starting a new send/receive on a dataset in case the previous = one didn=E2=80=99t finish for whatever reason (the main problem with = cron) - Offering the possibility of some random variation on the replication = period so that, in case several happen to start simultaneously, you = don=E2=80=99t have a periodically overloaded system. - Avoiding mounting the replicas so that the receive won=E2=80=99t need = a rollback, which would be potentially risky. - Supports one-to-many replicas, with different periodicity for each = destination if required. I am sorry I can=E2=80=99t share it (company property) but the program = is rather silly anyway. The important work was the decision to have the = previous features, and a design decision to avoid destructive and portentially = error-prone operations such as rollbacks.=20 Most applications that require real time replication are databases, and = they usually include a clustering option which can be much simpler to = manage (and more robust in this case) than filesystem replication. For other cases, often you can design around the loss of a small amount = of data. I understand that in some cases you have no other option, but the benefits of asynchronous send/receive are so many, especially if = you are on a tight budget, it=E2=80=99s well worth to try to make the = most of it. Borja. 
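The program itself isn't available, but the practices listed above are easy to sketch in a few lines of sh. The script below is only an illustration of those points -- holds on the snapshots that anchor the incremental stream, refusing to start while a previous run is still active, a little start-time jitter, and receiving with -u so the replica is never mounted -- and every pool, dataset and host name in it is made up.

-- snip --
#!/bin/sh
# Illustrative only; names are hypothetical.
DS=tank/data                    # source dataset
DEST=replica-host               # receiving host
DESTDS=backup/data              # destination dataset
LOCK=/var/run/repl-data.lock

sleep "$(jot -r 1 0 59)"        # random jitter so several jobs don't line up

mkdir "$LOCK" 2>/dev/null || exit 0   # previous run still busy: do nothing
trap 'rmdir "$LOCK"' EXIT

PREV=$(zfs list -H -d 1 -t snapshot -o name -s creation "$DS" \
       | grep "@repl-" | tail -1)
NEW="$DS@repl-$(date +%Y%m%d%H%M%S)"

zfs snapshot "$NEW"
zfs hold repl "$NEW"            # protect the new anchor snapshot

if [ -n "$PREV" ]; then
    zfs send -i "$PREV" "$NEW" | ssh "$DEST" zfs receive -u "$DESTDS"
else
    zfs send "$NEW" | ssh "$DEST" zfs receive -u "$DESTDS"
fi

if [ $? -eq 0 ]; then
    [ -n "$PREV" ] && zfs release repl "$PREV"   # old anchor no longer needed
else
    echo "replication of $NEW failed, holds kept" >&2
    exit 1
fi
-- snip --

The holds are what make accidental snapshot deletion harmless: zfs destroy on a held snapshot fails with "dataset is busy", so the next incremental still has its anchor instead of silently degrading to a full send/receive.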
From owner-freebsd-fs@freebsd.org Thu Aug 18 11:17:40 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A7740BBE3D3 for ; Thu, 18 Aug 2016 11:17:40 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from mail.in-addr.com (mail.in-addr.com [IPv6:2a01:4f8:191:61e8::2525:2525]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 56C8A1827 for ; Thu, 18 Aug 2016 11:17:40 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from gjp by mail.in-addr.com with local (Exim 4.87 (FreeBSD)) (envelope-from ) id 1baLKL-0008Ap-Oq; Thu, 18 Aug 2016 12:17:37 +0100 Date: Thu, 18 Aug 2016 12:17:37 +0100 From: Gary Palmer To: Ben RUBSON Cc: FreeBSD FS Subject: Re: HAST + ZFS + NFS + CARP Message-ID: <20160818111737.GB47566@in-addr.com> References: <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> <409301a7-ce03-aaa3-c4dc-fa9f9ba66e01@internetx.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: gpalmer@freebsd.org X-SA-Exim-Scanned: No (on mail.in-addr.com); SAEximRunCond expanded to false X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 11:17:40 -0000 Isn't this exactly what the lockf command was designed to do for you? I'd also suggest rmdir rather than rm -rf On Thu, Aug 18, 2016 at 09:40:50AM +0200, Ben RUBSON wrote: > Yep this is better : > > if mkdir > then > do_your_job > rm -rf > fi > > > > > On 18 Aug 2016, at 09:38, InterNetX - Juergen Gotteswinter wrote: > > > > uhm, dont really investigated if it is or not. add a "sync" after that? > > or replace it? > > > > but anyway, thanks for the hint. will dig into this! > > > > Am 18.08.2016 um 09:36 schrieb krad: > >> I didnt think touch was atomic, mkdir is though > >> > >> On 18 August 2016 at 08:32, InterNetX - Juergen Gotteswinter > >> >> > wrote: > >> > >> > >> > >> Am 17.08.2016 um 20:03 schrieb Linda Kateley: > >>> I just do consulting so I don't always get to see the end of the > >>> project. Although we are starting to do more ongoing support so we can > >>> see the progress.. > >>> > >>> I have worked with some of the guys from high-availability.com for maybe > >>> 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work > >>> beautifully with omni/illumos. The one customer I have running it in > >>> prod is an isp in south america running openstack and zfs on freebsd as > >>> iscsi. Big boxes, 90+ drives per frame. If someone would like try it, i > >>> have some contacts there. Ping me offlist. > >> > >> no offense, but it sounds a bit like marketing. > >> > >> here: running nexenta ha setup since several years with one catastrophic > >> failure due to split brain > >> > >>> > >>> You do risk losing data if you batch zfs send. It is very hard to run > >>> that real time. 
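Ben's mkdir outline above is missing its directory argument and any cleanup; a fleshed-out version, using rmdir as Gary suggests rather than rm -rf, might look like the sketch below. The lock directory and zrep path are placeholders (the log file name follows Juergen's example).

-- snip --
#!/bin/sh
# mkdir(2) is atomic: exactly one process can create the lock directory,
# so this avoids the race that a plain "test then touch" lock file has.
LOCKDIR=/var/run/zfsrepl.lock

if mkdir "$LOCKDIR" 2>/dev/null; then
        trap 'rmdir "$LOCKDIR"' EXIT INT TERM   # drop the lock with rmdir, not rm -rf
        /usr/local/bin/zrep sync all >> /var/log/zfsrepli.log 2>&1
else
        echo "previous replication still running, skipping" >&2
fi
-- snip --

lockf(1) reaches the same goal with less code, e.g. "lockf -t 0 /var/run/zfsrepl.lock zrep sync all", and it also releases the lock if the job is killed.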
> >> > >> depends on how much data changes aka delta size > >> > >> > >> You have to take the snap then send the snap. Most > >>> people run in cron, even if it's not in cron, you would want one to > >>> finish before you started the next. > >> > >> thats the reason why lock files where invented, tools like zrep handle > >> that themself via additional zfs properties > >> > >> or, if one does not trust a single layer > >> > >> -- snip -- > >> #!/bin/sh > >> if [ ! -f /var/run/replic ] ; then > >> touch /var/run/replic > >> /blah/path/zrep sync all >> /var/log/zfsrepli.log > >> rm -f /var/run/replic > >> fi > >> -- snip -- > >> > >> something like this, simple > >> > >> If you lose the sending host before > >>> the receive is complete you won't have a full copy. > >> > >> if rsf fails, and you end up in split brain you loose way more. been > >> there, seen that. > >> > >> With zfs though you > >>> will probably still have the data on the sending host, however long it > >>> takes to bring it back up. RSF-1 runs in the zfs stack and send the > >>> writes to the second system. It's kind of pricey, but actually much less > >>> expensive than commercial alternatives. > >>> > >>> Anytime you run anything sync it adds latency but makes things safer.. > >> > >> not surprising, it all depends on the usecase > >> > >>> There is also a cool tool I like, called zerto for vmware that sits in > >>> the hypervisor and sends a sync copy of a write locally and then an > >>> async remotely. It's pretty cool. Although I haven't run it myself, have > >>> a bunch of customers running it. I believe it works with proxmox too. > >>> > >>> Most people I run into (these days) don't mind losing 5 or even 30 > >>> minutes of data. Small shops. > >> > >> you talk about minutes, what delta size are we talking here about? why > >> not using zrep in a loop for example > >> > >> They usually have a copy somewhere else. > >>> Or the cost of 5-30 minutes isn't that great. I used work as a > >>> datacenter architect for sun/oracle with only fortune 500. There losing > >>> 1 sec could put large companies out of business. I worked with banks and > >>> exchanges. > >> > >> again, usecase. i bet 99% on this list are not operating fortune 500 > >> bank filers > >> > >> They couldn't ever lose a single transaction. Most people > >>> nowadays do the replication/availability in the application though and > >>> don't care about underlying hardware, especially disk. > >>> > >>> > >>> On 8/17/16 11:55 AM, Chris Watson wrote: > >>>> Of course, if you are willing to accept some amount of data loss that > >>>> opens up a lot more options. :) > >>>> > >>>> Some may find that acceptable though. Like turning off fsync with > >>>> PostgreSQL to get much higher throughput. As little no as you are > >> made > >>>> *very* aware of the risks. > >>>> > >>>> It's good to have input in this thread from one with more experience > >>>> with RSF-1 than the rest of us. You confirm what others have that > >> said > >>>> about RSF-1, that it's stable and works well. What were you deploying > >>>> it on? > >>>> > >>>> Chris > >>>> > >>>> Sent from my iPhone 5 > >>>> > >>>> On Aug 17, 2016, at 11:18 AM, Linda Kateley >> > >>>> >> wrote: > >>>> > >>>>> The question I always ask, as an architect, is "can you lose 1 > >> minute > >>>>> worth of data?" If you can, then batched replication is perfect. If > >>>>> you can't.. then HA. Every place I have positioned it, rsf-1 has > >>>>> worked extremely well. If i remember right, it works at the dmu. 
I > >>>>> would suggest try it. They have been trying to have a full freebsd > >>>>> solution, I have several customers running it well. > >>>>> > >>>>> linda > >>>>> > >>>>> > >>>>> On 8/17/16 4:52 AM, Julien Cigar wrote: > >>>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen > >>>>>> Gotteswinter wrote: > >>>>>>> > >>>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: > >>>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen > >>>>>>>> Gotteswinter wrote: > >>>>>>>>> > >>>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: > >>>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>>>>> >> >> wrote: > >>>>>>>>>>> > >>>>>>>>>>> As I said in a previous post I tested the zfs send/receive > >>>>>>>>>>> approach (with > >>>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in > >>>>>>>>>>> all what you > >>>>>>>>>>> said, especially about off-site replicate and synchronous > >>>>>>>>>>> replication. > >>>>>>>>>>> > >>>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the > >>>>>>>>>>> moment, > >>>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but > >>>>>>>>>>> ATM it > >>>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. > >>>>>>>>>> I must be too old school, but I don???t quite like the idea of > >>>>>>>>>> using an essentially unreliable transport > >>>>>>>>>> (Ethernet) for low-level filesystem operations. > >>>>>>>>>> > >>>>>>>>>> In case something went wrong, that approach could risk > >>>>>>>>>> corrupting a pool. Although, frankly, > >>>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA > >>>>>>>>>> problem that caused some > >>>>>>>>>> silent corruption. > >>>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines > >>>>>>>>> hooked up > >>>>>>>>> to the same disk chassis. > >>>>>>>> Yes this is the first thing on the list to avoid .. :) > >>>>>>>> > >>>>>>>> I'm still busy to test the whole setup here, including the > >>>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can > >> prevent > >>>>>>>> that thanks to: > >>>>>>>> > >>>>>>>> - As long as ctld is running on the BACKUP the disks are locked > >>>>>>>> and you can't import the pool (even with -f) for ex (filer2 > >> is the > >>>>>>>> BACKUP): > >>>>>>>> > >> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > >> > >>>>>>>> > >>>>>>>> - The shared pool should not be mounted at boot, and you should > >>>>>>>> ensure > >>>>>>>> that the failover script is not executed during boot time too: > >>>>>>>> this is > >>>>>>>> to handle the case wherein both machines turn off and/or > >> re-ignite at > >>>>>>>> the same time. 
Indeed, the CARP interface can "flip" it's status > >>>>>>>> if both > >>>>>>>> machines are powered on at the same time, for ex: > >>>>>>>> > >> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf > >> and > >>>>>>>> you will have a split-brain scenario > >>>>>>>> > >>>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons > >>>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not > >>>>>>>> happen, this can be handled with a trigger file or something like > >>>>>>>> that > >>>>>>>> > >>>>>>>> - I've still have to check if the order is OK, but I think > >> that as > >>>>>>>> long > >>>>>>>> as you shutdown the replication interface and that you adapt the > >>>>>>>> advskew (including the config file) of the CARP interface > >> before the > >>>>>>>> zpool import -f in the failover script you can be relatively > >>>>>>>> confident > >>>>>>>> that nothing will be written on the iSCSI targets > >>>>>>>> > >>>>>>>> - A zpool scrub should be run at regular intervals > >>>>>>>> > >>>>>>>> This is my MASTER -> BACKUP CARP script ATM > >>>>>>>> > >> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > >> > >>>>>>>> > >>>>>>>> Julien > >>>>>>>> > >>>>>>> 100??? question without detailed looking at that script. yes from a > >>>>>>> first > >>>>>>> view its super simple, but: why are solutions like rsf-1 such more > >>>>>>> powerful / featurerich. Theres a reason for, which is that > >> they try to > >>>>>>> cover every possible situation (which makes more than sense > >> for this). > >>>>>> I've never used "rsf-1" so I can't say much more about it, but > >> I have > >>>>>> no doubts about it's ability to handle "complex situations", where > >>>>>> multiple nodes / networks are involved. > >>>>>> > >>>>>>> That script works for sure, within very limited cases imho > >>>>>>> > >>>>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen > >>>>>>>>> sooner > >>>>>>>>> or later especially when it comes to homegrown automatism > >> solutions. > >>>>>>>>> even the commercial parts where much more time/work goes > >> into such > >>>>>>>>> solutions fail in a regular manner > >>>>>>>>> > >>>>>>>>>> The advantage of ZFS send/receive of datasets is, however, that > >>>>>>>>>> you can consider it > >>>>>>>>>> essentially atomic. A transport corruption should not cause > >>>>>>>>>> trouble (apart from a failed > >>>>>>>>>> "zfs receive") and with snapshot retention you can even roll > >>>>>>>>>> back. You can???t roll back > >>>>>>>>>> zpool replications :) > >>>>>>>>>> > >>>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as > >> your > >>>>>>>>>> zfs receive doesn???t involve a rollback > >>>>>>>>>> to the latest snapshot, it won???t destroy anything by mistake. > >>>>>>>>>> Just make sure that your replica datasets > >>>>>>>>>> aren???t mounted and zfs receive won???t complain. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Cheers, > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Borja. 
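On the receive side, the "replica datasets aren't mounted" advice quoted above usually comes down to two properties plus the -u flag. A minimal sketch, with made-up pool, snapshot and host names:

-- snip --
# one-time settings on the backup host, so replicas never get mounted
zfs set readonly=on backup/data
zfs set canmount=noauto backup/data

# per transfer: -u leaves the received dataset unmounted, and -F (forced
# rollback) is deliberately omitted, so a diverged replica makes the
# receive fail loudly instead of rolling anything back
zfs send -i tank/data@prev tank/data@new | ssh backuphost zfs receive -u backup/data
-- snip --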
> >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> _______________________________________________ > >>>>>>>>>> freebsd-fs@freebsd.org > >> > > >> mailing list > >>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> > >>>>>>>>>> To unsubscribe, send any mail to > >>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org > >> > >>>>>>>>>> >> >" > >>>>>>>>>> > >>>>>>>>> _______________________________________________ > >>>>>>>>> freebsd-fs@freebsd.org > >> > > >> mailing list > >>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> > >>>>>>>>> To unsubscribe, send any mail to > >>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org > >> > >>>>>>>>> >> >" > >>>>> > >>>>> _______________________________________________ > >>>>> freebsd-fs@freebsd.org > >> > > >> mailing list > >>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> > >>>>> To unsubscribe, send any mail to > >> "freebsd-fs-unsubscribe@freebsd.org > >> > >>>>> >> >" > >>> > >>> _______________________________________________ > >>> freebsd-fs@freebsd.org mailing list > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> > >>> To unsubscribe, send any mail to > >> "freebsd-fs-unsubscribe@freebsd.org > >> " > >> _______________________________________________ > >> freebsd-fs@freebsd.org mailing list > >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> > >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org > >> " > >> > >> > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Thu Aug 18 11:32:19 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C37B8BBEBEF for ; Thu, 18 Aug 2016 11:32:19 +0000 (UTC) (envelope-from emz@norma.perm.ru) Received: from elf.hq.norma.perm.ru (mail.norma.perm.ru [IPv6:2a00:7540:1::5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.norma.perm.ru", Issuer "Vivat-Trade UNIX Root CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4755E1911 for ; Thu, 18 Aug 2016 11:32:19 +0000 (UTC) (envelope-from emz@norma.perm.ru) Received: from bsdrookie.norma.com. ([IPv6:fd00::7fe]) by elf.hq.norma.perm.ru (8.15.2/8.15.2) with ESMTPS id u7IBWCfo036601 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO) for ; Thu, 18 Aug 2016 16:32:13 +0500 (YEKT) (envelope-from emz@norma.perm.ru) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=norma.perm.ru; s=key; t=1471519935; bh=fOVv6ickd2L7fZ97nb2B+DOqhmHXCfgrsgHnSF/+PFY=; h=To:From:Subject:Date; b=izbVrb22gjaWMIlfxFkItM/8wIHaW60AhvhawaX+qAGFbAiCp4HyfnlhAtVcAqWwS XMMtEpTCTblz41nBw3nzbpYhnhu0pYYKgney5PT4Iiyznnz7z3kY9RyRHbed8gSc1u lg8Q/eqdUB7XfSTxpknyoEXU+5K7vvrhpn8Ygdog= To: FreeBSD FS From: "Eugene M. 
Zheganin" Subject: zpool list FREE vs zfs list AVAIL Message-ID: <57B59CBC.8000904@norma.perm.ru> Date: Thu, 18 Aug 2016 16:32:12 +0500 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.7.0 MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 11:32:19 -0000 Hi. What is the difference between zpool list FREE for a pool and zfs list AVAIL ? Because they differ a lot, I'm looking at a server at the moment where the difference is like dozens of times: zfs list reports that 97 gigabytes is available, and the zpool list for the same pool says that 4.18 terabytes is free. From my point of view this should be the same number. Thanks. Eugene. From owner-freebsd-fs@freebsd.org Thu Aug 18 14:55:05 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 56FF4BBD746 for ; Thu, 18 Aug 2016 14:55:05 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 18AD211C4 for ; Thu, 18 Aug 2016 14:55:05 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from slw by zxy.spb.ru with local (Exim 4.86 (FreeBSD)) (envelope-from ) id 1baOid-000CjB-P2; Thu, 18 Aug 2016 17:54:55 +0300 Date: Thu, 18 Aug 2016 17:54:55 +0300 From: Slawa Olhovchenkov To: Matthias Gamsjager , freebsd-fs@freebsd.org Subject: Re: ZFS ARC under memory pressure Message-ID: <20160818145455.GA48739@zxy.spb.ru> References: <20160816193416.GM8192@zxy.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 14:55:05 -0000 On Wed, Aug 17, 2016 at 09:18:20AM +0200, Matthias Gamsjager wrote: > On 16 August 2016 at 21:34, Slawa Olhovchenkov wrote: > > > I see issuses with ZFS ARC inder memory pressure. > > ZFS ARC size can be dramaticaly reduced, up to arc_min. > > > > As I see memory pressure event cause call arc_lowmem and set needfree: > > > > arc.c:arc_lowmem > > > > needfree = btoc(arc_c >> arc_shrink_shift); > > > > After this, arc_available_memory return negative vaules (PAGESIZE * > > (-needfree)) until needfree is zero. Independent how too much memory > > freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <= > > arc_c. Until arc_size don't drop below arc_c (arc_c deceased at every > > loop interation). > > > > arc_c droped to minimum value if arc_size fast enough droped. > > > > No control current to initial memory allocation. > > > > As result, I can see needless arc reclaim, from 10x to 100x times. > > > > Can some one check me and comment this? > > _______________________________________________ > > > > > What version are you on? 
stable/10, same code in stable/11/9 and current/12 -- Slawa Olhovchenkov From owner-freebsd-fs@freebsd.org Thu Aug 18 14:58:59 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ABBAFBBDA91; Thu, 18 Aug 2016 14:58:59 +0000 (UTC) (envelope-from emz@norma.perm.ru) Received: from elf.hq.norma.perm.ru (mail.norma.perm.ru [IPv6:2a00:7540:1::5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.norma.perm.ru", Issuer "Vivat-Trade UNIX Root CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 2EFB21350; Thu, 18 Aug 2016 14:58:58 +0000 (UTC) (envelope-from emz@norma.perm.ru) Received: from bsdrookie.norma.com. ([IPv6:fd00::7fe]) by elf.hq.norma.perm.ru (8.15.2/8.15.2) with ESMTPS id u7IEwtox052480 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO); Thu, 18 Aug 2016 19:58:55 +0500 (YEKT) (envelope-from emz@norma.perm.ru) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=norma.perm.ru; s=key; t=1471532335; bh=hnc4Ar2j/187ahmTWpLNqP1CJA3hyHJ3tgqfhuFf7Qc=; h=To:Cc:From:Subject:Date; b=joczOecG2LeOvEj3UA9SR1Y2erZ7Flsl19FQJD032D5Wi/3aQx/MvqbIR5fBq/jGJ 68JRKTa97EP1xrQrOEgJv0ngPoRVJNwaw3Ml6mcGsEguzNYzz5svAkKYotQ6w2Qoxb Uwq68fm0lWYNFTPElG4b70VCB45VzYKSoaUjYbkg= To: FreeBSD FS Cc: freebsd-stable From: "Eugene M. Zheganin" Subject: cannot destroy '': dataset is busy vs iSCSI Message-ID: <57B5CD2F.2070204@norma.perm.ru> Date: Thu, 18 Aug 2016 19:58:55 +0500 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.7.0 MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 14:58:59 -0000 Hi. I'm using zvol clones with iSCSI. Perdiodically I renew them and destroy the old ones, but sometimes the clone gets stuck and refuses to be destroyed: (I'm showing the full sequence so it's self explanatory who is who's parent) [root@san2:/etc]# zfs destroy esx/games-reference1@ver5_6 cannot destroy 'esx/games-reference1@ver5_6': snapshot has dependent clones use '-R' to destroy the following datasets: esx/games-reference1-ver5_6-worker111 [root@san2:/etc]# zfs destroy esx/games-reference1-ver5_6-worker111 cannot destroy 'esx/games-reference1-ver5_6-worker111': dataset is busy The only entity that can hold the dataset open is ctld, so: [root@san2:/etc]# service ctld reload [root@san2:/etc]# grep esx/games-reference1-ver5_6-worker111 /etc/ctl.conf [root@san2:/etc]# zfs destroy esx/games-reference1-ver5_6-worker111 cannot destroy 'esx/games-reference1-ver5_6-worker111': dataset is busy As you can see, the clone isn't mentioned in ctl.conf, but still refuses to be destroyed. Is there any way to destroy it without restarting ctld or rebooting the server ? iSCSI is vital for production, but clones sometimes holds lot of space. Thanks. Eugene. 
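One thing worth checking in a situation like this is whether ctl itself still has the zvol open as a LUN backing device even after the configuration reload. A rough sketch follows; the LUN number is a placeholder and the exact ctladm options should be verified against ctladm(8):

-- snip --
# list the LUNs ctl currently exports, with their backing devices
ctladm devlist -v

# if the old clone still shows up as a backing device, remove that LUN
# explicitly (N is the LUN id from the devlist output), then retry
ctladm remove -b block -l N
zfs destroy esx/games-reference1-ver5_6-worker111
-- snip --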
From owner-freebsd-fs@freebsd.org Thu Aug 18 17:04:42 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E9188BBECD1 for ; Thu, 18 Aug 2016 17:04:42 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: from mail-it0-x233.google.com (mail-it0-x233.google.com [IPv6:2607:f8b0:4001:c0b::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B26031BE9 for ; Thu, 18 Aug 2016 17:04:42 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: by mail-it0-x233.google.com with SMTP id e63so3685197ith.1 for ; Thu, 18 Aug 2016 10:04:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kateley-com.20150623.gappssmtp.com; s=20150623; h=reply-to:subject:references:to:from:organization:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=sjD3C1UduSe7+TJ7RrNmA+eJRZ5T1psJbeO5vNzMLcA=; b=nZOO229YoHeVR91n+eFtvLPeEco0f59iG2Sntk3Pc+XNjAO9ffb8F6EGh3q8pUNDkk voWTCUyOO/uFkPI28UVB6w3cSERQoPnJgpMVevgSKvA3X5ALdDS5jJqLYf1bTOz+yEgp 97bbNs37mcwD6+2yEfKvdwkoGtkHRxYE1Ic2ibv4nvldo0GV9hce6RF7cFprbUt+dXnI jLyakjGaPeORj/tFw3cHKgrrNVuKS7Nwfr3VRVg55IhKBf8RP4BdRuSJuEfR6u9VAs6G 1/jbIhVH7f4Cj4kbwLp9WH4TWhDNZdpn/AEvQVUCJmUv5uGsc1fRhesJbr1SOGT9F8jX zWNg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:reply-to:subject:references:to:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-transfer-encoding; bh=sjD3C1UduSe7+TJ7RrNmA+eJRZ5T1psJbeO5vNzMLcA=; b=V3mmPs2rp7FuWlUeeU8VMiphpW7kFn7rfhwAl1v0jU0F/mIYKJdwaXjlWzNHZTFKtP NyqylxrwgoM+HQyqGeSLiuJurOiifM5FgAJTBI361XxlFXfItsCh931p3grmOwFPlejo CDx/DElBlIwW3kymlsbiK/pIy4lZMDtqmgAl/gIUHfpXnABnIqPVlNd0VrzocIYL01vE kMpy26WspHTpiBA3l/htyDczxrKI0QPPsfewNL5FbLEDQ8D8u5kXI3bmeaby1n1PDeT0 0F0tSony3JH2csb/1sVX3/7iVAm4mWhr0/0rDzBTPR0l8Bv1c+Oev7qYSMpbcRMQ7Sjm Z6Ig== X-Gm-Message-State: AEkooutyhAUlTYyZ5oiVQdos1Fbm3baE/wOZLlyN1Dx9CaGQ/tFCuF6mKqMPTdYe/HqfKg== X-Received: by 10.36.149.5 with SMTP id m5mr794636itd.20.1471539881144; Thu, 18 Aug 2016 10:04:41 -0700 (PDT) Received: from [192.168.0.19] (67-4-156-204.mpls.qwest.net. 
[67.4.156.204]) by smtp.googlemail.com with ESMTPSA id e6sm2291968ith.0.2016.08.18.10.04.39 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 18 Aug 2016 10:04:40 -0700 (PDT) Reply-To: linda@kateley.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <02F2828E-AB88-4F25-AB73-5EF041BAD36E@gmail.com> To: freebsd-fs@freebsd.org From: Linda Kateley Organization: Kateley Company Message-ID: Date: Thu, 18 Aug 2016 12:04:35 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <02F2828E-AB88-4F25-AB73-5EF041BAD36E@gmail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 17:04:43 -0000 Lemme send this over them :) linda On 8/17/16 4:14 PM, Ben RUBSON wrote: >> On 17 Aug 2016, at 20:03, Linda Kateley wrote: >> >> RSF-1 runs in the zfs stack and send the writes to the second system. > Linda, do you have any link to a documentation about this RSF-1 operation mode ? > > According to what I red about RSF-1, storage is shared between nodes, and RSF-1 manages the failover, we do not have 2 different storages. > (so I don't really understand how writes are sent to the "second system") > > In addition, RSF-1 does not seem to help with long-distance replication to a different storage. > But I may be wrong ? > This is where ZFS send/receive helps. > Or even a nicer solution I proposed a few weeks ago : https://www.illumos.org/issues/7166 (but a lot of work to achieve). > > Ben > >> On 8/17/16 11:55 AM, Chris Watson wrote: >>> Of course, if you are willing to accept some amount of data loss that opens up a lot more options. :) >>> >>> Some may find that acceptable though. Like turning off fsync with PostgreSQL to get much higher throughput. As little no as you are made *very* aware of the risks. >>> >>> It's good to have input in this thread from one with more experience with RSF-1 than the rest of us. You confirm what others have that said about RSF-1, that it's stable and works well. What were you deploying it on? >>> >>> Chris >>> >>> Sent from my iPhone 5 >>> >>> On Aug 17, 2016, at 11:18 AM, Linda Kateley > wrote: >>> >>>> The question I always ask, as an architect, is "can you lose 1 minute worth of data?" If you can, then batched replication is perfect. If you can't.. then HA. Every place I have positioned it, rsf-1 has worked extremely well. If i remember right, it works at the dmu. I would suggest try it. They have been trying to have a full freebsd solution, I have several customers running it well. 
>>>> >>>> linda >>>> >>>> >>>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen Gotteswinter wrote: >>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gotteswinter wrote: >>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar > wrote: >>>>>>>>>> >>>>>>>>>> As I said in a previous post I tested the zfs send/receive approach (with >>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in all what you >>>>>>>>>> said, especially about off-site replicate and synchronous replication. >>>>>>>>>> >>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment, >>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but ATM it >>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>>> I must be too old school, but I don’t quite like the idea of using an essentially unreliable transport >>>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>>> >>>>>>>>> In case something went wrong, that approach could risk corrupting a pool. Although, frankly, >>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA problem that caused some >>>>>>>>> silent corruption. >>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines hooked up >>>>>>>> to the same disk chassis. >>>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>>> >>>>>>> I'm still busy to test the whole setup here, including the >>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can prevent >>>>>>> that thanks to: >>>>>>> >>>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>>> and you can't import the pool (even with -f) for ex (filer2 is the >>>>>>> BACKUP): >>>>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>>>>> >>>>>>> - The shared pool should not be mounted at boot, and you should ensure >>>>>>> that the failover script is not executed during boot time too: this is >>>>>>> to handle the case wherein both machines turn off and/or re-ignite at >>>>>>> the same time. Indeed, the CARP interface can "flip" it's status if both >>>>>>> machines are powered on at the same time, for ex: >>>>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and >>>>>>> you will have a split-brain scenario >>>>>>> >>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>>>> happen, this can be handled with a trigger file or something like that >>>>>>> >>>>>>> - I've still have to check if the order is OK, but I think that as long >>>>>>> as you shutdown the replication interface and that you adapt the >>>>>>> advskew (including the config file) of the CARP interface before the >>>>>>> zpool import -f in the failover script you can be relatively confident >>>>>>> that nothing will be written on the iSCSI targets >>>>>>> >>>>>>> - A zpool scrub should be run at regular intervals >>>>>>> >>>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>>>>> >>>>>>> Julien >>>>>>> >>>>>> 100€ question without detailed looking at that script. yes from a first >>>>>> view its super simple, but: why are solutions like rsf-1 such more >>>>>> powerful / featurerich. 
Theres a reason for, which is that they try to >>>>>> cover every possible situation (which makes more than sense for this). >>>>> I've never used "rsf-1" so I can't say much more about it, but I have >>>>> no doubts about it's ability to handle "complex situations", where >>>>> multiple nodes / networks are involved. >>>>> >>>>>> That script works for sure, within very limited cases imho >>>>>> >>>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen sooner >>>>>>>> or later especially when it comes to homegrown automatism solutions. >>>>>>>> even the commercial parts where much more time/work goes into such >>>>>>>> solutions fail in a regular manner >>>>>>>> >>>>>>>>> The advantage of ZFS send/receive of datasets is, however, that you can consider it >>>>>>>>> essentially atomic. A transport corruption should not cause trouble (apart from a failed >>>>>>>>> "zfs receive") and with snapshot retention you can even roll back. You can’t roll back >>>>>>>>> zpool replications :) >>>>>>>>> >>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as your zfs receive doesn’t involve a rollback >>>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. Just make sure that your replica datasets >>>>>>>>> aren’t mounted and zfs receive won’t complain. >>>>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Borja. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org " >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org " >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org " >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Thu Aug 18 17:13:54 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7F025BBE061 for ; Thu, 18 Aug 2016 17:13:54 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: from mail-it0-x234.google.com (mail-it0-x234.google.com [IPv6:2607:f8b0:4001:c0b::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 403BA12A1 for ; Thu, 18 Aug 2016 17:13:54 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: by mail-it0-x234.google.com with SMTP id f6so3619838ith.0 for ; Thu, 18 Aug 2016 10:13:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kateley-com.20150623.gappssmtp.com; s=20150623; 
h=reply-to:subject:references:to:from:organization:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=/cv7LFUo9H4S+yvxGHb7DVtSk+yH52epUXSSck8hF/Q=; b=r2pBk41tOYKMmFgk17EPLf2lH+Eu6KZNtLJakADi6iHfRI9EqZlAbnFVjKJ7jncEzU B+9WW0nlOSzTbZTthGUhx7xNC3LpCQr20KVd5pcLv9pGU+IyR3A/KGRnziCZ51Hm5Ny9 vMYbzbMfjnCvZbyf88NWdUlT/uVp1vEsNVkM7b1ZLByozP9gOYpyHoDNjpIkvDhfXMRn icuCXEmzn4b26hgdMKrUyvvhdr39TIzWQKz8PKGaejsAemPZabn/D2zSblDglILuvMhZ EaXQlt/VDUtt3hQEMmzD9s1hX/lkguaXX9MIw+sknQLJHLTxDvkTxlpnlCOQD43mhOJ7 KRRA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:reply-to:subject:references:to:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-transfer-encoding; bh=/cv7LFUo9H4S+yvxGHb7DVtSk+yH52epUXSSck8hF/Q=; b=NyylD+QJ5+rL2DHyB/5O2pHzO2x59cQq3i1DPXtpGwv2hRAOGYVYxCYYeU6Jm33Ou+ A/g/WE8wxiHQG0MY5p47p3OJjpwOD38WdJVXw/4DGi45Iuqc1IQdn4vzDrVSpduMS5aa AaJOD3jAdrNAXOHkQ6mI+xvb6ADm0x0XP63htXu6Cedyvn9Idsinra4iOiR5NXXVsFWc ejHaeRzL2n1+4SaPXpVXNXWcJ6LV/FfSV+TjlrixNBV3jAtv76cd413emlghM86Fjco9 Jvue5cUsaZIC81MRJetJataofHt3I3paf9Y3TGmC8LNWDS5kUC+qmXWaivdqI007Rpqu 1DVQ== X-Gm-Message-State: AEkoouvvXlt9cWRu8mz2IiFm7Q6L7E5/jX9rR41WSMY1eLOk4vZlSiV7v4WVYwqGghp7UA== X-Received: by 10.36.149.193 with SMTP id m184mr815974itd.94.1471540433123; Thu, 18 Aug 2016 10:13:53 -0700 (PDT) Received: from [192.168.0.19] (67-4-156-204.mpls.qwest.net. [67.4.156.204]) by smtp.googlemail.com with ESMTPSA id f126sm235063ith.7.2016.08.18.10.13.52 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 18 Aug 2016 10:13:52 -0700 (PDT) Reply-To: linda@kateley.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> <409301a7-ce03-aaa3-c4dc-fa9f9ba66e01@internetx.com> <69234c7d-cda9-2d56-b5e0-bb5e3961cc19@internetx.com> To: freebsd-fs@freebsd.org From: Linda Kateley Organization: Kateley Company Message-ID: <7828fdbc-3a5b-3998-ac54-a896cf02927f@kateley.com> Date: Thu, 18 Aug 2016 12:13:51 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <69234c7d-cda9-2d56-b5e0-bb5e3961cc19@internetx.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 17:13:54 -0000 Cool, thanks linda On 8/18/16 3:02 AM, InterNetX - Juergen Gotteswinter wrote: > new day, new things learned :) > > thanks! > > but like said, zrep does its on locking in zfs properties. 
so even this > is fine > > while true; do zrep sync all; done > > > see > > http://www.bolthole.com/solaris/zrep/ > > the properties look like this > > tank/vmail redundant_metadata all default > tank/vmail zrep:savecount 5 local > tank/vmail zrep:lock-time 20160620101703 local > tank/vmail zrep:master yes local > tank/vmail zrep:src-fs tank/vmail local > tank/vmail zrep:dest-host stor1 local > tank/vmail zrep:src-host stor2 local > tank/vmail zrep:dest-fs tank/vmail local > tank/vmail zrep:lock-pid 10887 local > > > it also takes care of the replication partner, the replicated datasets > are read only until you tell zrep "go go go, become master" > > Simple usage summary: > zrep (init|-i) ZFS/fs remotehost remoteZFSpool/fs > zrep (sync|-S) [-q seconds] ZFS/fs > zrep (sync|-S) [-q seconds] all > zrep (sync|-S) ZFS/fs@snapshot -- temporary retroactive sync > zrep (status|-s) [-v] [(-a|ZFS/fs)] > zrep refresh ZFS/fs -- pull version of sync > zrep (list|-l) [-Lv] > zrep (expire|-e) [-L] (ZFS/fs ...)|(all)|() > zrep (changeconfig|-C) [-f] ZFS/fs remotehost remoteZFSpool/fs > zrep (changeconfig|-C) [-f] [-d] ZFS/fs srchost srcZFSpool/fs > zrep failover [-L] ZFS/fs > zrep takeover [-L] ZFS/fs > > > zrep failover pool/ds -> master sets pool read only, connects to slave, > sets pool on slave rw > > should be easy to combine with carp/devd, but this is the land of vodoo > automagic again which i dont trust that much. > > > Am 18.08.2016 um 09:40 schrieb Ben RUBSON: >> Yep this is better : >> >> if mkdir >> then >> do_your_job >> rm -rf >> fi >> >> >> >>> On 18 Aug 2016, at 09:38, InterNetX - Juergen Gotteswinter wrote: >>> >>> uhm, dont really investigated if it is or not. add a "sync" after that? >>> or replace it? >>> >>> but anyway, thanks for the hint. will dig into this! >>> >>> Am 18.08.2016 um 09:36 schrieb krad: >>>> I didnt think touch was atomic, mkdir is though >>>> >>>> On 18 August 2016 at 08:32, InterNetX - Juergen Gotteswinter >>>> >>> > wrote: >>>> >>>> >>>> >>>> Am 17.08.2016 um 20:03 schrieb Linda Kateley: >>>>> I just do consulting so I don't always get to see the end of the >>>>> project. Although we are starting to do more ongoing support so we can >>>>> see the progress.. >>>>> >>>>> I have worked with some of the guys from high-availability.com for maybe >>>>> 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work >>>>> beautifully with omni/illumos. The one customer I have running it in >>>>> prod is an isp in south america running openstack and zfs on freebsd as >>>>> iscsi. Big boxes, 90+ drives per frame. If someone would like try it, i >>>>> have some contacts there. Ping me offlist. >>>> no offense, but it sounds a bit like marketing. >>>> >>>> here: running nexenta ha setup since several years with one catastrophic >>>> failure due to split brain >>>> >>>>> You do risk losing data if you batch zfs send. It is very hard to run >>>>> that real time. >>>> depends on how much data changes aka delta size >>>> >>>> >>>> You have to take the snap then send the snap. Most >>>>> people run in cron, even if it's not in cron, you would want one to >>>>> finish before you started the next. >>>> thats the reason why lock files where invented, tools like zrep handle >>>> that themself via additional zfs properties >>>> >>>> or, if one does not trust a single layer >>>> >>>> -- snip -- >>>> #!/bin/sh >>>> if [ ! 
-f /var/run/replic ] ; then >>>> touch /var/run/replic >>>> /blah/path/zrep sync all >> /var/log/zfsrepli.log >>>> rm -f /var/run/replic >>>> fi >>>> -- snip -- >>>> >>>> something like this, simple >>>> >>>> If you lose the sending host before >>>>> the receive is complete you won't have a full copy. >>>> if rsf fails, and you end up in split brain you loose way more. been >>>> there, seen that. >>>> >>>> With zfs though you >>>>> will probably still have the data on the sending host, however long it >>>>> takes to bring it back up. RSF-1 runs in the zfs stack and send the >>>>> writes to the second system. It's kind of pricey, but actually much less >>>>> expensive than commercial alternatives. >>>>> >>>>> Anytime you run anything sync it adds latency but makes things safer.. >>>> not surprising, it all depends on the usecase >>>> >>>>> There is also a cool tool I like, called zerto for vmware that sits in >>>>> the hypervisor and sends a sync copy of a write locally and then an >>>>> async remotely. It's pretty cool. Although I haven't run it myself, have >>>>> a bunch of customers running it. I believe it works with proxmox too. >>>>> >>>>> Most people I run into (these days) don't mind losing 5 or even 30 >>>>> minutes of data. Small shops. >>>> you talk about minutes, what delta size are we talking here about? why >>>> not using zrep in a loop for example >>>> >>>> They usually have a copy somewhere else. >>>>> Or the cost of 5-30 minutes isn't that great. I used work as a >>>>> datacenter architect for sun/oracle with only fortune 500. There losing >>>>> 1 sec could put large companies out of business. I worked with banks and >>>>> exchanges. >>>> again, usecase. i bet 99% on this list are not operating fortune 500 >>>> bank filers >>>> >>>> They couldn't ever lose a single transaction. Most people >>>>> nowadays do the replication/availability in the application though and >>>>> don't care about underlying hardware, especially disk. >>>>> >>>>> >>>>> On 8/17/16 11:55 AM, Chris Watson wrote: >>>>>> Of course, if you are willing to accept some amount of data loss that >>>>>> opens up a lot more options. :) >>>>>> >>>>>> Some may find that acceptable though. Like turning off fsync with >>>>>> PostgreSQL to get much higher throughput. As little no as you are >>>> made >>>>>> *very* aware of the risks. >>>>>> >>>>>> It's good to have input in this thread from one with more experience >>>>>> with RSF-1 than the rest of us. You confirm what others have that >>>> said >>>>>> about RSF-1, that it's stable and works well. What were you deploying >>>>>> it on? >>>>>> >>>>>> Chris >>>>>> >>>>>> Sent from my iPhone 5 >>>>>> >>>>>> On Aug 17, 2016, at 11:18 AM, Linda Kateley >>> >>>>>> >> wrote: >>>>>> >>>>>>> The question I always ask, as an architect, is "can you lose 1 >>>> minute >>>>>>> worth of data?" If you can, then batched replication is perfect. If >>>>>>> you can't.. then HA. Every place I have positioned it, rsf-1 has >>>>>>> worked extremely well. If i remember right, it works at the dmu. I >>>>>>> would suggest try it. They have been trying to have a full freebsd >>>>>>> solution, I have several customers running it well. 
>>>>>>> >>>>>>> linda >>>>>>> >>>>>>> >>>>>>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>>>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen >>>>>>>> Gotteswinter wrote: >>>>>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen >>>>>>>>>> Gotteswinter wrote: >>>>>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>>>>>> >>> >> wrote: >>>>>>>>>>>>> As I said in a previous post I tested the zfs send/receive >>>>>>>>>>>>> approach (with >>>>>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in >>>>>>>>>>>>> all what you >>>>>>>>>>>>> said, especially about off-site replicate and synchronous >>>>>>>>>>>>> replication. >>>>>>>>>>>>> >>>>>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the >>>>>>>>>>>>> moment, >>>>>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but >>>>>>>>>>>>> ATM it >>>>>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>>>>>> I must be too old school, but I don’t quite like the idea of >>>>>>>>>>>> using an essentially unreliable transport >>>>>>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>>>>>> >>>>>>>>>>>> In case something went wrong, that approach could risk >>>>>>>>>>>> corrupting a pool. Although, frankly, >>>>>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA >>>>>>>>>>>> problem that caused some >>>>>>>>>>>> silent corruption. >>>>>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines >>>>>>>>>>> hooked up >>>>>>>>>>> to the same disk chassis. >>>>>>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>>>>>> >>>>>>>>>> I'm still busy to test the whole setup here, including the >>>>>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can >>>> prevent >>>>>>>>>> that thanks to: >>>>>>>>>> >>>>>>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>>>>>> and you can't import the pool (even with -f) for ex (filer2 >>>> is the >>>>>>>>>> BACKUP): >>>>>>>>>> >>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>> >>>>>>>>>> - The shared pool should not be mounted at boot, and you should >>>>>>>>>> ensure >>>>>>>>>> that the failover script is not executed during boot time too: >>>>>>>>>> this is >>>>>>>>>> to handle the case wherein both machines turn off and/or >>>> re-ignite at >>>>>>>>>> the same time. 
Indeed, the CARP interface can "flip" it's status >>>>>>>>>> if both >>>>>>>>>> machines are powered on at the same time, for ex: >>>>>>>>>> >>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf >>>> and >>>>>>>>>> you will have a split-brain scenario >>>>>>>>>> >>>>>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>>>>>>> happen, this can be handled with a trigger file or something like >>>>>>>>>> that >>>>>>>>>> >>>>>>>>>> - I've still have to check if the order is OK, but I think >>>> that as >>>>>>>>>> long >>>>>>>>>> as you shutdown the replication interface and that you adapt the >>>>>>>>>> advskew (including the config file) of the CARP interface >>>> before the >>>>>>>>>> zpool import -f in the failover script you can be relatively >>>>>>>>>> confident >>>>>>>>>> that nothing will be written on the iSCSI targets >>>>>>>>>> >>>>>>>>>> - A zpool scrub should be run at regular intervals >>>>>>>>>> >>>>>>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>>>>>> >>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>> >>>>>>>>>> Julien >>>>>>>>>> >>>>>>>>> 100€ question without detailed looking at that script. yes from a >>>>>>>>> first >>>>>>>>> view its super simple, but: why are solutions like rsf-1 such more >>>>>>>>> powerful / featurerich. Theres a reason for, which is that >>>> they try to >>>>>>>>> cover every possible situation (which makes more than sense >>>> for this). >>>>>>>> I've never used "rsf-1" so I can't say much more about it, but >>>> I have >>>>>>>> no doubts about it's ability to handle "complex situations", where >>>>>>>> multiple nodes / networks are involved. >>>>>>>> >>>>>>>>> That script works for sure, within very limited cases imho >>>>>>>>> >>>>>>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen >>>>>>>>>>> sooner >>>>>>>>>>> or later especially when it comes to homegrown automatism >>>> solutions. >>>>>>>>>>> even the commercial parts where much more time/work goes >>>> into such >>>>>>>>>>> solutions fail in a regular manner >>>>>>>>>>> >>>>>>>>>>>> The advantage of ZFS send/receive of datasets is, however, that >>>>>>>>>>>> you can consider it >>>>>>>>>>>> essentially atomic. A transport corruption should not cause >>>>>>>>>>>> trouble (apart from a failed >>>>>>>>>>>> "zfs receive") and with snapshot retention you can even roll >>>>>>>>>>>> back. You can’t roll back >>>>>>>>>>>> zpool replications :) >>>>>>>>>>>> >>>>>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as >>>> your >>>>>>>>>>>> zfs receive doesn’t involve a rollback >>>>>>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. >>>>>>>>>>>> Just make sure that your replica datasets >>>>>>>>>>>> aren’t mounted and zfs receive won’t complain. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Borja. 
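For anyone who wants to try zrep from the usage summary quoted earlier in this message, a first-time setup and a planned role swap could look like the commands below. The hosts (stor1, stor2) and dataset (tank/vmail) are taken from the quoted property listing and stand in for your own names.

-- snip --
# one-time setup, run on the current master (stor2 in the listing above)
zrep init tank/vmail stor1 tank/vmail

# periodic replication, from cron or the loop shown above
zrep sync all

# planned role swap: run on the current master...
zrep failover tank/vmail
# ...or, if the master is gone, promote the replica from the other side
zrep takeover tank/vmail
-- snip --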
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> freebsd-fs@freebsd.org >>>> > >>>> mailing list >>>>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> >>>>>>>>>>>> To unsubscribe, send any mail to >>>>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>> >>>>>>>>>>>> >>> >" >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> freebsd-fs@freebsd.org >>>> > >>>> mailing list >>>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> >>>>>>>>>>> To unsubscribe, send any mail to >>>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>> >>>>>>>>>>> >>> >" >>>>>>> _______________________________________________ >>>>>>> freebsd-fs@freebsd.org >>>> > >>>> mailing list >>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> >>>>>>> To unsubscribe, send any mail to >>>> "freebsd-fs-unsubscribe@freebsd.org >>>> >>>>>>> >>> >" >>>>> _______________________________________________ >>>>> freebsd-fs@freebsd.org mailing list >>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> >>>>> To unsubscribe, send any mail to >>>> "freebsd-fs-unsubscribe@freebsd.org >>>> " >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org >>>> " >>>> >>>> >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Thu Aug 18 17:19:58 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 250A3BBE2E0 for ; Thu, 18 Aug 2016 17:19:58 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: from mail-it0-x230.google.com (mail-it0-x230.google.com [IPv6:2607:f8b0:4001:c0b::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D8972151E for ; Thu, 18 Aug 2016 17:19:57 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: by mail-it0-x230.google.com with SMTP id n128so3718656ith.1 for ; Thu, 18 Aug 2016 10:19:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kateley-com.20150623.gappssmtp.com; s=20150623; h=reply-to:subject:references:to:cc:from:organization:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=z9zBx+sDu9j56VaWeHA3k8btgArLPWm7s0u78FtsLTk=; b=wwOfOQ+Je+EBCGBiONjjN/n/vGJ9f23Mmby9Mj4xGq40zk8ZCpovHA64l8iuRPGtYF X1pzM/m2uPlQ9p0R/2Krbk7RzCFWUedW+PgSB5q7kCHITNAYuQn0mj9lhPWYF0/jXF/2 zVSyLGNU0Gk24ejgCNfXfuUJr4EGx8vIg1DuY8eR9Jiq6te4570f5Ct8N8sFinDQkIR1 DNuXAso0pLBqTQy/2L67nfj0RJqEDQccTbikuOiPZfAGaU2V1t0aEmeAPXz+X+PJSy+q 
kPfewxOgdqLOTVnsnh8RwDAVWB9kFFnYNW5n4SqWQYPp7fBrk8jWmhaXJOQ8+hcJ4IH3 LQGw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:reply-to:subject:references:to:cc:from :organization:message-id:date:user-agent:mime-version:in-reply-to :content-transfer-encoding; bh=z9zBx+sDu9j56VaWeHA3k8btgArLPWm7s0u78FtsLTk=; b=TV23DtAsB70BnEFcayaORt3AVzX2SlJJe/HYSVMB7v3IDO6ZbzGjOOLmzw3T1B25Wq G8b+lJUnZbD3jRlck9wxXcnAVtrzgsNJZJWaSCl8Ldn1dVGe3EYh1JBOEpf/h9kIdEWw OvmN7fEyGjcky5kXXEoU0SW/Hhb2EtiJ93f6GIcBAk4AwY8QrfGpTaGF/L6DkbIPHEv1 rwgFlE1voYwdgJuxc/v+jNxACbYj590aEUxax1AKRhVIILjmBO/DwzxD+KsE17XG1SkD x0ro9q58EvbRuVAm8yPWMx4YoO3y72bkqafdQc0/VXEoObw0b4+f2ld0lCtP9awJWCVn F60g== X-Gm-Message-State: AEkoouurVW2JMRY0LaE+rKnCICgc9cIoWSUnFvPtYQLHjboDFtIyWOXLot9gQCxkbT2lbQ== X-Received: by 10.36.33.197 with SMTP id e188mr869525ita.42.1471540797019; Thu, 18 Aug 2016 10:19:57 -0700 (PDT) Received: from [192.168.0.19] (67-4-156-204.mpls.qwest.net. [67.4.156.204]) by smtp.googlemail.com with ESMTPSA id z128sm1528555iof.4.2016.08.18.10.19.56 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 18 Aug 2016 10:19:56 -0700 (PDT) Reply-To: linda@kateley.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> To: juergen.gotteswinter@internetx.com, linda@kateley.com, Chris Watson Cc: freebsd-fs@freebsd.org From: Linda Kateley Organization: Kateley Company Message-ID: <4c34cbf9-84b5-5d42-e0b4-bf18aa1ef9a7@kateley.com> Date: Thu, 18 Aug 2016 12:19:55 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 17:19:58 -0000 On 8/18/16 2:32 AM, InterNetX - Juergen Gotteswinter wrote: > > Am 17.08.2016 um 20:03 schrieb Linda Kateley: >> I just do consulting so I don't always get to see the end of the >> project. Although we are starting to do more ongoing support so we can >> see the progress.. >> >> I have worked with some of the guys from high-availability.com for maybe >> 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work >> beautifully with omni/illumos. The one customer I have running it in >> prod is an isp in south america running openstack and zfs on freebsd as >> iscsi. Big boxes, 90+ drives per frame. If someone would like try it, i >> have some contacts there. Ping me offlist. > no offense, but it sounds a bit like marketing. > > here: running nexenta ha setup since several years with one catastrophic > failure due to split brain Just trying to say I don't see projects ongoing.. 
just at beginning > >> You do risk losing data if you batch zfs send. It is very hard to run >> that real time. > depends on how much data changes aka delta size > > > You have to take the snap then send the snap. Most >> people run in cron, even if it's not in cron, you would want one to >> finish before you started the next. > thats the reason why lock files where invented, tools like zrep handle > that themself via additional zfs properties > > or, if one does not trust a single layer > > -- snip -- > #!/bin/sh > if [ ! -f /var/run/replic ] ; then > touch /var/run/replic > /blah/path/zrep sync all >> /var/log/zfsrepli.log > rm -f /var/run/replic > fi > -- snip -- > > something like this, simple > > If you lose the sending host before >> the receive is complete you won't have a full copy. > if rsf fails, and you end up in split brain you loose way more. been > there, seen that. > > With zfs though you >> will probably still have the data on the sending host, however long it >> takes to bring it back up. RSF-1 runs in the zfs stack and send the >> writes to the second system. It's kind of pricey, but actually much less >> expensive than commercial alternatives. >> >> Anytime you run anything sync it adds latency but makes things safer.. > not surprising, it all depends on the usecase > >> There is also a cool tool I like, called zerto for vmware that sits in >> the hypervisor and sends a sync copy of a write locally and then an >> async remotely. It's pretty cool. Although I haven't run it myself, have >> a bunch of customers running it. I believe it works with proxmox too. >> >> Most people I run into (these days) don't mind losing 5 or even 30 >> minutes of data. Small shops. > you talk about minutes, what delta size are we talking here about? why > not using zrep in a loop for example > > They usually have a copy somewhere else. >> Or the cost of 5-30 minutes isn't that great. I used work as a >> datacenter architect for sun/oracle with only fortune 500. There losing >> 1 sec could put large companies out of business. I worked with banks and >> exchanges. > again, usecase. i bet 99% on this list are not operating fortune 500 > bank filers > > They couldn't ever lose a single transaction. Most people >> nowadays do the replication/availability in the application though and >> don't care about underlying hardware, especially disk. >> >> >> On 8/17/16 11:55 AM, Chris Watson wrote: >>> Of course, if you are willing to accept some amount of data loss that >>> opens up a lot more options. :) >>> >>> Some may find that acceptable though. Like turning off fsync with >>> PostgreSQL to get much higher throughput. As little no as you are made >>> *very* aware of the risks. >>> >>> It's good to have input in this thread from one with more experience >>> with RSF-1 than the rest of us. You confirm what others have that said >>> about RSF-1, that it's stable and works well. What were you deploying >>> it on? >>> >>> Chris >>> >>> Sent from my iPhone 5 >>> >>> On Aug 17, 2016, at 11:18 AM, Linda Kateley >> > wrote: >>> >>>> The question I always ask, as an architect, is "can you lose 1 minute >>>> worth of data?" If you can, then batched replication is perfect. If >>>> you can't.. then HA. Every place I have positioned it, rsf-1 has >>>> worked extremely well. If i remember right, it works at the dmu. I >>>> would suggest try it. They have been trying to have a full freebsd >>>> solution, I have several customers running it well. 
>>>> >>>> linda >>>> >>>> >>>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen >>>>> Gotteswinter wrote: >>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen >>>>>>> Gotteswinter wrote: >>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>>> > wrote: >>>>>>>>>> >>>>>>>>>> As I said in a previous post I tested the zfs send/receive >>>>>>>>>> approach (with >>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in >>>>>>>>>> all what you >>>>>>>>>> said, especially about off-site replicate and synchronous >>>>>>>>>> replication. >>>>>>>>>> >>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the >>>>>>>>>> moment, >>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but >>>>>>>>>> ATM it >>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>>> I must be too old school, but I don’t quite like the idea of >>>>>>>>> using an essentially unreliable transport >>>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>>> >>>>>>>>> In case something went wrong, that approach could risk >>>>>>>>> corrupting a pool. Although, frankly, >>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA >>>>>>>>> problem that caused some >>>>>>>>> silent corruption. >>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines >>>>>>>> hooked up >>>>>>>> to the same disk chassis. >>>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>>> >>>>>>> I'm still busy to test the whole setup here, including the >>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can prevent >>>>>>> that thanks to: >>>>>>> >>>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>>> and you can't import the pool (even with -f) for ex (filer2 is the >>>>>>> BACKUP): >>>>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>>>>> >>>>>>> - The shared pool should not be mounted at boot, and you should >>>>>>> ensure >>>>>>> that the failover script is not executed during boot time too: >>>>>>> this is >>>>>>> to handle the case wherein both machines turn off and/or re-ignite at >>>>>>> the same time. Indeed, the CARP interface can "flip" it's status >>>>>>> if both >>>>>>> machines are powered on at the same time, for ex: >>>>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and >>>>>>> you will have a split-brain scenario >>>>>>> >>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>>>> happen, this can be handled with a trigger file or something like >>>>>>> that >>>>>>> >>>>>>> - I've still have to check if the order is OK, but I think that as >>>>>>> long >>>>>>> as you shutdown the replication interface and that you adapt the >>>>>>> advskew (including the config file) of the CARP interface before the >>>>>>> zpool import -f in the failover script you can be relatively >>>>>>> confident >>>>>>> that nothing will be written on the iSCSI targets >>>>>>> >>>>>>> - A zpool scrub should be run at regular intervals >>>>>>> >>>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>>>>> >>>>>>> Julien >>>>>>> >>>>>> 100€ question without detailed looking at that script. 
yes from a >>>>>> first >>>>>> view its super simple, but: why are solutions like rsf-1 such more >>>>>> powerful / featurerich. Theres a reason for, which is that they try to >>>>>> cover every possible situation (which makes more than sense for this). >>>>> I've never used "rsf-1" so I can't say much more about it, but I have >>>>> no doubts about it's ability to handle "complex situations", where >>>>> multiple nodes / networks are involved. >>>>> >>>>>> That script works for sure, within very limited cases imho >>>>>> >>>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen >>>>>>>> sooner >>>>>>>> or later especially when it comes to homegrown automatism solutions. >>>>>>>> even the commercial parts where much more time/work goes into such >>>>>>>> solutions fail in a regular manner >>>>>>>> >>>>>>>>> The advantage of ZFS send/receive of datasets is, however, that >>>>>>>>> you can consider it >>>>>>>>> essentially atomic. A transport corruption should not cause >>>>>>>>> trouble (apart from a failed >>>>>>>>> "zfs receive") and with snapshot retention you can even roll >>>>>>>>> back. You can’t roll back >>>>>>>>> zpool replications :) >>>>>>>>> >>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as your >>>>>>>>> zfs receive doesn’t involve a rollback >>>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. >>>>>>>>> Just make sure that your replica datasets >>>>>>>>> aren’t mounted and zfs receive won’t complain. >>>>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Borja. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>>> To unsubscribe, send any mail to >>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>>>>>>> " >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>> To unsubscribe, send any mail to >>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>>>>>> " >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org >>>> " >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Thu Aug 18 20:01:33 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B1E90BBE785 for ; Thu, 18 Aug 2016 20:01:33 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citapm.icyb.net.ua (citapm.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id DD1D11845; Thu, 18 Aug 2016 20:01:32 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citapm.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA16268; Thu, 18 Aug 2016 23:01:24 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1baTVE-000JoD-HQ; Thu, 18 Aug 2016 23:01:24 +0300 Subject: Re: ZFS ARC under 
memory pressure To: Slawa Olhovchenkov , freebsd-fs@FreeBSD.org, Alexander Motin References: <20160816193416.GM8192@zxy.spb.ru> From: Andriy Gapon Message-ID: <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> Date: Thu, 18 Aug 2016 23:00:28 +0300 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160816193416.GM8192@zxy.spb.ru> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 20:01:33 -0000 On 16/08/2016 22:34, Slawa Olhovchenkov wrote: > I see issuses with ZFS ARC inder memory pressure. > ZFS ARC size can be dramaticaly reduced, up to arc_min. > > As I see memory pressure event cause call arc_lowmem and set needfree: > > arc.c:arc_lowmem > > needfree = btoc(arc_c >> arc_shrink_shift); > > After this, arc_available_memory return negative vaules (PAGESIZE * > (-needfree)) until needfree is zero. Independent how too much memory > freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <= > arc_c. Until arc_size don't drop below arc_c (arc_c deceased at every > loop interation). > > arc_c droped to minimum value if arc_size fast enough droped. > > No control current to initial memory allocation. > > As result, I can see needless arc reclaim, from 10x to 100x times. > > Can some one check me and comment this? You might have found a real problem here, but I am short of time right now to properly analyze the issue. I think that on illumos 'needfree' is a variable that's managed by the virtual memory system and it is akin to our vm_pageout_deficit. But during the porting it became an artificial value and its handling might be sub-optimal. 
-- Andriy Gapon From owner-freebsd-fs@freebsd.org Thu Aug 18 20:27:00 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5F5ECBBEE34 for ; Thu, 18 Aug 2016 20:27:00 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 214F21730; Thu, 18 Aug 2016 20:27:00 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from slw by zxy.spb.ru with local (Exim 4.86 (FreeBSD)) (envelope-from ) id 1baTty-000JwN-0I; Thu, 18 Aug 2016 23:26:58 +0300 Date: Thu, 18 Aug 2016 23:26:57 +0300 From: Slawa Olhovchenkov To: Andriy Gapon Cc: freebsd-fs@FreeBSD.org, Alexander Motin Subject: Re: ZFS ARC under memory pressure Message-ID: <20160818202657.GS8192@zxy.spb.ru> References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> User-Agent: Mutt/1.5.24 (2015-08-30) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 20:27:00 -0000 On Thu, Aug 18, 2016 at 11:00:28PM +0300, Andriy Gapon wrote: > On 16/08/2016 22:34, Slawa Olhovchenkov wrote: > > I see issuses with ZFS ARC inder memory pressure. > > ZFS ARC size can be dramaticaly reduced, up to arc_min. > > > > As I see memory pressure event cause call arc_lowmem and set needfree: > > > > arc.c:arc_lowmem > > > > needfree = btoc(arc_c >> arc_shrink_shift); > > > > After this, arc_available_memory return negative vaules (PAGESIZE * > > (-needfree)) until needfree is zero. Independent how too much memory > > freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <= > > arc_c. Until arc_size don't drop below arc_c (arc_c deceased at every > > loop interation). > > > > arc_c droped to minimum value if arc_size fast enough droped. > > > > No control current to initial memory allocation. > > > > As result, I can see needless arc reclaim, from 10x to 100x times. > > > > Can some one check me and comment this? > > You might have found a real problem here, but I am short of time right now to > properly analyze the issue. I think that on illumos 'needfree' is a variable > that's managed by the virtual memory system and it is akin to our > vm_pageout_deficit. But during the porting it became an artificial value and > its handling might be sub-optimal. As I see, totaly not optimal. I am create some patch for sub-optimal handling and now test it. 
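[To make the feedback loop described in this exchange easier to follow, here is a toy userspace model of it. It is not the kernel's arc.c; it only encodes the steps stated above (the lowmem handler asks for 1/128 of arc_c, arc_available_memory() stays negative until needfree is cleared, needfree is cleared only once arc_size has fallen to arc_c, and arc_c is lowered on every reclaim pass), plus one explicit assumption: eviction trims arc_size more slowly than the target is lowered. All sizes and per-pass amounts are invented for illustration.]
-- snip --
/*
 * Toy model of the reported ARC collapse.  NOT the real arc.c code;
 * the per-pass shrink amount and the eviction rate are assumptions.
 */
#include <stdio.h>
#include <stdint.h>

#define PAGESIZE         4096LL
#define ARC_SHRINK_SHIFT 7                    /* 1/128 of arc_c, as reported */
#define MiB              (1024LL * 1024)
#define GiB              (1024LL * MiB)

static int64_t arc_c    = 16 * GiB;           /* ARC target size (example) */
static int64_t arc_size = 16 * GiB;           /* current ARC size (example) */
static int64_t arc_min  = 2 * GiB;            /* hard floor (example) */
static int64_t needfree = 0;                  /* pages asked for by lowmem */

static void
arc_lowmem(void)                              /* one memory-pressure event */
{
        needfree = (arc_c >> ARC_SHRINK_SHIFT) / PAGESIZE;
}

static int64_t
arc_available_memory(void)
{
        /* stays negative until needfree is cleared, however much was freed */
        return (needfree > 0 ? -needfree * PAGESIZE : 256 * MiB);
}

int
main(void)
{
        int64_t evict_per_pass = 64 * MiB;    /* assumed: eviction lags behind */
        int64_t asked;
        int pass = 0;

        arc_lowmem();
        asked = -arc_available_memory();      /* what the event actually wanted */
        while (arc_available_memory() < 0) {
                pass++;
                arc_c -= -arc_available_memory();     /* target drops every pass */
                if (arc_c < arc_min)
                        arc_c = arc_min;
                arc_size -= evict_per_pass;           /* eviction catches up slowly */
                if (arc_size <= arc_c) {
                        arc_size = arc_c;
                        needfree = 0;                 /* cleared only here */
                }
        }
        printf("asked to free %lld MiB; ARC target ended at %lld MiB after %d passes\n",
            (long long)(asked / MiB), (long long)(arc_c / MiB), pass);
        return (0);
}
-- snip --
[With these example numbers the event asks for 128 MiB but the target races down to arc_min, which is the same shape of over-reclaim ("10x to 100x") reported above.]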
From owner-freebsd-fs@freebsd.org Thu Aug 18 20:31:38 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BC001BBEEE8 for ; Thu, 18 Aug 2016 20:31:38 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (denninger.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8EEB019F9 for ; Thu, 18 Aug 2016 20:31:38 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 713AA20855C for ; Thu, 18 Aug 2016 15:31:30 -0500 (CDT) Subject: Re: ZFS ARC under memory pressure To: freebsd-fs@freebsd.org References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> From: Karl Denninger Message-ID: Date: Thu, 18 Aug 2016 15:31:26 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160818202657.GS8192@zxy.spb.ru> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms020806060003040608060704" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 20:31:38 -0000 This is a cryptographically signed message in MIME format. --------------ms020806060003040608060704 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 8/18/2016 15:26, Slawa Olhovchenkov wrote: > On Thu, Aug 18, 2016 at 11:00:28PM +0300, Andriy Gapon wrote: > >> On 16/08/2016 22:34, Slawa Olhovchenkov wrote: >>> I see issuses with ZFS ARC inder memory pressure. >>> ZFS ARC size can be dramaticaly reduced, up to arc_min. >>> >>> As I see memory pressure event cause call arc_lowmem and set needfree= : >>> >>> arc.c:arc_lowmem >>> >>> needfree =3D btoc(arc_c >> arc_shrink_shift); >>> >>> After this, arc_available_memory return negative vaules (PAGESIZE * >>> (-needfree)) until needfree is zero. Independent how too much memory >>> freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <=3D >>> arc_c. Until arc_size don't drop below arc_c (arc_c deceased at every= >>> loop interation). >>> >>> arc_c droped to minimum value if arc_size fast enough droped. >>> >>> No control current to initial memory allocation. >>> >>> As result, I can see needless arc reclaim, from 10x to 100x times. >>> >>> Can some one check me and comment this? >> You might have found a real problem here, but I am short of time right= now to >> properly analyze the issue. I think that on illumos 'needfree' is a v= ariable >> that's managed by the virtual memory system and it is akin to our >> vm_pageout_deficit. But during the porting it became an artificial va= lue and >> its handling might be sub-optimal. > As I see, totaly not optimal. > I am create some patch for sub-optimal handling and now test it. 
You might want to look at the code contained in here:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594

There are some ugly interactions with the VM system you can run into if you're not careful; I've chased this issue before and while I haven't yet done the work to integrate it into 11.x (and the underlying code *has* changed since the 10.x patches I developed) if you wind up driving the VM system to evict pages to swap rather than pare back ARC you're probably making the wrong choice.

In addition UMA can come into the picture too and (at least previously) was a severe contributor to pathological behavior.

-- Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/

From owner-freebsd-fs@freebsd.org Thu Aug 18 22:58:25 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9F854BBEF15; Thu, 18 Aug 2016 22:58:25 +0000 (UTC) (envelope-from jhs@berklix.com)
Received: from land.berklix.org (land.berklix.org [144.76.10.75]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 34C6D1182; Thu, 18 Aug 2016 22:58:24 +0000 (UTC) (envelope-from jhs@berklix.com)
Received: from mart.js.berklix.net (p5083CC3A.dip0.t-ipconnect.de [80.131.204.58]) (authenticated bits=128) by land.berklix.org (8.15.2/8.15.2) with ESMTPA id u7IMwFPs076625; Thu, 18 Aug 2016 22:58:15 GMT (envelope-from jhs@berklix.com)
Received: from fire.js.berklix.net (fire.js.berklix.net [192.168.91.41]) by mart.js.berklix.net (8.14.3/8.14.3) with ESMTP id u7IMwAbf003005; Fri, 19 Aug 2016 00:58:10 +0200 (CEST) (envelope-from jhs@berklix.com)
Received: from fire.js.berklix.net (localhost [127.0.0.1]) by fire.js.berklix.net (8.14.7/8.14.7) with ESMTP id u7IMvpT5090433; Fri, 19 Aug 2016 00:58:09 +0200 (CEST) (envelope-from jhs@berklix.com)
Message-Id: <201608182258.u7IMvpT5090433@fire.js.berklix.net> To: "Jukka A. Ukkonen" cc: freebsd-advocacy@freebsd.org, freebsd-fs@freebsd.org Subject: Re: A how-to guide which you might wish to use for freebsd advocacy From: "Julian H. Stacey" Organization: http://berklix.eu BSD Unix Linux Consultants, Munich Germany User-agent: EXMH on FreeBSD http://berklix.eu/free/ X-URL: http://www.berklix.eu/~jhs/ In-reply-to: Your message "Thu, 18 Aug 2016 12:40:18 +0300."
<71a9ed60-90c1-9df3-4da0-cafd23e48fc0@gmail.com> Date: Fri, 19 Aug 2016 00:57:51 +0200 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 22:58:25 -0000 "Jukka A. Ukkonen" wrote freebsd-advocacy@freebsd.org: > > https://www.facebook.com/notes/jukka-ukkonen/upgrading-the-storage-disk-to-finnsat-fh05-hdr-digital-tv-receiver-while-retaini/10208639116987804 > > Feel free to publish the link on the freebsd web site or otherwise > distribute it further. I added cc: freebsd-fs@freebsd.org as it's about file systems & Ext2 & offsets. (BTW I have no facebook login, so can assure readers Jukka's page is public, no fbook login needed to access URL) An extract re FS. ] FreeBSD will by default not accept the Finnsat generated partition ] for mounting. The trick is that Finnsat creates partitions with ] slack alignment. FreeBSD knows that a live ext2fs has to be a ] multiple of 4kB, 4096 bytes in size, i.e. 8 disk blocks, 8*512 ] bytes. If the partition size is not perfectly aligned, FreeBSD does ] not allow read-write mount to an ext2fs instance. With all likelihood ] it might be a broken file system. Why should FreeBSD help making ] things worse? ] ] So, you will have to adjust the partition size such that its length ] will be a multiple of 8 disk blocks. The tool for this is gpart ] (geom partition) which both modifies the partition tables and shows ] their contents. First use the command Thanks Jukka, I've bcc'd a friend who I discussed Humax TV recorders with a while back, on similar issues, some time when I'm visiting the town where my 3 Humax owner friends are, I hope to find time to experiment with my USB to SATA converter. This thread is archived here: http://lists.freebsd.org/pipermail/freebsd-advocacy/2016-August/004619.html & under here: http://lists.freebsd.org/pipermail/freebsd-fs/2016-August/date.html Cheers, Julian -- Julian Stacey, BSD Linux Unix Sys Eng Consultant Munich Reply below, Prefix '> '. Plain text, No .doc, base64, HTML, quoted-printable. 
http://berklix.eu/brexit/#stolen_votes From owner-freebsd-fs@freebsd.org Fri Aug 19 04:21:36 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8578ABBD414 for ; Fri, 19 Aug 2016 04:21:36 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-ua0-x230.google.com (mail-ua0-x230.google.com [IPv6:2607:f8b0:400c:c08::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3E7F61576 for ; Fri, 19 Aug 2016 04:21:36 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-ua0-x230.google.com with SMTP id n59so61632969uan.2 for ; Thu, 18 Aug 2016 21:21:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=Kc4hqKEb/ysBW+KD38nOPRAs96e8ZeSESdrxoANTxf4=; b=WbK3RpfMunCBQ7j/vQSfmkgJ0+0NalqpvUquTJWpMfcDNjcAlM15rSCBEsFks8q5j8 P5+AhkVbWpFUT52WVrzB+OOEL2IiqnZwi7Tv9w4iLJpPpUAYyao7P3sto/NML7N6tR/I 1Fv3dwYDlehKaGXZMoKuQe8ZQ49Q6uibqtVHDsb3chmyAiIkCKLVtkYgh97aK2pF1WDs po4NFcMdEuO2XFqp7/I2dthtxdOJc2X/NBytQHgx+inplew9v8pyY1J4XzyrBTo/cjZs x1iehLwSisWv8bMoxtAZlIrAC4n+Tuv0T3vtSRjLTtIhGmgJfitM43yJglN/mzc7W3zX xZlg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=Kc4hqKEb/ysBW+KD38nOPRAs96e8ZeSESdrxoANTxf4=; b=E6sFiXVK1ikg9ed1HJBJFcLfjrir3Hl+IPSeR9m9eyUhtHfcMa053Movsy+umPUIJA NG+ImoSpkYbDxMQImZQJxtGWv7Bmra88pizqGs7R1HY5iowmYz2JidaQ+4OXAuzJ7zWt ggFcEWb/M3FVRKCeEgK07E7v3iS2VOKZwq7RETu0CI0txjXd0YmfB4MKWH9G/fil6J19 ZfcNdPykdG4EWmTPlVVIzGqz8H6DsO1kEluBKeuragK+huQ9wecVn/R12EbbXPyp5RCj XA40fyzpYwhbpoqg8o2Sq4djhLBaCDuTWgRbF/rLuoU7SDNk1df1+NwMGlFdD5lVjBBz 9V2w== X-Gm-Message-State: AEkoout7+arknWKneuyC/S+T7gzHsMhZ+1fSxFQP6lb1g/KjFdVepLc4a5eKm/nleNjW6k6m3XnKmE0ofKWrMA== X-Received: by 10.31.183.193 with SMTP id h184mr3070211vkf.3.1471580495383; Thu, 18 Aug 2016 21:21:35 -0700 (PDT) MIME-Version: 1.0 Sender: wlosh@bsdimp.com Received: by 10.103.0.84 with HTTP; Thu, 18 Aug 2016 21:21:34 -0700 (PDT) X-Originating-IP: [69.53.245.200] In-Reply-To: <201608182258.u7IMvpT5090433@fire.js.berklix.net> References: <71a9ed60-90c1-9df3-4da0-cafd23e48fc0@gmail.com> <201608182258.u7IMvpT5090433@fire.js.berklix.net> From: Warner Losh Date: Thu, 18 Aug 2016 22:21:34 -0600 X-Google-Sender-Auth: Z17SRGj_0gN3_LHIkPgFFM1t0kI Message-ID: Subject: Re: A how-to guide which you might wish to use for freebsd advocacy To: "Julian H. Stacey" Cc: "Jukka A. Ukkonen" , freebsd-fs@freebsd.org, freebsd-advocacy@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 04:21:36 -0000 On Thu, Aug 18, 2016 at 4:57 PM, Julian H. Stacey wrote: > "Jukka A. Ukkonen" wrote freebsd-advocacy@freebsd.org: >> >> https://www.facebook.com/notes/jukka-ukkonen/upgrading-the-storage-disk-to-finnsat-fh05-hdr-digital-tv-receiver-while-retaini/10208639116987804 >> >> Feel free to publish the link on the freebsd web site or otherwise >> distribute it further. 
> > I added cc: freebsd-fs@freebsd.org as it's about file systems & Ext2 & offsets. > > (BTW I have no facebook login, so can assure readers Jukka's > page is public, no fbook login needed to access URL) > > An extract re FS. > > ] FreeBSD will by default not accept the Finnsat generated partition > ] for mounting. The trick is that Finnsat creates partitions with > ] slack alignment. FreeBSD knows that a live ext2fs has to be a > ] multiple of 4kB, 4096 bytes in size, i.e. 8 disk blocks, 8*512 > ] bytes. If the partition size is not perfectly aligned, FreeBSD does > ] not allow read-write mount to an ext2fs instance. With all likelihood > ] it might be a broken file system. Why should FreeBSD help making > ] things worse? > ] > ] So, you will have to adjust the partition size such that its length > ] will be a multiple of 8 disk blocks. The tool for this is gpart > ] (geom partition) which both modifies the partition tables and shows > ] their contents. First use the command > > Thanks Jukka, > > I've bcc'd a friend who I discussed Humax TV recorders with a while > back, on similar issues, some time when I'm visiting the town where > my 3 Humax owner friends are, I hope to find time to experiment > with my USB to SATA converter. > > This thread is archived here: > http://lists.freebsd.org/pipermail/freebsd-advocacy/2016-August/004619.html > & under here: > http://lists.freebsd.org/pipermail/freebsd-fs/2016-August/date.html This has been a problem for a while. gpart is too smart. It won't allow one to create unaligned partitions, even when you know they will work. Warner From owner-freebsd-fs@freebsd.org Fri Aug 19 07:56:24 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4EFFBBBF4AB for ; Fri, 19 Aug 2016 07:56:24 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from cu01176b.smtpx.saremail.com (cu01176b.smtpx.saremail.com [195.16.151.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 12DF4120C for ; Fri, 19 Aug 2016 07:56:23 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from [172.16.8.36] (izaro.sarenet.es [192.148.167.11]) by proxypop01.sare.net (Postfix) with ESMTPSA id 2A04B9DD34F; Fri, 19 Aug 2016 09:56:19 +0200 (CEST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: HAST + ZFS + NFS + CARP From: Borja Marcos In-Reply-To: <20160818132948.GB51561@neutralgood.org> Date: Fri, 19 Aug 2016 09:56:19 +0200 Cc: krad , FreeBSD FS , InterNetX - Juergen Gotteswinter Content-Transfer-Encoding: quoted-printable Message-Id: <0B420CCC-D04F-451A-960B-496F9F0031AE@sarenet.es> References: <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> <20160818132948.GB51561@neutralgood.org> To: "Kevin P. 
Neal" X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 07:56:24 -0000 > On 18 Aug 2016, at 15:29, Kevin P. Neal wrote: >=20 > On Thu, Aug 18, 2016 at 08:36:24AM +0100, krad wrote: >> I didnt think touch was atomic, mkdir is though >=20 > The shell script snippit that was posted is not safe since there is = time > in between the touch and the check for the existance of the lock file. >=20 > The better solution is to use FreeBSD's lockf command. Unfortunately it=E2=80=99s not portable. Hence mkdir is the suggested = way you will find on scripting tutorials, especially the classic ones :) Borja. From owner-freebsd-fs@freebsd.org Fri Aug 19 08:23:18 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B24BABBFEBC; Fri, 19 Aug 2016 08:23:18 +0000 (UTC) (envelope-from etnapierala@gmail.com) Received: from mail-wm0-x234.google.com (mail-wm0-x234.google.com [IPv6:2a00:1450:400c:c09::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 44D6D16BE; Fri, 19 Aug 2016 08:23:18 +0000 (UTC) (envelope-from etnapierala@gmail.com) Received: by mail-wm0-x234.google.com with SMTP id q128so24815388wma.1; Fri, 19 Aug 2016 01:23:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:mail-followup-to :references:mime-version:content-disposition:in-reply-to:user-agent; bh=XudDE4YK6XONsuVyvfMpLIKIhMyMEwGZInjqbQmepIY=; b=f7fMM+XuPCqB3f4BBMxyJY34O7l/KVvdTiKhgWXPg3jL6M1itTZgYA5GI4Elau+xqY cE5LnWQSSgF0HcKP+l2R0/xc9iNXctRvvd/IUebg4yBneH6MGrASb1mUOFF1agGPGXBi DOhzXr+nFWnkRtUeDbczRueoWOr7WDCK7WJMJ983QfuGOFow80xNkiFYE7fMtshrEPhd uDAF1yC1WYXVsMobEpju9R3rK/Zo+7eWpE86NheGaPQeUbAxZs2FbjPc/1w1pvH9SW5r 3xbEs5srf79+yzAEeKkH7fKOeidP6t8VMIzilYA8rPR2FG017TGSByzMCMyQLQIHT/0U 4qXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :mail-followup-to:references:mime-version:content-disposition :in-reply-to:user-agent; bh=XudDE4YK6XONsuVyvfMpLIKIhMyMEwGZInjqbQmepIY=; b=M8xMPlBVF4NwtnaWm1QFecMdEq4DKQpQEfl88+1JdRQmrLr1c08IOpx96T78KGtCI9 BvsXNzm6ta74aA8nz3mVvZNHBS0T56QInfrU6E2JUsF5mXiafWoUXks3a3AKcC7h8Eh6 KePdp65fx8NWp+iM6vLp+SilT9MyK9WJBH5bqN+3fSMANy69UhbWde2kWbckhIp/oNfp XkC++tHtJrlEFnBCrdjHjVos4Cx07raacmC5BYgcQ6qYt2G9lLwLuB+DJDcnhPT5TEHV A/Z3IbEOpVnmAhXFmCU7LGnnxj6bd4NdcVtGm4kZabkWOeMPmrhO41F8t2AD22bvZupQ gMAA== X-Gm-Message-State: AEkoouuK2v0C1kpw4+Cuy8pA/H6EfsQyzQr5qHjCqOhhV5YvffHW6TguxBU7l6psKeDO/g== X-Received: by 10.28.32.77 with SMTP id g74mr2752671wmg.45.1471594996724; Fri, 19 Aug 2016 01:23:16 -0700 (PDT) Received: from brick (euc212.neoplus.adsl.tpnet.pl. [83.20.174.212]) by smtp.gmail.com with ESMTPSA id w129sm3306406wmd.9.2016.08.19.01.23.13 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 19 Aug 2016 01:23:15 -0700 (PDT) Sender: =?UTF-8?Q?Edward_Tomasz_Napiera=C5=82a?= Date: Fri, 19 Aug 2016 10:23:10 +0200 From: Edward Tomasz =?utf-8?Q?Napiera=C5=82a?= To: "Eugene M. 
Zheganin" Cc: FreeBSD FS , freebsd-stable Subject: Re: cannot destroy '': dataset is busy vs iSCSI Message-ID: <20160819082310.GA14806@brick> Mail-Followup-To: "Eugene M. Zheganin" , FreeBSD FS , freebsd-stable References: <57B5CD2F.2070204@norma.perm.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <57B5CD2F.2070204@norma.perm.ru> User-Agent: Mutt/1.6.1 (2016-04-27) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 08:23:18 -0000 On 0818T1958, Eugene M. Zheganin wrote: > Hi. > > I'm using zvol clones with iSCSI. Perdiodically I renew them and destroy > the old ones, but sometimes the clone gets stuck and refuses to be > destroyed: > > (I'm showing the full sequence so it's self explanatory who is who's parent) > > [root@san2:/etc]# zfs destroy esx/games-reference1@ver5_6 > cannot destroy 'esx/games-reference1@ver5_6': snapshot has dependent clones > use '-R' to destroy the following datasets: > esx/games-reference1-ver5_6-worker111 > [root@san2:/etc]# zfs destroy esx/games-reference1-ver5_6-worker111 > cannot destroy 'esx/games-reference1-ver5_6-worker111': dataset is busy > > The only entity that can hold the dataset open is ctld, so: > > [root@san2:/etc]# service ctld reload > [root@san2:/etc]# grep esx/games-reference1-ver5_6-worker111 /etc/ctl.conf > [root@san2:/etc]# zfs destroy esx/games-reference1-ver5_6-worker111 > cannot destroy 'esx/games-reference1-ver5_6-worker111': dataset is busy > > As you can see, the clone isn't mentioned in ctl.conf, but still refuses > to be destroyed. > Is there any way to destroy it without restarting ctld or rebooting the > server ? iSCSI is vital for production, but clones sometimes holds lot > of space. Could you do "ctladm devlist -v" and see if the LUN for this file somehow didn't get removed? 
From owner-freebsd-fs@freebsd.org Fri Aug 19 08:49:30 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4EAD5BBE7E5 for ; Fri, 19 Aug 2016 08:49:30 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from cu1176c.smtpx.saremail.com (cu1176c.smtpx.saremail.com [195.16.148.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 112F41A19 for ; Fri, 19 Aug 2016 08:49:29 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from [172.16.8.36] (izaro.sarenet.es [192.148.167.11]) by proxypop02.sare.net (Postfix) with ESMTPSA id 8A20D9DC90E; Fri, 19 Aug 2016 10:49:20 +0200 (CEST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: HAST + ZFS + NFS + CARP From: Borja Marcos In-Reply-To: <4c34cbf9-84b5-5d42-e0b4-bf18aa1ef9a7@kateley.com> Date: Fri, 19 Aug 2016 10:49:20 +0200 Cc: juergen.gotteswinter@internetx.com, Chris Watson , freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <3F002B89-353E-41CE-8ACF-B34D7D774BCC@sarenet.es> References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> <4c34cbf9-84b5-5d42-e0b4-bf18aa1ef9a7@kateley.com> To: linda@kateley.com X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 08:49:30 -0000 > On 18 Aug 2016, at 19:19, Linda Kateley wrote: >> here: running nexenta ha setup since several years with one = catastrophic >> failure due to split brain > Just trying to say I don't see projects ongoing.. just at beginning I saw consultants near the T=C3=A4nnhauser Gate=E2=80=A6 Some kind of feedback loop is terrific, really! ;) Borja. 
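[On the lock-file side discussion earlier in this thread (touch-then-test leaves a race window, lockf(1) is FreeBSD-specific, mkdir is atomic and portable), a minimal sketch of the zrep wrapper quoted earlier, reusing its paths, could look like this:]
-- snip --
#!/bin/sh
# Variant 1: FreeBSD's lockf(1).  The lock is taken and released by the
# utility itself, so there is no window between test and create (not
# portable to other systems).
lockf -t 0 /var/run/replic.lock /blah/path/zrep sync all >> /var/log/zfsrepli.log

# Variant 2: portable mkdir-based lock.  mkdir either creates the directory
# or fails, in one atomic step, unlike the test-then-touch pattern.
if mkdir /var/run/replic.d 2>/dev/null; then
        trap 'rmdir /var/run/replic.d' EXIT INT TERM
        /blah/path/zrep sync all >> /var/log/zfsrepli.log
fi
-- snip --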
From owner-freebsd-fs@freebsd.org Fri Aug 19 20:18:49 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C9F9EBBE694 for ; Fri, 19 Aug 2016 20:18:49 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8A4AB1592 for ; Fri, 19 Aug 2016 20:18:49 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from slw by zxy.spb.ru with local (Exim 4.86 (FreeBSD)) (envelope-from ) id 1baqFU-0003gK-Kd; Fri, 19 Aug 2016 23:18:40 +0300 Date: Fri, 19 Aug 2016 23:18:40 +0300 From: Slawa Olhovchenkov To: Karl Denninger , freebsd-fs@freebsd.org Subject: Re: ZFS ARC under memory pressure Message-ID: <20160819201840.GA12519@zxy.spb.ru> References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 20:18:49 -0000 On Thu, Aug 18, 2016 at 03:31:26PM -0500, Karl Denninger wrote: > > On 8/18/2016 15:26, Slawa Olhovchenkov wrote: > > On Thu, Aug 18, 2016 at 11:00:28PM +0300, Andriy Gapon wrote: > > > >> On 16/08/2016 22:34, Slawa Olhovchenkov wrote: > >>> I see issuses with ZFS ARC inder memory pressure. > >>> ZFS ARC size can be dramaticaly reduced, up to arc_min. > >>> > >>> As I see memory pressure event cause call arc_lowmem and set needfree: > >>> > >>> arc.c:arc_lowmem > >>> > >>> needfree = btoc(arc_c >> arc_shrink_shift); > >>> > >>> After this, arc_available_memory return negative vaules (PAGESIZE * > >>> (-needfree)) until needfree is zero. Independent how too much memory > >>> freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <= > >>> arc_c. Until arc_size don't drop below arc_c (arc_c deceased at every > >>> loop interation). > >>> > >>> arc_c droped to minimum value if arc_size fast enough droped. > >>> > >>> No control current to initial memory allocation. > >>> > >>> As result, I can see needless arc reclaim, from 10x to 100x times. > >>> > >>> Can some one check me and comment this? > >> You might have found a real problem here, but I am short of time right now to > >> properly analyze the issue. I think that on illumos 'needfree' is a variable > >> that's managed by the virtual memory system and it is akin to our > >> vm_pageout_deficit. But during the porting it became an artificial value and > >> its handling might be sub-optimal. > > As I see, totaly not optimal. > > I am create some patch for sub-optimal handling and now test it. 
> > _______________________________________________ > > freebsd-fs at freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org" > > You might want to look at the code contained in here: > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594 In may case arc.c issuse cused by revision r286625 in HEAD (and r288562 in STABLE) -- all in 2015, not touch in 2014. > There are some ugly interactions with the VM system you can run into if > you're not careful; I've chased this issue before and while I haven't > yet done the work to integrate it into 11.x (and the underlying code > *has* changed since the 10.x patches I developed) if you wind up driving > the VM system to evict pages to swap rather than pare back ARC you're > probably making the wrong choice. > > In addition UMA can come into the picture too and (at least previously) > was a severe contributor to pathological behavior. I am only do less aggresive (and more controlled) shrink of ARC size. Now ARC just collapsed. Pointed PR is realy BIG. I am can't read and understund all of this. r286625 change behaivor of interaction between ARC and VM. You problem still exist? Can you explain (in list)? -- Slawa Olhovchenkov From owner-freebsd-fs@freebsd.org Fri Aug 19 20:39:09 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 81CE5BBEF51 for ; Fri, 19 Aug 2016 20:39:09 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (denninger.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3D69A1318 for ; Fri, 19 Aug 2016 20:39:09 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 3C107208713; Fri, 19 Aug 2016 15:39:01 -0500 (CDT) Subject: Re: ZFS ARC under memory pressure To: Slawa Olhovchenkov , freebsd-fs@freebsd.org References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> <20160819201840.GA12519@zxy.spb.ru> From: Karl Denninger Message-ID: Date: Fri, 19 Aug 2016 15:38:55 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160819201840.GA12519@zxy.spb.ru> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms030104010004020408090706" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 20:39:09 -0000 This is a cryptographically signed message in MIME format. 
--------------ms030104010004020408090706 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 8/19/2016 15:18, Slawa Olhovchenkov wrote: > On Thu, Aug 18, 2016 at 03:31:26PM -0500, Karl Denninger wrote: > >> On 8/18/2016 15:26, Slawa Olhovchenkov wrote: >>> On Thu, Aug 18, 2016 at 11:00:28PM +0300, Andriy Gapon wrote: >>> >>>> On 16/08/2016 22:34, Slawa Olhovchenkov wrote: >>>>> I see issuses with ZFS ARC inder memory pressure. >>>>> ZFS ARC size can be dramaticaly reduced, up to arc_min. >>>>> >>>>> As I see memory pressure event cause call arc_lowmem and set needfr= ee: >>>>> >>>>> arc.c:arc_lowmem >>>>> >>>>> needfree =3D btoc(arc_c >> arc_shrink_shift); >>>>> >>>>> After this, arc_available_memory return negative vaules (PAGESIZE *= >>>>> (-needfree)) until needfree is zero. Independent how too much memor= y >>>>> freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <=3D= >>>>> arc_c. Until arc_size don't drop below arc_c (arc_c deceased at eve= ry >>>>> loop interation). >>>>> >>>>> arc_c droped to minimum value if arc_size fast enough droped. >>>>> >>>>> No control current to initial memory allocation. >>>>> >>>>> As result, I can see needless arc reclaim, from 10x to 100x times. >>>>> >>>>> Can some one check me and comment this? >>>> You might have found a real problem here, but I am short of time rig= ht now to >>>> properly analyze the issue. I think that on illumos 'needfree' is a= variable >>>> that's managed by the virtual memory system and it is akin to our >>>> vm_pageout_deficit. But during the porting it became an artificial = value and >>>> its handling might be sub-optimal. >>> As I see, totaly not optimal. >>> I am create some patch for sub-optimal handling and now test it. >>> _______________________________________________ >>> freebsd-fs at freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.o= rg" >> You might want to look at the code contained in here: >> >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D187594 > In may case arc.c issuse cused by revision r286625 in HEAD (and > r288562 in STABLE) -- all in 2015, not touch in 2014. > >> There are some ugly interactions with the VM system you can run into i= f >> you're not careful; I've chased this issue before and while I haven't >> yet done the work to integrate it into 11.x (and the underlying code >> *has* changed since the 10.x patches I developed) if you wind up drivi= ng >> the VM system to evict pages to swap rather than pare back ARC you're >> probably making the wrong choice. >> >> In addition UMA can come into the picture too and (at least previously= ) >> was a severe contributor to pathological behavior. > I am only do less aggresive (and more controlled) shrink of ARC size. > Now ARC just collapsed. > > Pointed PR is realy BIG. I am can't read and understund all of this. > r286625 change behaivor of interaction between ARC and VM. > You problem still exist? Can you explain (in list)? > Essentially ZFS is a "bolt-on" and unlike UFS which uses the unified buffer cache (which the VM system manages) ZFS does not. ARC is allocated out of kernel memory and (by default) also uses UMA; the VM system is not involved in its management. When the VM system gets constrained (low memory) it thus cannot tell the ARC to pare back. So when the VM system gets low on RAM it will start to page. 
The problem with this is that if the VM system is low on RAM because the ARC is consuming memory you do NOT want to page, you want to evict some of the ARC. Consider this: ARC data *at best* prevents one I/O. That is, if there is data in the cache when you go to read from disk, you avoid one I/O per unit of data in the ARC you didn't have to read. Paging *always* requires one I/O (to write the page(s) to the swap) and MAY involve two (to later page it back in.) It is never a "win" to spend a *guaranteed* I/O when you can instead act in a way that *might* cause you to (later) need to execute one. Unfortunately the VM system has another interaction that causes trouble too. The VM system will "demote" a page to inactive or cache status but not actually free it. It only starts to go through those pages and free them when the vm system wakes up, and that only happens when free space gets low enough to trigger it. Finally, there's another problem that comes into play; UMA. Kernel memory allocation is fairly expensive. UMA grabs memory from the kernel allocation system in big chunks and manages it, and by doing so gains a pretty-significant performance boost. But this means that you can have large amounts of RAM that are allocated, not in use, and yet the VM system cannot reclaim them on its own. The ZFS code has to reap those caches, but reaping them is a moderately expensive operation too, thus you don't want to do it unnecessarily. I've not yet gone through the 11.x code to see what changed from 10.x; what I do know is that it is materially better-behaved than it used to be, in that prior to 11.x I would have (by now) pretty much been forced into rolling that forward and testing it because the misbehavior in one of my production systems was severe enough to render it basically unusable without the patch in that PR inline, with the most-serious misbehavior being paging-induced stalls that could reach 10s of seconds or more in duration. 11.x hasn't exhibited the severe problems, unpatched, that 10.x was known to do on my production systems -- but it is far less than great in that it sure as heck does have UMA coherence issues..... ARC Size: 38.58% 8.61 GiB Target Size: (Adaptive) 70.33% 15.70 GiB Min Size (Hard Limit): 12.50% 2.79 GiB Max Size (High Water): 8:1 22.32 GiB I have 20GB out in kernel memory on this machine right now but only 8.6 of it in ARC; the rest is (mostly) sitting in UMA allocated-but-unused -- so despite the belief expressed by some that the 11.x code is "better" at reaping UMA I'm sure not seeing it here. I'll get around to rolling forward and modifying that PR since that particular bit of jackassery with UMA is a definite performance problem. I suspect a big part of what you're seeing lies there as well. When I do get that code done and tested I suspect it may solve your problems as well. 
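[A quick way to see the gap described here (ARC size versus memory parked in UMA caches) on a running system is sketched below. The arcstats sysctls are standard; the awk assumes the vmstat -z column layout used by 10.x/11.x (item: size, limit, used, free, ...) and may need adjusting on other releases.]
-- snip --
#!/bin/sh
# Current ARC size, target and limits.
sysctl kstat.zfs.misc.arcstats.size \
       kstat.zfs.misc.arcstats.c \
       kstat.zfs.misc.arcstats.c_min \
       kstat.zfs.misc.arcstats.c_max

# Sum SIZE * FREE over all UMA zones: memory UMA is holding on to
# but nothing is currently using.
vmstat -z | awk -F, '
        NR > 2 && NF >= 4 {
                n = split($1, a, " ");          # last token of field 1 is the item size
                cached += a[n] * $4;            # $4 is the FREE column
        }
        END { printf "%.1f MiB allocated to UMA but currently free\n",
              cached / 1048576 }'
-- snip --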
-- Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/

From owner-freebsd-fs@freebsd.org Fri Aug 19 21:34:50 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A060FBC09B7 for ; Fri, 19 Aug 2016 21:34:50 +0000 (UTC) (envelope-from slw@zxy.spb.ru)
Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6088311AA for ; Fri, 19 Aug 2016 21:34:50 +0000 (UTC) (envelope-from slw@zxy.spb.ru)
Received: from slw by zxy.spb.ru with local (Exim 4.86 (FreeBSD)) (envelope-from ) id 1barR8-0005Lk-9Y; Sat, 20 Aug 2016 00:34:46 +0300
Date: Sat, 20 Aug 2016 00:34:46 +0300 From: Slawa Olhovchenkov To: Karl Denninger Cc: freebsd-fs@freebsd.org Subject: Re: ZFS ARC under memory pressure Message-ID: <20160819213446.GT8192@zxy.spb.ru> References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> <20160819201840.GA12519@zxy.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 21:34:50 -0000
On Fri, Aug 19, 2016 at 03:38:55PM -0500, Karl Denninger wrote: > On 8/19/2016 15:18, Slawa Olhovchenkov wrote: > > On Thu, Aug 18, 2016 at 03:31:26PM -0500, Karl Denninger wrote: > > > >> On 8/18/2016 15:26, Slawa Olhovchenkov wrote: > >>> On Thu, Aug 18, 2016 at 11:00:28PM +0300, Andriy Gapon wrote: > >>> > >>>> On 16/08/2016 22:34, Slawa Olhovchenkov wrote: > >>>>> I see issuses with ZFS ARC inder memory pressure. > >>>>> ZFS ARC size can be dramaticaly reduced, up to arc_min. > >>>>> > >>>>> As I see memory pressure event cause call arc_lowmem and set needfree: > >>>>> > >>>>> arc.c:arc_lowmem > >>>>> > >>>>> needfree = btoc(arc_c >> arc_shrink_shift); > >>>>> > >>>>> After this, arc_available_memory return negative vaules (PAGESIZE * > >>>>> (-needfree)) until needfree is zero. Independent how too much memory > >>>>> freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <= > >>>>> arc_c. Until arc_size don't drop below arc_c (arc_c deceased at every > >>>>> loop interation).
> >>>>> > >>>>> arc_c droped to minimum value if arc_size fast enough droped. > >>>>> > >>>>> No control current to initial memory allocation. > >>>>> > >>>>> As result, I can see needless arc reclaim, from 10x to 100x times. > >>>>> > >>>>> Can some one check me and comment this? > >>>> You might have found a real problem here, but I am short of time right now to > >>>> properly analyze the issue. I think that on illumos 'needfree' is a variable > >>>> that's managed by the virtual memory system and it is akin to our > >>>> vm_pageout_deficit. But during the porting it became an artificial value and > >>>> its handling might be sub-optimal. > >>> As I see, totaly not optimal. > >>> I am create some patch for sub-optimal handling and now test it. > >>> _______________________________________________ > >>> freebsd-fs at freebsd.org mailing list > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org" > >> You might want to look at the code contained in here: > >> > >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594 > > In may case arc.c issuse cused by revision r286625 in HEAD (and > > r288562 in STABLE) -- all in 2015, not touch in 2014. > > > >> There are some ugly interactions with the VM system you can run into if > >> you're not careful; I've chased this issue before and while I haven't > >> yet done the work to integrate it into 11.x (and the underlying code > >> *has* changed since the 10.x patches I developed) if you wind up driving > >> the VM system to evict pages to swap rather than pare back ARC you're > >> probably making the wrong choice. > >> > >> In addition UMA can come into the picture too and (at least previously) > >> was a severe contributor to pathological behavior. > > I am only do less aggresive (and more controlled) shrink of ARC size. > > Now ARC just collapsed. > > > > Pointed PR is realy BIG. I am can't read and understund all of this. > > r286625 change behaivor of interaction between ARC and VM. > > You problem still exist? Can you explain (in list)? > > > > Essentially ZFS is a "bolt-on" and unlike UFS which uses the unified > buffer cache (which the VM system manages) ZFS does not. ARC is > allocated out of kernel memory and (by default) also uses UMA; the VM > system is not involved in its management. > > When the VM system gets constrained (low memory) it thus cannot tell the > ARC to pare back. So when the VM system gets low on RAM it will start Currently VM generate event and ARC listen for this event, handle it by arc.c:arc_lowmem(). > to page. The problem with this is that if the VM system is low on RAM > because the ARC is consuming memory you do NOT want to page, you want to > evict some of the ARC. Now by event `lowmem` ARC try to evict 1/128 of ARC. > Unfortunately the VM system has another interaction that causes trouble > too. The VM system will "demote" a page to inactive or cache status but > not actually free it. It only starts to go through those pages and free > them when the vm system wakes up, and that only happens when free space > gets low enough to trigger it. > Finally, there's another problem that comes into play; UMA. Kernel > memory allocation is fairly expensive. UMA grabs memory from the kernel > allocation system in big chunks and manages it, and by doing so gains a > pretty-significant performance boost. 
But this means that you can have > large amounts of RAM that are allocated, not in use, and yet the VM > system cannot reclaim them on its own. The ZFS code has to reap those > caches, but reaping them is a moderately expensive operation too, thus > you don't want to do it unnecessarily.

Not sure, but some code in ZFS may handle this: arc.c:arc_kmem_reap_now().

> I've not yet gone through the 11.x code to see what changed from 10.x; > what I do know is that it is materially better-behaved than it used to > be, in that prior to 11.x I would have (by now) pretty much been forced > into rolling that forward and testing it because the misbehavior in one > of my production systems was severe enough to render it basically > unusable without the patch in that PR inline, with the most-serious > misbehavior being paging-induced stalls that could reach 10s of seconds > or more in duration. > > 11.x hasn't exhibited the severe problems, unpatched, that 10.x was > known to do on my production systems -- but it is far less than great in > that it sure as heck does have UMA coherence issues..... > > ARC Size: 38.58% 8.61 GiB > Target Size: (Adaptive) 70.33% 15.70 GiB > Min Size (Hard Limit): 12.50% 2.79 GiB > Max Size (High Water): 8:1 22.32 GiB > > I have 20GB out in kernel memory on this machine right now but only 8.6 > of it in ARC; the rest is (mostly) sitting in UMA allocated-but-unused > -- so despite the belief expressed by some that the 11.x code is > "better" at reaping UMA I'm sure not seeing it here.

I see. In my case:

ARC Size:                79.65%  98.48  GiB
Target Size: (Adaptive)  79.60%  98.42  GiB
Min Size (Hard Limit):   12.50%  15.46  GiB
Max Size (High Water):   8:1     123.64 GiB

System Memory:
        2.27%   2.83 GiB Active,     9.58%  11.94 GiB Inact
        86.34%  107.62 GiB Wired,    0.00%  0 Cache
        1.80%   2.25 GiB Free,       0.00%  0 Gap

        Real Installed:  128.00 GiB
        Real Available:  99.96%  127.95 GiB
        Real Managed:    97.41%  124.64 GiB

        Logical Total:   128.00 GiB
        Logical Used:    88.92%  113.81 GiB
        Logical Free:    11.08%  14.19 GiB

Kernel Memory:           758.25 MiB
        Data:            97.81%  741.61 MiB
        Text:            2.19%   16.64 MiB

Kernel Memory Map:       124.64 GiB
        Size:            81.84%  102.01 GiB
        Free:            18.16%  22.63 GiB

Mem: 2895M Active, 12G Inact, 108G Wired, 528K Buf, 2303M Free
ARC: 98G Total, 89G MFU, 9535M MRU, 35M Anon, 126M Header, 404M Other
Swap: 32G Total, 394M Used, 32G Free, 1% Inuse

Is this 12G Inactive the 'UMA allocated-but-unused'? It may also be freed-but-not-yet-reclaimed network bufs.

> I'll get around to rolling forward and modifying that PR since that > particular bit of jackassery with UMA is a definite performance > problem. I suspect a big part of what you're seeing lies there as > well. When I do get that code done and tested I suspect it may solve > your problems as well.

No. My problem is completely different: under memory pressure, after arc_lowmem() sets needfree to non-zero, arc_reclaim_thread() starts to shrink the ARC. But arc_reclaim_thread() (in the FreeBSD case) doesn't correctly control this process, and the shrink stops at a random point (when, after the next iteration, arc_size <= arc_c), mostly after it has dropped to Min Size (Hard Limit).

I just restore control of the shrink process.
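A toy userspace simulation of the shrink behaviour described in this message may make it easier to see. Everything below is invented for illustration -- the sizes, the shrink step, and especially the rate at which eviction catches up with the falling target -- while the real logic lives in arc.c's arc_lowmem(), arc_available_memory() and arc_reclaim_thread().

#include <stdio.h>
#include <stdint.h>

#define GIB (1024ULL * 1024 * 1024)

int
main(void)
{
        uint64_t arc_c = 100 * GIB;          /* ARC target size (invented) */
        uint64_t arc_size = 100 * GIB;       /* bytes actually cached */
        const uint64_t arc_min = 16 * GIB;   /* hard lower limit */
        const int arc_shrink_shift = 7;      /* shrink requests are ~1/128 */

        /* arc_lowmem(): the VM asks for roughly arc_c >> arc_shrink_shift. */
        const uint64_t requested = arc_c >> arc_shrink_shift;
        uint64_t freed = 0;
        int needfree = 1;
        int iter = 0;

        while (needfree) {
                iter++;
                /* While needfree is set, "available memory" stays negative,
                 * so the target keeps being cut on every pass... */
                if (arc_c > arc_min) {
                        uint64_t step = arc_c >> arc_shrink_shift;
                        arc_c = (arc_c - step > arc_min) ? arc_c - step : arc_min;
                }
                /* ...while actual eviction lags behind the falling target
                 * (the lag factor of 4 is made up for illustration). */
                uint64_t gap = (arc_size > arc_c) ? arc_size - arc_c : 0;
                uint64_t evict = (gap > 4) ? gap / 4 : gap;
                arc_size -= evict;
                freed += evict;
                /* needfree is only cleared once arc_size <= arc_c. */
                if (arc_size <= arc_c)
                        needfree = 0;
        }
        printf("asked to free %.2f GiB, actually freed %.2f GiB in %d passes; "
            "arc_c ended at %.2f GiB (min %.2f GiB)\n",
            (double)requested / GIB, (double)freed / GIB, iter,
            (double)arc_c / GIB, (double)arc_min / GIB);
        return (0);
}

With these made-up numbers the loop frees roughly a hundred times what was requested and parks arc_c at its minimum, which is the 10x-100x over-reclaim described earlier in the thread.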
From owner-freebsd-fs@freebsd.org Fri Aug 19 21:52:09 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 416BABC0DE2 for ; Fri, 19 Aug 2016 21:52:09 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (denninger.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D48311BC7 for ; Fri, 19 Aug 2016 21:52:08 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 53614208A4C for ; Fri, 19 Aug 2016 16:52:06 -0500 (CDT) Subject: Re: ZFS ARC under memory pressure References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> <20160819201840.GA12519@zxy.spb.ru> <20160819213446.GT8192@zxy.spb.ru> To: freebsd-fs@freebsd.org From: Karl Denninger Message-ID: <05ba785a-c86f-1ec8-fcf3-71d22551f4f3@denninger.net> Date: Fri, 19 Aug 2016 16:52:00 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160819213446.GT8192@zxy.spb.ru> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms040808020408080903000805" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 21:52:09 -0000 This is a cryptographically signed message in MIME format. --------------ms040808020408080903000805 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 8/19/2016 16:34, Slawa Olhovchenkov wrote: > On Fri, Aug 19, 2016 at 03:38:55PM -0500, Karl Denninger wrote: > >> On 8/19/2016 15:18, Slawa Olhovchenkov wrote: >>> On Thu, Aug 18, 2016 at 03:31:26PM -0500, Karl Denninger wrote: >>> >>>> On 8/18/2016 15:26, Slawa Olhovchenkov wrote: >>>>> On Thu, Aug 18, 2016 at 11:00:28PM +0300, Andriy Gapon wrote: >>>>> >>>>>> On 16/08/2016 22:34, Slawa Olhovchenkov wrote: >>>>>>> I see issuses with ZFS ARC inder memory pressure. >>>>>>> ZFS ARC size can be dramaticaly reduced, up to arc_min. >>>>>>> >>>>>>> As I see memory pressure event cause call arc_lowmem and set need= free: >>>>>>> >>>>>>> arc.c:arc_lowmem >>>>>>> >>>>>>> needfree =3D btoc(arc_c >> arc_shrink_shift); >>>>>>> >>>>>>> After this, arc_available_memory return negative vaules (PAGESIZE= * >>>>>>> (-needfree)) until needfree is zero. Independent how too much mem= ory >>>>>>> freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <= =3D >>>>>>> arc_c. Until arc_size don't drop below arc_c (arc_c deceased at e= very >>>>>>> loop interation). >>>>>>> >>>>>>> arc_c droped to minimum value if arc_size fast enough droped. >>>>>>> >>>>>>> No control current to initial memory allocation. >>>>>>> >>>>>>> As result, I can see needless arc reclaim, from 10x to 100x times= =2E >>>>>>> >>>>>>> Can some one check me and comment this? 
>>>>>> You might have found a real problem here, but I am short of time r= ight now to >>>>>> properly analyze the issue. I think that on illumos 'needfree' is= a variable >>>>>> that's managed by the virtual memory system and it is akin to our >>>>>> vm_pageout_deficit. But during the porting it became an artificia= l value and >>>>>> its handling might be sub-optimal. >>>>> As I see, totaly not optimal. >>>>> I am create some patch for sub-optimal handling and now test it. >>>>> _______________________________________________ >>>>> freebsd-fs at freebsd.org mailing list >>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd= =2Eorg" >>>> You might want to look at the code contained in here: >>>> >>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D187594 >>> In may case arc.c issuse cused by revision r286625 in HEAD (and >>> r288562 in STABLE) -- all in 2015, not touch in 2014. >>> >>>> There are some ugly interactions with the VM system you can run into= if >>>> you're not careful; I've chased this issue before and while I haven'= t >>>> yet done the work to integrate it into 11.x (and the underlying code= >>>> *has* changed since the 10.x patches I developed) if you wind up dri= ving >>>> the VM system to evict pages to swap rather than pare back ARC you'r= e >>>> probably making the wrong choice. >>>> >>>> In addition UMA can come into the picture too and (at least previous= ly) >>>> was a severe contributor to pathological behavior. >>> I am only do less aggresive (and more controlled) shrink of ARC size.= >>> Now ARC just collapsed. >>> >>> Pointed PR is realy BIG. I am can't read and understund all of this. >>> r286625 change behaivor of interaction between ARC and VM. >>> You problem still exist? Can you explain (in list)? >>> >> Essentially ZFS is a "bolt-on" and unlike UFS which uses the unified >> buffer cache (which the VM system manages) ZFS does not. ARC is >> allocated out of kernel memory and (by default) also uses UMA; the VM >> system is not involved in its management. >> >> When the VM system gets constrained (low memory) it thus cannot tell t= he >> ARC to pare back. So when the VM system gets low on RAM it will start= > Currently VM generate event and ARC listen for this event, handle it > by arc.c:arc_lowmem(). > >> to page. The problem with this is that if the VM system is low on RAM= >> because the ARC is consuming memory you do NOT want to page, you want = to >> evict some of the ARC. > Now by event `lowmem` ARC try to evict 1/128 of ARC. > >> Unfortunately the VM system has another interaction that causes troubl= e >> too. The VM system will "demote" a page to inactive or cache status b= ut >> not actually free it. It only starts to go through those pages and fr= ee >> them when the vm system wakes up, and that only happens when free spac= e >> gets low enough to trigger it. > >> Finally, there's another problem that comes into play; UMA. Kernel >> memory allocation is fairly expensive. UMA grabs memory from the kern= el >> allocation system in big chunks and manages it, and by doing so gains = a >> pretty-significant performance boost. But this means that you can hav= e >> large amounts of RAM that are allocated, not in use, and yet the VM >> system cannot reclaim them on its own. The ZFS code has to reap those= >> caches, but reaping them is a moderately expensive operation too, thus= >> you don't want to do it unnecessarily. > Not sure, but some code in ZFS may be handle this. 
> arc.c:arc_kmem_reap_now(). > Not sure. > >> I've not yet gone through the 11.x code to see what changed from 10.x;= >> what I do know is that it is materially better-behaved than it used to= >> be, in that prior to 11.x I would have (by now) pretty much been force= d >> into rolling that forward and testing it because the misbehavior in on= e >> of my production systems was severe enough to render it basically >> unusable without the patch in that PR inline, with the most-serious >> misbehavior being paging-induced stalls that could reach 10s of second= s >> or more in duration. >> >> 11.x hasn't exhibited the severe problems, unpatched, that 10.x was >> known to do on my production systems -- but it is far less than great = in >> that it sure as heck does have UMA coherence issues..... >> >> ARC Size: 38.58% 8.61 GiB >> Target Size: (Adaptive) 70.33% 15.70 GiB >> Min Size (Hard Limit): 12.50% 2.79 GiB >> Max Size (High Water): 8:1 22.32 GiB >> >> I have 20GB out in kernel memory on this machine right now but only 8.= 6 >> of it in ARC; the rest is (mostly) sitting in UMA allocated-but-unused= >> -- so despite the belief expressed by some that the 11.x code is >> "better" at reaping UMA I'm sure not seeing it here. > I see. > In my case: > > ARC Size: 79.65% 98.48 GiB > Target Size: (Adaptive) 79.60% 98.42 GiB > Min Size (Hard Limit): 12.50% 15.46 GiB > Max Size (High Water): 8:1 123.64 GiB > > System Memory: > > 2.27% 2.83 GiB Active, 9.58% 11.94 GiB Inact > 86.34% 107.62 GiB Wired, 0.00% 0 Cache > 1.80% 2.25 GiB Free, 0.00% 0 Gap > > Real Installed: 128.00 GiB > Real Available: 99.96% 127.95 GiB > Real Managed: 97.41% 124.64 GiB > > Logical Total: 128.00 GiB > Logical Used: 88.92% 113.81 GiB > Logical Free: 11.08% 14.19 GiB > > Kernel Memory: 758.25 MiB > Data: 97.81% 741.61 MiB > Text: 2.19% 16.64 MiB > > Kernel Memory Map: 124.64 GiB > Size: 81.84% 102.01 GiB > Free: 18.16% 22.63 GiB > > Mem: 2895M Active, 12G Inact, 108G Wired, 528K Buf, 2303M Free > ARC: 98G Total, 89G MFU, 9535M MRU, 35M Anon, 126M Header, 404M Other > Swap: 32G Total, 394M Used, 32G Free, 1% Inuse > > Is this 12G Inactive as 'UMA allocated-but-unused'? > This is also may be freed but not reclaimed network bufs. > >> I'll get around to rolling forward and modifying that PR since that >> particular bit of jackassery with UMA is a definite performance >> problem. I suspect a big part of what you're seeing lies there as >> well. When I do get that code done and tested I suspect it may solve >> your problems as well. > No. May problem is completly different: under memory pressure, after ar= c_lowmem() > set needfree to non-zero arc_reclaim_thread() start to shrink ARC. But > arc_reclaim_thread (in FreeBSD case) don't correctly control this proce= ss > and shrink stoped in random time (when after next iteration arc_size <=3D= arc_c), > mostly after drop to Min Size (Hard Limit). > > I am just resore control of shrink process. Not quite due to the UMA issue, among other things. There's also a potential "stall" issue that can arise also having to do with dirty_max sizing, especially if you are using rotating media. The PR patch scaled that back dynamically as well under memory pressure and eliminated that issue as well. I won't have time to look at this for at least another week on my test machine as I'm unfortunately buried with unrelated work at present, but I should be able to put some effort into this within the next couple weeks and see if I can quickly roll forward the important parts of the previous PR patch. 
I think you'll find that it stops the behavior you're seeing - I'm just pointing out that this was more-complex internally than it first appeared in the 10.x branch and I have no reason to believe the interactions that lead to bad behavior are not still in play given what you're describing for symptoms. --=20 Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms040808020408080903000805 Content-Type: application/pkcs7-signature; name="smime.p7s" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="smime.p7s" Content-Description: S/MIME Cryptographic Signature MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC Bl8wggZbMIIEQ6ADAgECAgEpMA0GCSqGSIb3DQEBCwUAMIGQMQswCQYDVQQGEwJVUzEQMA4G A1UECBMHRmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3Rl bXMgTExDMRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhND dWRhIFN5c3RlbXMgTExDIENBMB4XDTE1MDQyMTAyMjE1OVoXDTIwMDQxOTAyMjE1OVowWjEL MAkGA1UEBhMCVVMxEDAOBgNVBAgTB0Zsb3JpZGExGTAXBgNVBAoTEEN1ZGEgU3lzdGVtcyBM TEMxHjAcBgNVBAMTFUthcmwgRGVubmluZ2VyIChPQ1NQKTCCAiIwDQYJKoZIhvcNAQEBBQAD ggIPADCCAgoCggIBALmEWPhAdphrWd4K5VTvE5pxL3blRQPyGF3ApjUjgtavqU1Y8pbI3Byg XDj2/Uz9Si8XVj/kNbKEjkRh5SsNvx3Fc0oQ1uVjyCq7zC/kctF7yLzQbvWnU4grAPZ3IuAp 3/fFxIVaXpxEdKmyZAVDhk9az+IgHH43rdJRIMzxJ5vqQMb+n2EjadVqiGPbtG9aZEImlq7f IYDTnKyToi23PAnkPwwT+q1IkI2DTvf2jzWrhLR5DTX0fUYC0nxlHWbjgpiapyJWtR7K2YQO aevQb/3vN9gSojT2h+cBem7QIj6U69rEYcEDvPyCMXEV9VcXdcmW42LSRsPvZcBHFkWAJqMZ Myiz4kumaP+s+cIDaXitR/szoqDKGSHM4CPAZV9Yh8asvxQL5uDxz5wvLPgS5yS8K/o7zDR5 vNkMCyfYQuR6PAJxVOk5Arqvj9lfP3JSVapwbr01CoWDBkpuJlKfpQIEeC/pcCBKknllbMYq yHBO2TipLyO5Ocd1nhN/nOsO+C+j31lQHfOMRZaPQykXVPWG5BbhWT7ttX4vy5hOW6yJgeT/ o3apynlp1cEavkQRS8uJHoQszF6KIrQMID/JfySWvVQ4ksnfzwB2lRomrdrwnQ4eG/HBS+0l eozwOJNDIBlAP+hLe8A5oWZgooIIK/SulUAsfI6Sgd8dTZTTYmlhAgMBAAGjgfQwgfEwNwYI KwYBBQUHAQEEKzApMCcGCCsGAQUFBzABhhtodHRwOi8vY3VkYXN5c3RlbXMubmV0Ojg4ODgw CQYDVR0TBAIwADARBglghkgBhvhCAQEEBAMCBaAwCwYDVR0PBAQDAgXgMCwGCWCGSAGG+EIB DQQfFh1PcGVuU1NMIEdlbmVyYXRlZCBDZXJ0aWZpY2F0ZTAdBgNVHQ4EFgQUxRyULenJaFwX RtT79aNmIB/u5VkwHwYDVR0jBBgwFoAUJHGbnYV9/N3dvbDKkpQDofrTbTUwHQYDVR0RBBYw FIESa2FybEBkZW5uaW5nZXIubmV0MA0GCSqGSIb3DQEBCwUAA4ICAQBPf3cYtmKowmGIYsm6 eBinJu7QVWvxi1vqnBz3KE+HapqoIZS8/PolB/hwiY0UAE1RsjBJ7yEjihVRwummSBvkoOyf G30uPn4yg4vbJkR9lTz8d21fPshWETa6DBh2jx2Qf13LZpr3Pj2fTtlu6xMYKzg7cSDgd2bO sJGH/rcvva9Spkx5Vfq0RyOrYph9boshRN3D4tbWgBAcX9POdXCVfJONDxhfBuPHsJ6vEmPb An+XL5Yl26XYFPiODQ+Qbk44Ot1kt9s7oS3dVUrh92Qv0G3J3DF+Vt6C15nED+f+bk4gScu+ JHT7RjEmfa18GT8DcT//D1zEke1Ymhb41JH+GyZchDRWtjxsS5OBFMzrju7d264zJUFtX7iJ 3xvpKN7VcZKNtB6dLShj3v/XDsQVQWXmR/1YKWZ93C3LpRs2Y5nYdn6gEOpL/WfQFThtfnat HNc7fNs5vjotaYpBl5H8+VCautKbGOs219uQbhGZLYTv6okuKcY8W+4EJEtK0xB08vqr9Jd0 FS9MGjQE++GWo+5eQxFt6nUENHbVYnsr6bYPQsZH0CRNycgTG9MwY/UIXOf4W034UpR82TBG 1LiMsYfb8ahQJhs3wdf1nzipIjRwoZKT1vGXh/cj3gwSr64GfenURBxaFZA5O1acOZUjPrRT n3ci4McYW/0WVVA3lDGCBRMwggUPAgEBMIGWMIGQMQswCQYDVQQGEwJVUzEQMA4GA1UECBMH RmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3RlbXMgTExD MRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhNDdWRhIFN5 c3RlbXMgTExDIENBAgEpMA0GCWCGSAFlAwQCAwUAoIICTTAYBgkqhkiG9w0BCQMxCwYJKoZI hvcNAQcBMBwGCSqGSIb3DQEJBTEPFw0xNjA4MTkyMTUyMDBaME8GCSqGSIb3DQEJBDFCBEAK hJi5/8ptyPvenRhWie/BSME8lhs9BQnHdC6flidXNcBCWBhTvA0NrlqjIYn/ORlwXesJRByf t14fEPqQtrVaMGwGCSqGSIb3DQEJDzFfMF0wCwYJYIZIAWUDBAEqMAsGCWCGSAFlAwQBAjAK BggqhkiG9w0DBzAOBggqhkiG9w0DAgICAIAwDQYIKoZIhvcNAwICAUAwBwYFKw4DAgcwDQYI 
KoZIhvcNAwICASgwgacGCSsGAQQBgjcQBDGBmTCBljCBkDELMAkGA1UEBhMCVVMxEDAOBgNV BAgTB0Zsb3JpZGExEjAQBgNVBAcTCU5pY2V2aWxsZTEZMBcGA1UEChMQQ3VkYSBTeXN0ZW1z IExMQzEcMBoGA1UEAxMTQ3VkYSBTeXN0ZW1zIExMQyBDQTEiMCAGCSqGSIb3DQEJARYTQ3Vk YSBTeXN0ZW1zIExMQyBDQQIBKTCBqQYLKoZIhvcNAQkQAgsxgZmggZYwgZAxCzAJBgNVBAYT AlVTMRAwDgYDVQQIEwdGbG9yaWRhMRIwEAYDVQQHEwlOaWNldmlsbGUxGTAXBgNVBAoTEEN1 ZGEgU3lzdGVtcyBMTEMxHDAaBgNVBAMTE0N1ZGEgU3lzdGVtcyBMTEMgQ0ExIjAgBgkqhkiG 9w0BCQEWE0N1ZGEgU3lzdGVtcyBMTEMgQ0ECASkwDQYJKoZIhvcNAQEBBQAEggIAM2mjKv7n smv9SiI6bPPW708oruljYXQpJPRsM0HD8/hYLn5TPsVysnWZwuZCUrNikEBrQI5qqMmpYt9n o/DrVAhOiupZ2Jz8/oO7KJ+EEdMCABFdY9LRowdpJTHOhYUkaJ5D4YFg/EKP3a8RWGZ6av07 Iy4WZliVOVAV8147Pqxc/YJRxqEM225WV4riC2KkGgskNmYzB9M/nsNNTJiT0EhGxJIq/qfS k5WwkSAMOpUj8M3dI6pOCyIDIqjSUc4wxoVa4UXrdgx5VvXIZCsaatC8USfjCi9j1UE0aACe /CiPQFNIoesa+yMGszJ5jmHQAt1Wv/95nTQlfN6hEnZw015hGq6Wh3IPb4ajBVyy5TzEOiCV qiql3Z8ccHGaBjQDlSqK+CM/8ApZSeXE/CpThaGRPdUyZBQ51XRLvYzqVnAAM2bPOAgrd2kw ND8Ez7O2N3dpQJlc9pNKM7k7M0bfBSNp+bnjj3bLiiTFNA0fHnCKB2a1Eowucw3jDuVZ2jy3 OTUnBlWN48cE94fsMZ8hh2jYRZ7PHDLrveWUsCTkh8zPoN3rnWOrgw+SDBTREGp4rtJn7nQo popEmhSAR5ZZ6txJ65XAhISwOcHaTJMTn5CitAAG03koJjHK244t64e9P2BiB0LqMrxehM5T tOFRe/TzVhNIxUwq26xOfvP/pDYAAAAAAAA= --------------ms040808020408080903000805-- From owner-freebsd-fs@freebsd.org Sat Aug 20 01:38:53 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ED6EABBF8AF for ; Sat, 20 Aug 2016 01:38:53 +0000 (UTC) (envelope-from ler@lerctr.org) Received: from thebighonker.lerctr.org (thebighonker.lerctr.org [IPv6:2001:470:1f0f:3ad:223:7dff:fe9e:6e8a]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "thebighonker.lerctr.org", Issuer "COMODO RSA Domain Validation Secure Server CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B9EA818B1 for ; Sat, 20 Aug 2016 01:38:53 +0000 (UTC) (envelope-from ler@lerctr.org) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lerctr.org; s=lerami; h=Message-ID:Subject:To:From:Date:Content-Transfer-Encoding: Content-Type:MIME-Version:Sender:Reply-To:Cc:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: In-Reply-To:References:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=3VXayWpeBBXYBdeoPq+JUq19eaUd7ZXp01ZNjPVCDlA=; b=Y679OYAVNuaqi7uOTIoNmgNF6X IAhip/YHps/cI6J/AEElQs7fFeTcKcte4Uw7ehifMA7YpXe81/zU/SYQ98k7NYUM0W0/6DRvAbTDf aqlPURmRV3CMdwyqT8zx4q+jFMu2bam0IA/Gmw3+vNbHjN1/2Dv4aApATK43sUXUnq2M=; Received: from thebighonker.lerctr.org ([2001:470:1f0f:3ad:223:7dff:fe9e:6e8a]:40261 helo=webmail.lerctr.org) by thebighonker.lerctr.org with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.87 (FreeBSD)) (envelope-from ) id 1bavFL-0006U8-Ua for freebsd-fs@freebsd.org; Fri, 19 Aug 2016 20:38:52 -0500 Received: from 2001:470:1f0f:42c:cc:6a5b:b3ec:36fb by webmail.lerctr.org with HTTP (HTTP/1.1 POST); Fri, 19 Aug 2016 20:38:51 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Fri, 19 Aug 2016 20:38:51 -0500 From: Larry Rosenman To: Freebsd fs Subject: Duplicate ZAP Message-ID: <529897b39cc8c04069a4c2b10bec7a7a@thebighonker.lerctr.org> X-Sender: ler@lerctr.org User-Agent: Roundcube Webmail/1.2.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2016 01:38:54 -0000 I brought this up in May, and finally had a chance to try the zfs send|zfs recv thing, and did on all the filesystems containing these 4 files, but I still have these: ZFS WARNING: Duplicated ZAP entry detected (libssl.a). ZFS WARNING: Duplicated ZAP entry detected (libzpool.so). ZFS WARNING: Duplicated ZAP entry detected (libtinfo_p.a). ZFS WARNING: Duplicated ZAP entry detected (libumem.so). The message appears to come out of the dedup code, and I've (long time ago) turned off dedup pool-wide. Do any of the ZFS experts have any other ideas? I can give access if you want to look around with zdb. current world/kernel: thebighonker.lerctr.org ~ $ uname -aKU FreeBSD thebighonker.lerctr.org 10.3-STABLE FreeBSD 10.3-STABLE #43 r301479: Sun Jun 5 22:39:14 CDT 2016 root@thebighonker.lerctr.org:/usr/obj/usr/src/sys/GENERIC amd64 1003503 1003503 thebighonker.lerctr.org ~ $ -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: ler@lerctr.org US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281 From owner-freebsd-fs@freebsd.org Sat Aug 20 07:29:54 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D6788BBF7D8 for ; Sat, 20 Aug 2016 07:29:54 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C61711B84 for ; Sat, 20 Aug 2016 07:29:54 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7K7TsQt041629 for ; Sat, 20 Aug 2016 07:29:54 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211939] ZFS does not correctly import cache and spares by label Date: Sat, 20 Aug 2016 07:29:55 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.3-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: ben.rubson@gmail.com X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2016 07:29:54 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211939 --- Comment #1 from Ben RUBSON --- Perhaps we are talking about these 2 commits ? https://svnweb.freebsd.org/base?view=3Drevision&revision=3D292066 https://svnweb.freebsd.org/base?view=3Drevision&revision=3D293708 I'm not really sure... Thank you ! 
Ben --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Sat Aug 20 15:22:35 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D9410BC04BF for ; Sat, 20 Aug 2016 15:22:35 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 701FB1B0E for ; Sat, 20 Aug 2016 15:22:35 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u7KFMPNS094804 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Sat, 20 Aug 2016 18:22:26 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u7KFMPNS094804 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u7KFMPYP094803; Sat, 20 Aug 2016 18:22:25 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 20 Aug 2016 18:22:25 +0300 From: Konstantin Belousov To: Karl Denninger Cc: Slawa Olhovchenkov , freebsd-fs@freebsd.org Subject: Re: ZFS ARC under memory pressure Message-ID: <20160820152225.GP83214@kib.kiev.ua> References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> <20160819201840.GA12519@zxy.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2016 15:22:35 -0000 On Fri, Aug 19, 2016 at 03:38:55PM -0500, Karl Denninger wrote: > Paging *always* requires one I/O (to write the page(s) to the swap) and > MAY involve two (to later page it back in.) It is never a "win" to > spend a *guaranteed* I/O when you can instead act in a way that *might* > cause you to (later) need to execute one. Why would pagedaemon need to write out clean page ? 
From owner-freebsd-fs@freebsd.org Sat Aug 20 16:08:54 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 04B8FBC0194 for ; Sat, 20 Aug 2016 16:08:54 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (denninger.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CC5B0167F for ; Sat, 20 Aug 2016 16:08:53 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 3FB3F219FD for ; Sat, 20 Aug 2016 11:08:51 -0500 (CDT) Subject: Re: ZFS ARC under memory pressure References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> <20160819201840.GA12519@zxy.spb.ru> <20160820152225.GP83214@kib.kiev.ua> Cc: freebsd-fs@freebsd.org From: Karl Denninger Message-ID: <97f166f0-4d47-d5a3-ecb3-d15f1ecf9c1f@denninger.net> Date: Sat, 20 Aug 2016 11:08:44 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160820152225.GP83214@kib.kiev.ua> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms050405090709090407070503" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2016 16:08:54 -0000 This is a cryptographically signed message in MIME format. --------------ms050405090709090407070503 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 8/20/2016 10:22, Konstantin Belousov wrote: > On Fri, Aug 19, 2016 at 03:38:55PM -0500, Karl Denninger wrote: >> Paging *always* requires one I/O (to write the page(s) to the swap) an= d >> MAY involve two (to later page it back in.) It is never a "win" to >> spend a *guaranteed* I/O when you can instead act in a way that *might= * >> cause you to (later) need to execute one. > Why would pagedaemon need to write out clean page ? If you are talking about the case of an executable in which part of the text is evicted you are correct, however, you are still choosing in that instance to evict a page for which there will likely be a future demand and thus require an I/O (should that executable come back up for execution) as opposed to one for which you have no idea how likely demand for same will be (a data page in the ARC.) Since the VM has no means of "coloring" the ARC (as it is opaque other than the consumption of system memory to the VM) as to how "useful" (e.g. how often used, etc) a particular data item in the ARC is, it has no information available on which to decide. However, the fact that an executing process is in some sort of waiting state still likely trumps an ARC data page in terms of likelihood of future access. 
root@NewFS:/usr/src/sys/amd64/conf # pstat -s
Device              1K-blocks     Used    Avail Capacity
/dev/mirror/sw.eli   67108860   291356 66817504     0%

While this is not a large amount of page space used, I can assure you that at no time since boot was all 32GB of memory in the machine consumed with other-than-ARC data. As such, for the VM system to have decided to evict pages to the swap file rather than pare back the ARC is demonstrably wrong, since the result was the execution of I/Os on the *speculative* bet that a page in the ARC would preferentially be required.

On 10.x, unpatched, there were fairly trivial "added" workload choices that one might make on a routine basis (e.g. "make -j8 buildworld") on this machine that, if you had a largish text file open in "vi", would lead to user-perceived stalls exceeding 10 seconds in length, during which that process's working set had been evicted so as to keep ARC cache data! While it might at first blush appear that the Postgres database consumers on the same machine would be happy with this, when *their* RSS got paged out and *they* took the resulting 10+ second stall as well, that certainly was not the case!

11.x does exhibit far less pathology in this regard than did 10.x (unpatched), and I've yet to see the "stall the system to the point that it appears it has crashed" behavior that I formerly could provoke with a trivial test. However, the fact remains that the same machine, with the same load, running 10.x and my patches ran for months at a time with zero page space consumed, a fully-utilized ARC and very little slack space (defined as RAM in "Cache" + allocated-but-unused UMA) -- in other words, with no displayed pathology at all.

The behavior of unpatched 11.x, while very materially better than unpatched 10.x, IMHO does not meet this standard. In particular there are quite large quantities of UMA space out-but-unused on a regular basis, and while *at present* the ARC looks pretty healthy, this is a weekend when system load is quite low. During the week not only does the UMA situation look far worse, so do the ARC size and efficiency, which frequently wind up running at "half-mast" compared to where they ought to be.

I believe FreeBSD 11.x can do better and intend to roll forward the 10.x work in an attempt to implement that.
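For anyone who wants to watch for the same condition (swap in use while the ARC still dominates RAM), a small C sketch along these lines can log the relevant counters. It assumes the usual FreeBSD sysctl names (kstat.zfs.misc.arcstats.* is only present with ZFS loaded), and swap usage itself is still simplest to read with pstat -s or swapinfo as shown above.

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Fetch a numeric sysctl that may be exported as 32 or 64 bits wide. */
static int
read_num(const char *name, uint64_t *out)
{
        unsigned char buf[8];
        size_t len = sizeof(buf);

        if (sysctlbyname(name, buf, &len, NULL, 0) != 0)
                return (-1);
        if (len == sizeof(uint32_t)) {
                uint32_t v32;
                memcpy(&v32, buf, sizeof(v32));
                *out = v32;
        } else if (len == sizeof(uint64_t)) {
                memcpy(out, buf, sizeof(*out));
        } else {
                return (-1);
        }
        return (0);
}

int
main(void)
{
        uint64_t arc_size, arc_c, free_pages;

        if (read_num("kstat.zfs.misc.arcstats.size", &arc_size) != 0 ||
            read_num("kstat.zfs.misc.arcstats.c", &arc_c) != 0 ||
            read_num("vm.stats.vm.v_free_count", &free_pages) != 0) {
                fprintf(stderr, "sysctl lookup failed (is ZFS loaded?)\n");
                return (1);
        }
        printf("ARC size    : %.2f GiB\n", (double)arc_size / (1ULL << 30));
        printf("ARC target  : %.2f GiB\n", (double)arc_c / (1ULL << 30));
        printf("Free memory : %.2f GiB\n",
            (double)free_pages * (double)getpagesize() / (1ULL << 30));
        return (0);
}

If that prints a large ARC while pstat -s shows pages nonetheless going out to swap, you are looking at the same choice being made that is described above.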
--=20 Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms050405090709090407070503 Content-Type: application/pkcs7-signature; name="smime.p7s" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="smime.p7s" Content-Description: S/MIME Cryptographic Signature MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC Bl8wggZbMIIEQ6ADAgECAgEpMA0GCSqGSIb3DQEBCwUAMIGQMQswCQYDVQQGEwJVUzEQMA4G A1UECBMHRmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3Rl bXMgTExDMRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhND dWRhIFN5c3RlbXMgTExDIENBMB4XDTE1MDQyMTAyMjE1OVoXDTIwMDQxOTAyMjE1OVowWjEL MAkGA1UEBhMCVVMxEDAOBgNVBAgTB0Zsb3JpZGExGTAXBgNVBAoTEEN1ZGEgU3lzdGVtcyBM TEMxHjAcBgNVBAMTFUthcmwgRGVubmluZ2VyIChPQ1NQKTCCAiIwDQYJKoZIhvcNAQEBBQAD ggIPADCCAgoCggIBALmEWPhAdphrWd4K5VTvE5pxL3blRQPyGF3ApjUjgtavqU1Y8pbI3Byg XDj2/Uz9Si8XVj/kNbKEjkRh5SsNvx3Fc0oQ1uVjyCq7zC/kctF7yLzQbvWnU4grAPZ3IuAp 3/fFxIVaXpxEdKmyZAVDhk9az+IgHH43rdJRIMzxJ5vqQMb+n2EjadVqiGPbtG9aZEImlq7f IYDTnKyToi23PAnkPwwT+q1IkI2DTvf2jzWrhLR5DTX0fUYC0nxlHWbjgpiapyJWtR7K2YQO aevQb/3vN9gSojT2h+cBem7QIj6U69rEYcEDvPyCMXEV9VcXdcmW42LSRsPvZcBHFkWAJqMZ Myiz4kumaP+s+cIDaXitR/szoqDKGSHM4CPAZV9Yh8asvxQL5uDxz5wvLPgS5yS8K/o7zDR5 vNkMCyfYQuR6PAJxVOk5Arqvj9lfP3JSVapwbr01CoWDBkpuJlKfpQIEeC/pcCBKknllbMYq yHBO2TipLyO5Ocd1nhN/nOsO+C+j31lQHfOMRZaPQykXVPWG5BbhWT7ttX4vy5hOW6yJgeT/ o3apynlp1cEavkQRS8uJHoQszF6KIrQMID/JfySWvVQ4ksnfzwB2lRomrdrwnQ4eG/HBS+0l eozwOJNDIBlAP+hLe8A5oWZgooIIK/SulUAsfI6Sgd8dTZTTYmlhAgMBAAGjgfQwgfEwNwYI KwYBBQUHAQEEKzApMCcGCCsGAQUFBzABhhtodHRwOi8vY3VkYXN5c3RlbXMubmV0Ojg4ODgw CQYDVR0TBAIwADARBglghkgBhvhCAQEEBAMCBaAwCwYDVR0PBAQDAgXgMCwGCWCGSAGG+EIB DQQfFh1PcGVuU1NMIEdlbmVyYXRlZCBDZXJ0aWZpY2F0ZTAdBgNVHQ4EFgQUxRyULenJaFwX RtT79aNmIB/u5VkwHwYDVR0jBBgwFoAUJHGbnYV9/N3dvbDKkpQDofrTbTUwHQYDVR0RBBYw FIESa2FybEBkZW5uaW5nZXIubmV0MA0GCSqGSIb3DQEBCwUAA4ICAQBPf3cYtmKowmGIYsm6 eBinJu7QVWvxi1vqnBz3KE+HapqoIZS8/PolB/hwiY0UAE1RsjBJ7yEjihVRwummSBvkoOyf G30uPn4yg4vbJkR9lTz8d21fPshWETa6DBh2jx2Qf13LZpr3Pj2fTtlu6xMYKzg7cSDgd2bO sJGH/rcvva9Spkx5Vfq0RyOrYph9boshRN3D4tbWgBAcX9POdXCVfJONDxhfBuPHsJ6vEmPb An+XL5Yl26XYFPiODQ+Qbk44Ot1kt9s7oS3dVUrh92Qv0G3J3DF+Vt6C15nED+f+bk4gScu+ JHT7RjEmfa18GT8DcT//D1zEke1Ymhb41JH+GyZchDRWtjxsS5OBFMzrju7d264zJUFtX7iJ 3xvpKN7VcZKNtB6dLShj3v/XDsQVQWXmR/1YKWZ93C3LpRs2Y5nYdn6gEOpL/WfQFThtfnat HNc7fNs5vjotaYpBl5H8+VCautKbGOs219uQbhGZLYTv6okuKcY8W+4EJEtK0xB08vqr9Jd0 FS9MGjQE++GWo+5eQxFt6nUENHbVYnsr6bYPQsZH0CRNycgTG9MwY/UIXOf4W034UpR82TBG 1LiMsYfb8ahQJhs3wdf1nzipIjRwoZKT1vGXh/cj3gwSr64GfenURBxaFZA5O1acOZUjPrRT n3ci4McYW/0WVVA3lDGCBRMwggUPAgEBMIGWMIGQMQswCQYDVQQGEwJVUzEQMA4GA1UECBMH RmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3RlbXMgTExD MRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhNDdWRhIFN5 c3RlbXMgTExDIENBAgEpMA0GCWCGSAFlAwQCAwUAoIICTTAYBgkqhkiG9w0BCQMxCwYJKoZI hvcNAQcBMBwGCSqGSIb3DQEJBTEPFw0xNjA4MjAxNjA4NDRaME8GCSqGSIb3DQEJBDFCBEAS Zl+4p0iIIr2XvXPcFFFHySop9cG9weehGjGjTN2fG8b+6nWgMCkvSGozOg9Ezvojy4PuNEuj 4aJJlOKAnsp8MGwGCSqGSIb3DQEJDzFfMF0wCwYJYIZIAWUDBAEqMAsGCWCGSAFlAwQBAjAK BggqhkiG9w0DBzAOBggqhkiG9w0DAgICAIAwDQYIKoZIhvcNAwICAUAwBwYFKw4DAgcwDQYI KoZIhvcNAwICASgwgacGCSsGAQQBgjcQBDGBmTCBljCBkDELMAkGA1UEBhMCVVMxEDAOBgNV BAgTB0Zsb3JpZGExEjAQBgNVBAcTCU5pY2V2aWxsZTEZMBcGA1UEChMQQ3VkYSBTeXN0ZW1z IExMQzEcMBoGA1UEAxMTQ3VkYSBTeXN0ZW1zIExMQyBDQTEiMCAGCSqGSIb3DQEJARYTQ3Vk YSBTeXN0ZW1zIExMQyBDQQIBKTCBqQYLKoZIhvcNAQkQAgsxgZmggZYwgZAxCzAJBgNVBAYT AlVTMRAwDgYDVQQIEwdGbG9yaWRhMRIwEAYDVQQHEwlOaWNldmlsbGUxGTAXBgNVBAoTEEN1 
ZGEgU3lzdGVtcyBMTEMxHDAaBgNVBAMTE0N1ZGEgU3lzdGVtcyBMTEMgQ0ExIjAgBgkqhkiG 9w0BCQEWE0N1ZGEgU3lzdGVtcyBMTEMgQ0ECASkwDQYJKoZIhvcNAQEBBQAEggIAHf4plh2t fRHIRSFT/S6u8gAkyud9Gq+LnTpO4e2MAvXeNUORco00hBXqa5WW8n0mtUmupmBYMAHsreST F3sCwmk0yLyK4RqB6rs84/flVvm0GJlwOaHRxeq4B8qGoxUe4KscjiHLfR+YRI1DAHTP5MER vze4Hk6ANMGUBPlea7Nj6IgAA/pAx8knw3pON0YOnKf6Zb5Rhlbe4pz9I/n7o8BEZ35xfm3o Of39r9QSQX5Y4IyegpIQjdH1kStAHLA8QmCFbhMpwOi0f6xi/tO0qU18Jhew6y3CqGmAYddN nBVEV9u0S5JNgClRcV6JZMYjHxT7PyGGRPVtXJ4hKsy0fZxYUNaZ0Ha5fZvabfGAClW1PDLv sj6DhUvPQ7yXvRFt/ocCQCkGj+UJtHrWcFr75RW6md8/MGnfL386zLLc+/3/h1bm1ig9KRdN PkJYYMxqmUux3ueNCj0kxlnWcctsXaQpChxrdhTns+yxj+32bHXzDiqR8Me4m1IPQkqdpAW2 KQ0fNlop1E4PguteLdQafmtz6DIdIid4N8hgJ75UevlUf705+nJlZCYTLFATfEAO0liiqZxf kcuvvU7dmjKFFdH1pfscQDCDbDD5EaHSp7rEShWJbxrOfxc6RoHEWmBzwo/uSlbVh6ZJ+7Gf 4NTodj3yG6NadnGUw1dtmTqjiokAAAAAAAA= --------------ms050405090709090407070503--