From owner-freebsd-fs@freebsd.org Sun Aug 14 05:53:35 2016
From: Zaphod Beeblebrox <zbeeble@gmail.com>
Date: Sun, 14 Aug 2016 01:53:33 -0400
Subject: ZFS corrupt DVA panic: can it be fixed?
To: freebsd-fs, FreeBSD Hackers

Before this problem, I had a few crashes... which may have been hardware
related.  The hardware is (I think) fixed, but this problem remains.  My
searches seem to indicate that this has happened to other people.

... I've pasted here only the first two lines of the last three panics I've
had.

panic: dva_get_dsize_sync(): bad DVA 1573890:1587590144
#3 0xffffffff822b8b01 at dva_get_dsize_sync+0xb1

panic: dva_get_dsize_sync(): bad DVA 1573890:1587590144
#3 0xffffffff822b8b01 at dva_get_dsize_sync+0xb1

panic: dva_get_dsize_sync(): bad DVA 1573890:1587590144
#3 0xffffffff822b8b01 at dva_get_dsize_sync+0xb1

I gather that the machine runs until something causes the kernel to
encounter the corrupt DVA.  I gather from reading stuff that this is part
of the structure that holds free space on the drive.

Since the numbers are the same in each panic, I'm assuming that each panic
is encountering the same one.  This is also the panic that is not dumping
properly to either USB or spinning disk.

I have zdb -uuumcD running right now.  It seems to estimate that it's going
to take an awfully long time, but the estimation might be broken because
it's on 159 of 171 of whatever it's reading.

Now... question: is this fixable?  Can I just mark off the space as
unusable, maybe?  Since this has happened to more than one person, I gather
it's a significant hole in the claim that ZFS is crashproof (or that it
doesn't need repair after crashing).  Maybe this check can be added to
scrub (or scrub + an option)?  Or maybe when we run across it, we fix it?
Does fixing it (in the theoretical sense) require knowing all the free
space on the drive?  Doesn't scrub do that?

From owner-freebsd-fs@freebsd.org Sun Aug 14 20:04:09 2016
From: Zaphod Beeblebrox <zbeeble@gmail.com>
Date: Sun, 14 Aug 2016 16:04:07 -0400
Subject: zfs_recovery=1, zdb, mounted pool?
To: freebsd-fs

So... I found 319 of the errno 122 errors by running zdb.  My question is
this:

Can I run with zfs_recovery=1 and have zdb fix these (which are free space
leaked errors) while the system is running?
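For reference, the kind of check being talked about here can be run with
zdb alone against an idle or exported pool; a minimal sketch (the pool name
is a placeholder, and zdb only reports problems, it never repairs anything):

    # traverse all blocks and report leaked / double-allocated space
    zdb -b <poolname>
    # dump the metaslabs and space maps
    zdb -m <poolname>

The zfs_recovery tunable asked about above is a separate, kernel-side
switch; zdb itself runs in userland against the on-disk state.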
From owner-freebsd-fs@freebsd.org Sun Aug 14 20:21:52 2016
From: Ben RUBSON <ben.rubson@gmail.com>
Date: Sun, 14 Aug 2016 22:21:48 +0200
Subject: Re: [iSCSI] Trying to reach max disk throughput
To: freebsd-fs@freebsd.org

> On 10 Aug 2016, at 15:27, Ben RUBSON wrote:
>
>> On 10 Aug 2016, at 13:44, Edward Tomasz Napierała wrote:
>>
>> On 0810T1154, Ben RUBSON wrote:
>>> Hello,
>>>
>>> I'm facing something strange with iSCSI, I can't manage to reach the
>>> expected disk throughput using one (read or write) thread.
>>
>> [..]
>>
>>> ### Initiator : iscsi disk throughput :
>>>
>>> ## dd if=/dev/da8 of=/dev/null bs=$((128*1024)) count=81920
>>> 10737418240 bytes transferred in 34.731815 secs (309152234 bytes/sec) - 295MB/s
>>>
>>> With 2 parallel dd jobs : 345MB/s
>>> With 4 parallel dd jobs : 502MB/s
>>>
>>> ### Questions :
>>>
>>> Why such a difference ?
>>> Where are the 167MB/s (462-295) lost ?
>>
>> Network delays, I suppose.
>
> I just saw that iSER is available in FreeBSD 11, let's install BETA4
> and give it a try.

OK, as a target I used Linux TGT, as an iSER target (isert) is not
available on FreeBSD yet.

### Target : local disk throughput, one thread :
# dd if=/dev/da8 of=/dev/null bs=$((128*1024)) count=81920
10737418240 bytes (11 GB) copied, 21.3898 s, 502 MB/s

### Initiator : iscsi disk throughput, one thread :
# dd if=/dev/da8 of=/dev/null bs=$((128*1024)) count=81920
10737418240 bytes transferred in 34.938676 secs (307321843 bytes/sec) - 293 MB/s

### Initiator : iSER disk throughput, one thread :
# dd if=/dev/da8 of=/dev/null bs=$((128*1024)) count=81920
10737418240 bytes transferred in 20.371947 secs (527068838 bytes/sec) - 502 MB/s

No need to comment, let's wait for the FreeBSD iSER target then !

Ben
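For what it's worth, the parallel figures quoted above ("With 2 parallel dd
jobs", "With 4 parallel dd jobs") can be generated with a plain sh loop like
the one below; the device and block size are the ones used in the thread,
the slice offsets and job count are arbitrary:

    # four concurrent sequential readers, each covering its own 2.5 GB slice
    for i in 0 1 2 3; do
        dd if=/dev/da8 of=/dev/null bs=$((128*1024)) count=20480 skip=$((i * 20480)) &
    done
    wait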
From owner-freebsd-fs@freebsd.org Sun Aug 14 23:21:21 2016
From: bugzilla-noreply@freebsd.org
Date: Sun, 14 Aug 2016 23:21:21 +0000
Subject: [Bug 211491] System hangs after "Uptime" on reboot with ZFS
To: freebsd-fs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211491

--- Comment #17 from Xin LI ---
I can't seem to reproduce this anymore on -CURRENT (currently at r304072),
FYI.

From owner-freebsd-fs@freebsd.org Sun Aug 14 23:50:28 2016
From: Xin Li <delphij@delphij.net>
Date: Sun, 14 Aug 2016 16:50:22 -0700
Subject: Re: zfs_recovery=1, zdb, mounted pool?
To: Zaphod Beeblebrox, freebsd-fs
Cc: d@delphij.net

On 8/14/16 13:04, Zaphod Beeblebrox wrote:
> So... I found 319 of the errno 122 errors by running zdb.  My question is
> this:
>
> Can I run with zfs_recovery=1 and have zdb fix these (which are free space
> leaked errors) while the system is running?

No.

If I was you I would definitely do a full backup to a different place,
recreate the pool and restore from the backup.

It's not safe to use your pool as-is, don't do it for everybody's sake.

Cheers,
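A minimal sketch of the evacuation being recommended here, assuming the
damaged pool is called tank and that a scratch pool called backup with
enough space exists (both names are placeholders); if zfs send trips over
unreadable blocks, a file-level copy is the fallback:

    # snapshot everything and copy it off the damaged pool
    zfs snapshot -r tank@evacuate
    zfs send -R tank@evacuate | zfs receive -Fu backup/tank

    # after verifying the copy: destroy, recreate with the desired layout, restore
    zpool destroy tank
    # zpool create tank ...            (vdev layout as appropriate)
    zfs send -R backup/tank@evacuate | zfs receive -F tank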
From owner-freebsd-fs@freebsd.org Mon Aug 15 04:46:23 2016
From: Zaphod Beeblebrox <zbeeble@gmail.com>
Date: Mon, 15 Aug 2016 00:46:22 -0400
Subject: Re: zfs_recovery=1, zdb, mounted pool?
To: Xin Li
Cc: freebsd-fs, Xin LI

On Sun, Aug 14, 2016 at 7:50 PM, Xin Li wrote:

> On 8/14/16 13:04, Zaphod Beeblebrox wrote:
> > So... I found 319 of the errno 122 errors by running zdb.  My question is
> > this:
> >
> > Can I run with zfs_recovery=1 and have zdb fix these (which are free
> > space leaked errors) while the system is running?
>
> No.
>
> If I was you I would definitely do a full backup to a different place,
> recreate the pool and restore from the backup.
>
> It's not safe to use your pool as-is, don't do it for everybody's sake.

So, then, do I start a big bug on this issue?  Is there a bug on this
issue?  Seriously... it appears to have happened to multiple people.

From owner-freebsd-fs@freebsd.org Mon Aug 15 05:02:00 2016
From: Xin Li <delphij@delphij.net>
Date: Sun, 14 Aug 2016 22:01:50 -0700
Subject: Re: zfs_recovery=1, zdb, mounted pool?
To: Zaphod Beeblebrox
Cc: d@delphij.net, freebsd-fs

On 8/14/16 21:46, Zaphod Beeblebrox wrote:
> On Sun, Aug 14, 2016 at 7:50 PM, Xin Li wrote:
>
>> On 8/14/16 13:04, Zaphod Beeblebrox wrote:
>>> So... I found 319 of the errno 122 errors by running zdb.  My question is
>>> this:
>>>
>>> Can I run with zfs_recovery=1 and have zdb fix these (which are free
>>> space leaked errors) while the system is running?
>>
>> No.
>>
>> If I was you I would definitely do a full backup to a different place,
>> recreate the pool and restore from the backup.
>>
>> It's not safe to use your pool as-is, don't do it for everybody's sake.
>
> So, then, do I start a big bug on this issue?  Is there a bug on this
> issue?  Seriously... it appears to have happened to multiple people.

I don't think so -- zfs_recovery is the last-resort option that disables
certain assertions, which implies that your pool is already damaged beyond
repair (i.e. beyond the redundancy margin that ZFS has built in, e.g.
multiple copies of metadata, RAID-Z, etc.), typically as a result of RAM
issues.

In theory it is possible to rebuild the space map and recover the space,
but note that the space map has sufficient redundancy that, if you see
errors in it that can not be corrected by ZFS's self-healing, it is highly
likely that there is much more damage to the pool already.

If you don't have a reproduction case for this one that can reliably
trigger a leak without a hardware issue, I think it would be just a waste
of time to file a bug.

Cheers,
From owner-freebsd-fs@freebsd.org Mon Aug 15 12:11:33 2016
From: bugzilla-noreply@freebsd.org
Date: Mon, 15 Aug 2016 12:11:33 +0000
Subject: [Bug 211381] L2ARC degraded, repeatedly, on Samsung SSD 950 Pro nvme
To: freebsd-fs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211381

--- Comment #12 from braddeicide@hotmail.com ---
Been running r300039 in the previously problematic mismatched sector size
configuration for a week, looks good.

# geli is 4k, underlying device is 512
diskinfo -v /dev/nvd0p3.eli | grep sectorsize
        4096            # sectorsize
diskinfo -v /dev/nvd0p3 | grep sectorsize
        512             # sectorsize

# zfs-stats
L2 ARC Summary: (HEALTHY)

# Cache is growing
pool            alloc   free   read  write   read  write
-------------   -----  -----  -----  -----  -----  -----
cache               -      -      -      -      -      -
  nvd0p3.eli     139G   311G      3      3  40.0K   427K

# guess compress_failures weren't a valid indicator
kstat.zfs.misc.arcstats.l2_compress_failures: 11644501
kstat.zfs.misc.arcstats.l2_writes_error: 0

From owner-freebsd-fs@freebsd.org Mon Aug 15 13:30:19 2016
From: bugzilla-noreply@freebsd.org
Date: Mon, 15 Aug 2016 13:30:19 +0000
Subject: [Bug 211381] L2ARC degraded, repeatedly, on Samsung SSD 950 Pro nvme
To: freebsd-fs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211381

--- Comment #13 from Andriy Gapon ---
(In reply to braddeicide from comment #12)
"Compress failure" only means that the compression didn't save any space
and thus a buffer eligible for the compression was placed in L2ARC without
it.

From owner-freebsd-fs@freebsd.org Mon Aug 15 13:31:03 2016
From: bugzilla-noreply@freebsd.org
Date: Mon, 15 Aug 2016 13:31:03 +0000
Subject: [Bug 211381] L2ARC degraded, repeatedly, on Samsung SSD 950 Pro nvme
To: freebsd-fs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211381

Andriy Gapon changed:

           What            |Removed                 |Added
---------------------------------------------------------------------
           Assignee        |freebsd-fs@FreeBSD.org  |avg@FreeBSD.org
                 CC        |                        |freebsd-fs@FreeBSD.org
From owner-freebsd-fs@freebsd.org Tue Aug 16 09:09:40 2016
From: Ben RUBSON <ben.rubson@gmail.com>
Date: Tue, 16 Aug 2016 11:09:36 +0200
Subject: ZFS does not correctly import by label
To: freebsd-fs@freebsd.org

Hello,

Sounds like ZFS does not correctly import cache and spares by label.

Example :

# zpool add home cache label/G12000KU2RVJAch
# zpool add home spare label/G1207PFGPTKDXhm
# zpool status home
(...)
        home                       ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            label/G1203PGGLJWZXhm  ONLINE       0     0     0
            label/G1204PHKXHJ2Xhm  ONLINE       0     0     0
(...)
        logs
          mirror-2                 ONLINE       0     0     0
            label/G12000KU2RVJAlg  ONLINE       0     0     0
            label/G12010KU22RVAlg  ONLINE       0     0     0
(...)
        cache
          label/G12000KU2RVJAch    ONLINE       0     0     0
        spares
          label/G1207PFGPTKDXhm    AVAIL

# zpool export home
# zpool import -d /dev/label/ home
# zpool status home
(...)
        home                       ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            label/G1203PGGLJWZXhm  ONLINE       0     0     0
            label/G1204PHKXHJ2Xhm  ONLINE       0     0     0
(...)
        logs
          mirror-2                 ONLINE       0     0     0
            label/G12000KU2RVJAlg  ONLINE       0     0     0
            label/G12010KU22RVAlg  ONLINE       0     0     0
(...)
        cache
          da5p7                    ONLINE       0     0     0
        spares
          da4p1                    AVAIL

# uname -v
FreeBSD 10.3-RELEASE-p7 #0: Thu Aug 11 18:38:15 UTC 2016
root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC

Did I miss something ?

Thank you very much !

Best regards,

Ben

From owner-freebsd-fs@freebsd.org Tue Aug 16 10:54:55 2016
From: Ben RUBSON <ben.rubson@gmail.com>
Date: Tue, 16 Aug 2016 12:54:51 +0200
Subject: Re: ZFS does not correctly import by label
To: freebsd-fs@freebsd.org

> On 16 Aug 2016, at 11:09, Ben RUBSON wrote:
>
> Hello,
>
> Sounds like ZFS does not correctly import cache and spares by label.
>
> Example :
>
> # zpool add home cache label/G12000KU2RVJAch
> # zpool add home spare label/G1207PFGPTKDXhm
> # zpool status home
> (...)
>         home                       ONLINE       0     0     0
>           mirror-0                 ONLINE       0     0     0
>             label/G1203PGGLJWZXhm  ONLINE       0     0     0
>             label/G1204PHKXHJ2Xhm  ONLINE       0     0     0
> (...)
>         logs
>           mirror-2                 ONLINE       0     0     0
>             label/G12000KU2RVJAlg  ONLINE       0     0     0
>             label/G12010KU22RVAlg  ONLINE       0     0     0
> (...)
>         cache
>           label/G12000KU2RVJAch    ONLINE       0     0     0
>         spares
>           label/G1207PFGPTKDXhm    AVAIL
>
> # zpool export home
> # zpool import -d /dev/label/ home
> # zpool status home
> (...)
>         cache
>           da5p7                    ONLINE       0     0     0
>         spares
>           da4p1                    AVAIL
>
> # uname -v
> FreeBSD 10.3-RELEASE-p7 #0: Thu Aug 11 18:38:15 UTC 2016
> root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC

I just tested my pool under FreeBSD 11-RC1, the issue does not occur,
cache and spares are correctly imported by label.

Could it be possible to backport the required changes to 10.3 ?

Many thanks !

Ben
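One possible stop-gap on 10.3, untested here and assuming the wrongly named
devices are otherwise healthy, is to drop and re-add the cache and spare
after each import, since only those two classes lose their label names
(device and label names are the ones from the zpool status output above):

    # da5p7/da4p1 are what the cache and spare showed up as after the import
    zpool remove home da5p7
    zpool remove home da4p1
    zpool add home cache label/G12000KU2RVJAch
    zpool add home spare label/G1207PFGPTKDXhm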
From owner-freebsd-fs@freebsd.org Tue Aug 16 19:34:20 2016
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
Date: Tue, 16 Aug 2016 22:34:16 +0300
Subject: ZFS ARC under memory pressure
To: freebsd-fs@freebsd.org

I see issues with ZFS ARC under memory pressure.
The ZFS ARC size can be dramatically reduced, all the way down to arc_min.

As I see it, a memory pressure event causes a call to arc_lowmem, which sets
needfree:

arc.c:arc_lowmem

        needfree = btoc(arc_c >> arc_shrink_shift);

After this, arc_available_memory returns negative values (PAGESIZE *
(-needfree)) until needfree is zero, independent of how much memory has
already been freed.  needfree is only set back to 0 in arc_reclaim_thread()
when arc_size <= arc_c, which does not happen until arc_size drops below
arc_c (arc_c is decreased at every loop iteration).

So arc_c can be dropped all the way to its minimum value if arc_size does
not fall below it fast enough; the amount of memory already freed is never
compared against the initial request.

As a result, I can see needless ARC reclaim, from 10x to 100x what was
asked for.

Can someone check my reading and comment on this?
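A self-contained toy model of the loop described above; the identifiers
follow the arc.c names quoted in the message, but the 16 GB starting size,
the fixed 64 MB evicted per pass, and the helper bodies are made-up
placeholders, not the kernel code:

#include <stdio.h>
#include <stdint.h>

#define PAGESIZE 4096ULL
#define btoc(x)  ((x) / PAGESIZE)               /* bytes -> pages */

static uint64_t needfree;                       /* set by the vm_lowmem handler */
static uint64_t arc_c    = 16ULL << 30;         /* ARC target size (made up) */
static uint64_t arc_size = 16ULL << 30;         /* current ARC size */
static int      arc_shrink_shift = 7;

/* placeholder: the real arc_shrink() also clamps at arc_min, etc. */
static void arc_shrink(void) { arc_c -= arc_c >> arc_shrink_shift; }

/* placeholder: pretend eviction frees a fixed 64 MB per pass */
static void arc_adjust(void) { arc_size -= (arc_size > (64ULL << 20)) ? (64ULL << 20) : arc_size; }

/* one vm_lowmem event asks for arc_c >> arc_shrink_shift bytes worth of pages */
static void arc_lowmem(void) { needfree = btoc(arc_c >> arc_shrink_shift); }

static int64_t arc_available_memory(void)
{
        /* negative for as long as needfree is set, regardless of how much
         * memory has been handed back in the meantime */
        if (needfree > 0)
                return (-(int64_t)(PAGESIZE * needfree));
        return (0);
}

int main(void)
{
        int passes = 0;

        arc_lowmem();
        printf("requested: %ju bytes\n", (uintmax_t)(needfree * PAGESIZE));

        while (arc_available_memory() < 0) {
                arc_shrink();                   /* arc_c drops every pass ... */
                arc_adjust();
                if (arc_size <= arc_c)          /* ... and needfree only clears here */
                        needfree = 0;
                passes++;
        }
        printf("passes: %d, final arc_c: %ju bytes\n", passes, (uintmax_t)arc_c);
        return (0);
}

Compiled and run, this takes a couple of hundred passes and leaves arc_c at
roughly 3 GB even though only 128 MB was asked for -- the kind of overshoot
the message describes.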
From owner-freebsd-fs@freebsd.org Tue Aug 16 21:02:34 2016
From: bugzilla-noreply@freebsd.org
Date: Tue, 16 Aug 2016 21:02:34 +0000
Subject: [Bug 211013] Write error to UFS filesystem with softupdates panics machine
To: freebsd-fs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211013

--- Comment #1 from commit-hook@freebsd.org ---
A commit references this bug:

Author: mckusick
Date: Tue Aug 16 21:02:30 UTC 2016
New revision: 304239
URL: https://svnweb.freebsd.org/changeset/base/304239

Log:
  Bug 211013 reports that a write error to a UFS filesystem running with
  softupdates panics the kernel.  The problem that has been pointed out is
  that when there is a transient write error on certain metadata blocks,
  specifically directory blocks (PAGEDEP), inode blocks (INODEDEP),
  indirect pointer blocks (INDIRDEPS), and cylinder group (BMSAFEMAP, but
  only when journaling is enabled), we get a panic in one of the routines
  called by softdep_disk_io_initiation that the I/O is "already started"
  when we retry the write.

  These dependency types potentially need to do roll-backs when called by
  softdep_disk_io_initiation before doing a write and then a roll-forward
  when called by softdep_disk_write_complete after the I/O completes.  The
  panic happens when there is a transient error.  At the top of
  softdep_disk_write_complete we check to see if the write had an error
  and if an error occurred we just return.  This return is correct most of
  the time because the main role of the routines called by
  softdep_disk_write_complete is to process the now-completed dependencies
  so that the next I/O steps can happen.

  But for the four types listed above, they do not get to do their
  rollback operations.  This causes the panic when
  softdep_disk_io_initiation gets called on the second attempt to do the
  write and the roll-back routines find that the roll-backs have already
  been done.  As an aside I note that there is also the problem that the
  buffer will have been unlocked and thus made visible to the filesystem
  and to user applications with the roll-backs in place.

  The way to resolve the problem is to add a flag to the routines called
  by softdep_disk_write_complete for the four dependency types noted that
  indicates whether the write was successful (WRITESUCCEEDED).  If the
  write does not succeed, they do just the roll-backs and then return.  If
  the write was successful they also do their usual processing of the
  now-completed dependencies.

  The fix was tested by selectively injecting write errors for buffers
  holding dependencies of each of the four types noted above and then
  verifying that the kernel no longer panicked and that, following the
  successful retry of the write, the filesystem could be unmounted and
  successfully checked cleanly.

  PR:           211013
  Reviewed by:  kib

Changes:
  head/sys/ufs/ffs/ffs_softdep.c
  head/sys/ufs/ffs/softdep.h
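In code terms, the change described in the commit message amounts to the
following shape.  This is a schematic only: the struct, the helper names and
the flag value are placeholders chosen to mirror the wording above, not the
actual ffs_softdep.c code.

/* flag passed down from softdep_disk_write_complete() */
#define WRITESUCCEEDED 0x0001   /* buffer write completed without error */

struct buf;
struct pagedep;                 /* stands in for one of the four affected types */

/* placeholders for the roll-back work and the completion processing
 * the commit message refers to */
static void do_rollbacks(struct pagedep *dep, struct buf *bp) { (void)dep; (void)bp; }
static void process_completed(struct pagedep *dep, struct buf *bp) { (void)dep; (void)bp; }

static void
handle_written_pagedep_sketch(struct pagedep *dep, struct buf *bp, int flags)
{
        /* the roll-back handling happens in both cases */
        do_rollbacks(dep, bp);

        if ((flags & WRITESUCCEEDED) == 0) {
                /* transient write error: stop here, so the retried write can
                 * go through softdep_disk_io_initiation() again cleanly */
                return;
        }

        /* successful write: usual processing of the now-completed
         * dependencies so the next I/O steps can happen */
        process_completed(dep, bp);
}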
If no further problems are reported, bug will be closed. --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 02:01:07 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AA1E2BBCF86 for ; Wed, 17 Aug 2016 02:01:07 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9948319F7 for ; Wed, 17 Aug 2016 02:01:07 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7H217WO082129 for ; Wed, 17 Aug 2016 02:01:07 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211013] Write error to UFS filesystem with softupdates panics machine Date: Wed, 17 Aug 2016 02:01:07 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-BETA1 X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: karl@denninger.net X-Bugzilla-Status: In Progress X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 02:01:07 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211013 --- Comment #3 from karl@denninger.net --- I trashed the card that caused this, but will see if I can reproduce and wi= ll update in any event. 
--=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 02:03:31 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AA273BBC126 for ; Wed, 17 Aug 2016 02:03:31 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 996441D23 for ; Wed, 17 Aug 2016 02:03:31 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7H23V6v068788 for ; Wed, 17 Aug 2016 02:03:31 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211013] Write error to UFS filesystem with softupdates panics machine Date: Wed, 17 Aug 2016 02:03:31 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-BETA1 X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: karl@denninger.net X-Bugzilla-Status: In Progress X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 02:03:31 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211013 --- Comment #4 from karl@denninger.net --- (In reply to karl from comment #3) Is this expected to be MFC'd back against 11.0-PRE (and should it apply cleanly?) 
--=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 05:39:00 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 57591BBCAAB for ; Wed, 17 Aug 2016 05:39:00 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 462861EAB for ; Wed, 17 Aug 2016 05:39:00 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7H5d0sV052423 for ; Wed, 17 Aug 2016 05:39:00 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211013] Write error to UFS filesystem with softupdates panics machine Date: Wed, 17 Aug 2016 05:39:00 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-BETA1 X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: mckusick@FreeBSD.org X-Bugzilla-Status: In Progress X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 05:39:00 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211013 --- Comment #5 from Kirk McKusick --- Though I did not specify an MFC, it should apply easily to 11.0. I do plan = to do an MFC to 11.0 once it has been released. Since it is not a common bug I don't want to slow the process of getting 11.0 out the door hence the pause= to MFC. 
--=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 07:18:51 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B1541BBB3D1 for ; Wed, 17 Aug 2016 07:18:51 +0000 (UTC) (envelope-from mgamsjager@gmail.com) Received: from mail-io0-x230.google.com (mail-io0-x230.google.com [IPv6:2607:f8b0:4001:c06::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7CA241EBA for ; Wed, 17 Aug 2016 07:18:51 +0000 (UTC) (envelope-from mgamsjager@gmail.com) Received: by mail-io0-x230.google.com with SMTP id q83so129090645iod.1 for ; Wed, 17 Aug 2016 00:18:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:cc; bh=x4WMCXJNByJ+2wnajUZDpldZMVbWJTAxjhdsCp+F8KA=; b=IDEBGyWxgEMGm7GnHUZyts679E6/oZR7rqknrzGDsr98OrH3r68LyJhvRPnH01pRK3 w2/MASgcj9/Pof9mPzeaY9s3hq7YdMRYlwX/6PMLkyPDpsMkF8BEvQU8pEZC7MwQTY0y RpWY8GzUthLlppKBtSLdcYWPb3bRoAd5ig85QcDX84MsGE2AspWeMXB4VIobAF9UhxG4 eBB4SoazqMpfoxYHKgrcBNDlXh8c1/hTnJ3SOrV8WLjq/HS6NC1U7kgLQ9+YVV18ZLSf vqwQD2TZ0PeKhUZG5gdPA8OQvKtkXTcOWciOGgFhs4ArUlYfGuPHmiBgo0dK6X+YWWuy MOUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:cc; bh=x4WMCXJNByJ+2wnajUZDpldZMVbWJTAxjhdsCp+F8KA=; b=EMqksGdE/o3vcmdwjA/qhVEf9KIpazsMW7jtwXd3LJWACuWNZyHwxc48QtL18T36Ba TU5qBhIAGXQetJcTLec8pDTJI6EpnMEG/82F6qbUeC3G3Gydec9cPvcIExTfLbXJGFIm f4SrCpTtVsLwbE0UjOlXE4e9I5F9xRd1nFTH+CNUjjPKqVTVjnj9nx6erdHJiPevAVVj KLth+C8uowmxFL7fVb1dJeeYcxRiVEYptZAba7rsGjNJhBD7v1NjbF/0Dlzqt48kqql4 77AbQ00+K6rNejkgRi2j+HqBpyazL6py4h/qiKmpwMkgpu4lrch3JsZhZ57AlxGN01u3 uezQ== X-Gm-Message-State: AEkoouuzKdXx0R/iYtLQXxfo0/EWscY0JiZH4r37fxz+oIbsJN0UXi+w0aogSkrDQHMHNcHxdc0daoeCcKrz3A== X-Received: by 10.107.139.8 with SMTP id n8mr44439153iod.96.1471418330671; Wed, 17 Aug 2016 00:18:50 -0700 (PDT) MIME-Version: 1.0 Received: by 10.64.63.197 with HTTP; Wed, 17 Aug 2016 00:18:20 -0700 (PDT) In-Reply-To: <20160816193416.GM8192@zxy.spb.ru> References: <20160816193416.GM8192@zxy.spb.ru> From: Matthias Gamsjager Date: Wed, 17 Aug 2016 09:18:20 +0200 Message-ID: Subject: Re: ZFS ARC under memory pressure Cc: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 07:18:51 -0000 On 16 August 2016 at 21:34, Slawa Olhovchenkov wrote: > I see issuses with ZFS ARC inder memory pressure. > ZFS ARC size can be dramaticaly reduced, up to arc_min. > > As I see memory pressure event cause call arc_lowmem and set needfree: > > arc.c:arc_lowmem > > needfree = btoc(arc_c >> arc_shrink_shift); > > After this, arc_available_memory return negative vaules (PAGESIZE * > (-needfree)) until needfree is zero. Independent how too much memory > freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <= > arc_c. Until arc_size don't drop below arc_c (arc_c deceased at every > loop interation). 
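For what it's worth, a quick way to watch the behaviour described above is to sample the ARC statistics while the box goes through a low-memory event. This is only a rough observation loop, not a fix; the sysctl OIDs are the stock FreeBSD ones (kstat.zfs.misc.arcstats.*, vm.stats.vm.v_free_count) and the interval and iteration count are arbitrary choices:

    #!/bin/sh
    # Sample ARC size, ARC target (arc_c) and free pages every 5 seconds.
    i=0
    while [ "$i" -lt 120 ]; do
        size=$(sysctl -n kstat.zfs.misc.arcstats.size)
        c=$(sysctl -n kstat.zfs.misc.arcstats.c)
        c_min=$(sysctl -n kstat.zfs.misc.arcstats.c_min)
        free=$(sysctl -n vm.stats.vm.v_free_count)
        printf '%s size=%s c=%s c_min=%s v_free_count=%s\n' \
            "$(date '+%H:%M:%S')" "$size" "$c" "$c_min" "$free"
        sleep 5
        i=$((i + 1))
    done

If the reclaim really overshoots as described, both size and c should be seen collapsing towards c_min after a single pressure event even though v_free_count has long since recovered.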
> > arc_c droped to minimum value if arc_size fast enough droped. > > No control current to initial memory allocation. > > As result, I can see needless arc reclaim, from 10x to 100x times. > > Can some one check me and comment this? > _______________________________________________ > What version are you on? From owner-freebsd-fs@freebsd.org Wed Aug 17 07:31:56 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F30A8BBB9BD for ; Wed, 17 Aug 2016 07:31:56 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) by mx1.freebsd.org (Postfix) with ESMTP id B41CA1A79 for ; Wed, 17 Aug 2016 07:31:55 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 4D78F4C4C83E; Wed, 17 Aug 2016 09:25:33 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id hDxnviEepP0m; Wed, 17 Aug 2016 09:25:31 +0200 (CEST) Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id 1B6424C4C839; Wed, 17 Aug 2016 09:25:31 +0200 (CEST) Reply-To: juergen.gotteswinter@internetx.com Subject: Re: HAST + ZFS + NFS + CARP References: <6035AB85-8E62-4F0A-9FA8-125B31A7A387@gmail.com> <20160703192945.GE41276@mordor.lan> <20160703214723.GF41276@mordor.lan> <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> To: Borja Marcos , freebsd-fs@freebsd.org From: InterNetX - Juergen Gotteswinter Organization: InterNetX GmbH Message-ID: <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> Date: Wed, 17 Aug 2016 09:25:30 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 07:31:57 -0000 Am 11.08.2016 um 11:24 schrieb Borja Marcos: > >> On 11 Aug 2016, at 11:10, Julien Cigar wrote: >> >> As I said in a previous post I tested the zfs send/receive approach (with >> zrep) and it works (more or less) perfectly.. so I concur in all what you >> said, especially about off-site replicate and synchronous replication. >> >> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment, >> I'm in the early tests, haven't done any heavy writes yet, but ATM it >> works as expected, I havent' managed to corrupt the zpool. > > I must be too old school, but I don’t quite like the idea of using an essentially unreliable transport > (Ethernet) for low-level filesystem operations. > > In case something went wrong, that approach could risk corrupting a pool. Although, frankly, > ZFS is extremely resilient. 
One of mine even survived a SAS HBA problem that caused some > silent corruption. try dual split import :D i mean, zpool -f import on 2 machines hooked up to the same disk chassis. kaboom, really ugly kaboom. thats what is very likely to happen sooner or later especially when it comes to homegrown automatism solutions. even the commercial parts where much more time/work goes into such solutions fail in a regular manner > > The advantage of ZFS send/receive of datasets is, however, that you can consider it > essentially atomic. A transport corruption should not cause trouble (apart from a failed > "zfs receive") and with snapshot retention you can even roll back. You can’t roll back > zpool replications :) > > ZFS receive does a lot of sanity checks as well. As long as your zfs receive doesn’t involve a rollback > to the latest snapshot, it won’t destroy anything by mistake. Just make sure that your replica datasets > aren’t mounted and zfs receive won’t complain. > > > Cheers, > > > > > Borja. > > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Wed Aug 17 08:53:27 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4F88EBBA6EC for ; Wed, 17 Aug 2016 08:53:27 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from cu01176b.smtpx.saremail.com (cu01176b.smtpx.saremail.com [195.16.151.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0D1031016 for ; Wed, 17 Aug 2016 08:53:26 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from [172.16.8.36] (izaro.sarenet.es [192.148.167.11]) by proxypop01.sare.net (Postfix) with ESMTPSA id C69B99DCA35; Wed, 17 Aug 2016 10:53:17 +0200 (CEST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: HAST + ZFS + NFS + CARP From: Borja Marcos In-Reply-To: <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> Date: Wed, 17 Aug 2016 10:53:17 +0200 Cc: freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <1AE36A3B-A2BA-47D2-A872-1E7E9EFA201D@sarenet.es> References: <6035AB85-8E62-4F0A-9FA8-125B31A7A387@gmail.com> <20160703192945.GE41276@mordor.lan> <20160703214723.GF41276@mordor.lan> <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> To: juergen.gotteswinter@internetx.com X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 08:53:27 -0000 > On 17 Aug 2016, at 09:25, InterNetX - Juergen Gotteswinter = wrote: > try dual split import :D i mean, zpool -f import on 2 machines hooked = up > to the same disk chassis. >=20 > kaboom, really ugly kaboom. thats what is very likely to happen sooner > or later especially when it comes to homegrown automatism solutions. 
> even the commercial parts where much more time/work goes into such > solutions fail in a regular manner Well, don=E2=80=99t expect to father children after shooting your balls! = ;) I am not a big fan of such closely coupled solutions. There are quite some failure modes that can break such a configuration, not just a = brainless =E2=80=9Cdual split import=E2=80=9D as you say :) Misbehaving software (read, a ZFS bug) can render the pool unusable and, = no matter how many redundant servers you have connected to your chassis, you are toast. = Using incremental replication over a network is much more robust, and it offers a lot of fault = isolation. Moreover, you can place the servers in different buildings, etc. Networks even offer a more than reasonable protection from electrical = problems. Especially if you get paranoid and use fiber, in which case protection is absolute. Borja. From owner-freebsd-fs@freebsd.org Wed Aug 17 08:54:41 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 80208BBA83D for ; Wed, 17 Aug 2016 08:54:41 +0000 (UTC) (envelope-from julien@perdition.city) Received: from relay-b01.edpnet.be (relay-b01.edpnet.be [212.71.1.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "edpnet.email", Issuer "Go Daddy Secure Certificate Authority - G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2CCCC1287 for ; Wed, 17 Aug 2016 08:54:40 +0000 (UTC) (envelope-from julien@perdition.city) X-ASG-Debug-ID: 1471424054-0a7ff569f634acc30001-3nHGF7 Received: from mordor.lan (213.219.167.114.bro01.dyn.edpnet.net [213.219.167.114]) by relay-b01.edpnet.be with ESMTP id mK3LosE254GpDeyp (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 17 Aug 2016 10:54:15 +0200 (CEST) X-Barracuda-Envelope-From: julien@perdition.city X-Barracuda-Effective-Source-IP: 213.219.167.114.bro01.dyn.edpnet.net[213.219.167.114] X-Barracuda-Apparent-Source-IP: 213.219.167.114 Date: Wed, 17 Aug 2016 10:54:13 +0200 From: Julien Cigar To: InterNetX - Juergen Gotteswinter Cc: Borja Marcos , freebsd-fs@freebsd.org Subject: Re: HAST + ZFS + NFS + CARP Message-ID: <20160817085413.GE22506@mordor.lan> X-ASG-Orig-Subj: Re: HAST + ZFS + NFS + CARP References: <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="EXKGNeO8l0xGFBjy" Content-Disposition: inline In-Reply-To: <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> User-Agent: Mutt/1.6.1 (2016-04-27) X-Barracuda-Connect: 213.219.167.114.bro01.dyn.edpnet.net[213.219.167.114] X-Barracuda-Start-Time: 1471424054 X-Barracuda-Encrypted: ECDHE-RSA-AES256-GCM-SHA384 X-Barracuda-URL: https://212.71.1.221:443/cgi-mod/mark.cgi X-Barracuda-Scan-Msg-Size: 4432 X-Virus-Scanned: by bsmtpd at edpnet.be X-Barracuda-BRTS-Status: 1 X-Barracuda-Bayes: INNOCENT GLOBAL 0.5000 1.0000 0.7500 X-Barracuda-Spam-Score: 0.75 X-Barracuda-Spam-Status: No, SCORE=0.75 using global scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=6.0 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 
3.2.3.32083 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 08:54:41 -0000 --EXKGNeO8l0xGFBjy Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gotteswinter = wrote: >=20 >=20 > Am 11.08.2016 um 11:24 schrieb Borja Marcos: > >=20 > >> On 11 Aug 2016, at 11:10, Julien Cigar wrote: > >> > >> As I said in a previous post I tested the zfs send/receive approach (w= ith > >> zrep) and it works (more or less) perfectly.. so I concur in all what = you > >> said, especially about off-site replicate and synchronous replication. > >> > >> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment,= =20 > >> I'm in the early tests, haven't done any heavy writes yet, but ATM it= =20 > >> works as expected, I havent' managed to corrupt the zpool. > >=20 > > I must be too old school, but I don=E2=80=99t quite like the idea of us= ing an essentially unreliable transport > > (Ethernet) for low-level filesystem operations. > >=20 > > In case something went wrong, that approach could risk corrupting a poo= l. Although, frankly, > > ZFS is extremely resilient. One of mine even survived a SAS HBA problem= that caused some > > silent corruption. >=20 > try dual split import :D i mean, zpool -f import on 2 machines hooked up > to the same disk chassis. Yes this is the first thing on the list to avoid .. :) I'm still busy to test the whole setup here, including the=20 MASTER -> BACKUP failover script (CARP), but I think you can prevent that thanks to: - As long as ctld is running on the BACKUP the disks are locked=20 and you can't import the pool (even with -f) for ex (filer2 is the BACKUP): https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f - The shared pool should not be mounted at boot, and you should ensure that the failover script is not executed during boot time too: this is to handle the case wherein both machines turn off and/or re-ignite at the same time. Indeed, the CARP interface can "flip" it's status if both machines are powered on at the same time, for ex: https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and you will have a split-brain scenario - Sometimes you'll need to reboot the MASTER for some $reasons (freebsd-update, etc) and the MASTER -> BACKUP switch should not happen, this can be handled with a trigger file or something like that - I've still have to check if the order is OK, but I think that as long as you shutdown the replication interface and that you adapt the advskew (including the config file) of the CARP interface before the=20 zpool import -f in the failover script you can be relatively confident=20 that nothing will be written on the iSCSI targets - A zpool scrub should be run at regular intervals This is my MASTER -> BACKUP CARP script ATM https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 Julien >=20 > kaboom, really ugly kaboom. thats what is very likely to happen sooner > or later especially when it comes to homegrown automatism solutions. 
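For readers following along, here is a minimal sketch of the promotion order described in the list above (shut the replication link, release the disks, take the VIP, only then force the import). The interface names igb0/igb1, vhid 1, the advskew value and the pool name "tank" are placeholders, and a real script would still need the trigger-file and split-brain checks discussed in this thread:

    #!/bin/sh
    # BACKUP -> MASTER promotion, roughly in the order described above.
    # All names below are placeholders for this sketch.
    set -e

    # 1. Shut down the replication interface so the old MASTER can no
    #    longer reach this node's iSCSI exports.
    ifconfig igb1 down

    # 2. Stop exporting the local disks (releases the ctld locks).
    service ctld onestop

    # 3. Take over the CARP VIP by advertising with a lower advskew.
    ifconfig igb0 vhid 1 advskew 0

    # 4. Only now force-import the shared pool on this node.
    zpool import -f tank

    # 5. Start whatever sits on top of the pool (NFS, the ctld config for
    #    the new role, ...) and persist the advskew change in the config.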
> even the commercial parts where much more time/work goes into such > solutions fail in a regular manner >=20 > >=20 > > The advantage of ZFS send/receive of datasets is, however, that you can= consider it > > essentially atomic. A transport corruption should not cause trouble (ap= art from a failed > > "zfs receive") and with snapshot retention you can even roll back. You = can=E2=80=99t roll back > > zpool replications :) > >=20 > > ZFS receive does a lot of sanity checks as well. As long as your zfs re= ceive doesn=E2=80=99t involve a rollback > > to the latest snapshot, it won=E2=80=99t destroy anything by mistake. J= ust make sure that your replica datasets > > aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain. > >=20 > >=20 > > Cheers, > >=20 > >=20 > >=20 > >=20 > > Borja. > >=20 > >=20 > >=20 > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > >=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" --=20 Julien Cigar Belgian Biodiversity Platform (http://www.biodiversity.be) PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0 No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced. --EXKGNeO8l0xGFBjy Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCgAGBQJXtCYyAAoJELK7NxCiBCPAjjgQAOF0zl3cvzfi6jXRoSS141wK lWv3WeLLzjnzdq7k45i1LKRypyC8RRP4AlqCTcKIO/gbVWcKqTXb4VwTymyGhXvW 3dOYOcu38NIwzWZ95dEDT1dqCwKCvtlPzG+VJJ93Kr2jbCeoMxmZTZIgWGibjU46 ES7ozWvj9tMLWrg5blqiTVgsmR1OCEBhiahJvWPHHhOJmm8LAAh/HciT8tLM1Dd1 6skOIawLuGVKnGth12O9TpakuqBds8Ru3jry+1+EeERP6xDZRtJh0IUT2I57gJ2X H8kyB4e4Dg9pVwtvLj7QLZcq7vK821pRrmvKkWo5OIQt8qPRjy2UxXoUbft1nPpK RrMpo0J1Zb0riZoCLaVBkPSXNor9DXqwN2ExfxCq9WUBBYClBLdgxn1EAW0dmVwv LearQLK4BdlCJrIJIQI2hpMiu0qAIfBuNlCsbifZQzbtjEPwk9s1MNDihMhydshc PvSlqNIh1LkfQ4ka7FiYvGzaLfWTi7ZYYVl+SL4UvMX8YmvCdOGOUBf5bOjZkjRI +0SHWic0JDM7R4chYGmTL9WFSFuBnqtNoQyy97c8bimqM2oV4pF7pEN1GfxR9w8Y 2pQ2ghSC40lhCTOUv8tGS3XKzkBp5J4BUSpu7fhhMSI52WJzIvNOwkTLmbnCoEku hMfj6gWoa0TEYf6tj3di =355Z -----END PGP SIGNATURE----- --EXKGNeO8l0xGFBjy-- From owner-freebsd-fs@freebsd.org Wed Aug 17 09:03:03 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5827BBBC081 for ; Wed, 17 Aug 2016 09:03:03 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 050BF1B18 for ; Wed, 17 Aug 2016 09:03:02 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id D9EDB4C4C89E; Wed, 17 Aug 2016 11:02:59 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id gqFxgZu8qn7T; Wed, 17 Aug 2016 11:02:57 +0200 (CEST) Received: from 
[192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id E05084C4C89D; Wed, 17 Aug 2016 11:02:57 +0200 (CEST) Reply-To: juergen.gotteswinter@internetx.com Subject: Re: HAST + ZFS + NFS + CARP References: <6035AB85-8E62-4F0A-9FA8-125B31A7A387@gmail.com> <20160703192945.GE41276@mordor.lan> <20160703214723.GF41276@mordor.lan> <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <1AE36A3B-A2BA-47D2-A872-1E7E9EFA201D@sarenet.es> To: Borja Marcos Cc: freebsd-fs@freebsd.org From: InterNetX - Juergen Gotteswinter Organization: InterNetX GmbH Message-ID: Date: Wed, 17 Aug 2016 11:02:57 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <1AE36A3B-A2BA-47D2-A872-1E7E9EFA201D@sarenet.es> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 09:03:03 -0000 Am 17.08.2016 um 10:53 schrieb Borja Marcos: > >> On 17 Aug 2016, at 09:25, InterNetX - Juergen Gotteswinter wrote: >> try dual split import :D i mean, zpool -f import on 2 machines hooked up >> to the same disk chassis. >> >> kaboom, really ugly kaboom. thats what is very likely to happen sooner >> or later especially when it comes to homegrown automatism solutions. >> even the commercial parts where much more time/work goes into such >> solutions fail in a regular manner > > Well, don’t expect to father children after shooting your balls! ;) > > I am not a big fan of such closely coupled solutions. There are quite > some failure modes that can break such a configuration, not just a brainless > “dual split import” as you say :) > > Misbehaving software (read, a ZFS bug) can render the pool unusable and, no matter how many > redundant servers you have connected to your chassis, you are toast. Using incremental replication > over a network is much more robust, and it offers a lot of fault isolation. Moreover, you can place the > servers in different buildings, etc. in my case it was caused by rsf-1 cluster software > > Networks even offer a more than reasonable protection from electrical problems. Especially if you get > paranoid and use fiber, in which case protection is absolute. > > > > Borja. 
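Whatever drives the failover, one cheap belt-and-braces check before any forced import is to look at the advisory output of a plain "zpool import" first and bail out if a pool still looks active on another host. This is a rough sketch only; "tank" is a placeholder, the grep matches the usual wording of the status line, and it is a sanity check rather than real fencing:

    #!/bin/sh
    # Refuse to force-import if the import listing says a pool was last
    # touched by another system.  Advisory only -- not a fencing mechanism.
    pool=tank
    if zpool import 2>/dev/null | grep -q 'last accessed by another system'; then
        echo "refusing to import ${pool}: a pool is reported active elsewhere" >&2
        exit 1
    fi
    zpool import -f "$pool"

Note that the listing is pool-wide, so on a box that can see several foreign pools the grep above is stricter than strictly necessary.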
> From owner-freebsd-fs@freebsd.org Wed Aug 17 09:05:51 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7E904BBC258 for ; Wed, 17 Aug 2016 09:05:51 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) by mx1.freebsd.org (Postfix) with ESMTP id 121D21CE0 for ; Wed, 17 Aug 2016 09:05:50 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 0AFB945FC0FB; Wed, 17 Aug 2016 11:05:49 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5tJoAb6RKY8e; Wed, 17 Aug 2016 11:05:46 +0200 (CEST) Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id 9E15A4C4C89E; Wed, 17 Aug 2016 11:05:46 +0200 (CEST) Reply-To: juergen.gotteswinter@internetx.com Subject: Re: HAST + ZFS + NFS + CARP References: <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> To: Julien Cigar Cc: Borja Marcos , freebsd-fs@freebsd.org From: InterNetX - Juergen Gotteswinter Organization: InterNetX GmbH Message-ID: <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> Date: Wed, 17 Aug 2016 11:05:46 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160817085413.GE22506@mordor.lan> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 09:05:51 -0000 Am 17.08.2016 um 10:54 schrieb Julien Cigar: > On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gotteswinter wrote: >> >> >> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>> >>>> On 11 Aug 2016, at 11:10, Julien Cigar wrote: >>>> >>>> As I said in a previous post I tested the zfs send/receive approach (with >>>> zrep) and it works (more or less) perfectly.. so I concur in all what you >>>> said, especially about off-site replicate and synchronous replication. >>>> >>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment, >>>> I'm in the early tests, haven't done any heavy writes yet, but ATM it >>>> works as expected, I havent' managed to corrupt the zpool. >>> >>> I must be too old school, but I don’t quite like the idea of using an essentially unreliable transport >>> (Ethernet) for low-level filesystem operations. >>> >>> In case something went wrong, that approach could risk corrupting a pool. Although, frankly, >>> ZFS is extremely resilient. One of mine even survived a SAS HBA problem that caused some >>> silent corruption. 
>> >> try dual split import :D i mean, zpool -f import on 2 machines hooked up >> to the same disk chassis. > > Yes this is the first thing on the list to avoid .. :) > > I'm still busy to test the whole setup here, including the > MASTER -> BACKUP failover script (CARP), but I think you can prevent > that thanks to: > > - As long as ctld is running on the BACKUP the disks are locked > and you can't import the pool (even with -f) for ex (filer2 is the > BACKUP): > https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > > - The shared pool should not be mounted at boot, and you should ensure > that the failover script is not executed during boot time too: this is > to handle the case wherein both machines turn off and/or re-ignite at > the same time. Indeed, the CARP interface can "flip" it's status if both > machines are powered on at the same time, for ex: > https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and > you will have a split-brain scenario > > - Sometimes you'll need to reboot the MASTER for some $reasons > (freebsd-update, etc) and the MASTER -> BACKUP switch should not > happen, this can be handled with a trigger file or something like that > > - I've still have to check if the order is OK, but I think that as long > as you shutdown the replication interface and that you adapt the > advskew (including the config file) of the CARP interface before the > zpool import -f in the failover script you can be relatively confident > that nothing will be written on the iSCSI targets > > - A zpool scrub should be run at regular intervals > > This is my MASTER -> BACKUP CARP script ATM > https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > > Julien > 100€ question without detailed looking at that script. yes from a first view its super simple, but: why are solutions like rsf-1 such more powerful / featurerich. Theres a reason for, which is that they try to cover every possible situation (which makes more than sense for this). That script works for sure, within very limited cases imho >> >> kaboom, really ugly kaboom. thats what is very likely to happen sooner >> or later especially when it comes to homegrown automatism solutions. >> even the commercial parts where much more time/work goes into such >> solutions fail in a regular manner >> >>> >>> The advantage of ZFS send/receive of datasets is, however, that you can consider it >>> essentially atomic. A transport corruption should not cause trouble (apart from a failed >>> "zfs receive") and with snapshot retention you can even roll back. You can’t roll back >>> zpool replications :) >>> >>> ZFS receive does a lot of sanity checks as well. As long as your zfs receive doesn’t involve a rollback >>> to the latest snapshot, it won’t destroy anything by mistake. Just make sure that your replica datasets >>> aren’t mounted and zfs receive won’t complain. >>> >>> >>> Cheers, >>> >>> >>> >>> >>> Borja. 
>>> >>> >>> >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Wed Aug 17 09:11:11 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 62487BBC574 for ; Wed, 17 Aug 2016 09:11:11 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-wm0-x22c.google.com (mail-wm0-x22c.google.com [IPv6:2a00:1450:400c:c09::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E75451440 for ; Wed, 17 Aug 2016 09:11:10 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: by mail-wm0-x22c.google.com with SMTP id o80so219354795wme.1 for ; Wed, 17 Aug 2016 02:11:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=2/mQLGUGA0xLtkVNsiIpOFEvVb4XC3yo3Cor9uoHpHg=; b=Fj7dlse4etxfhovQJY5YSDwV658pe8wE/ykwyCPRly37kj41Jz9LX/UChUjMV7bast 3ISzag94nXCxqkkTEqUyxqidxIUbOON26Bc9Yylilzb39VK24va3cfx70W3yS0vQ76zV naNiRZZEbY5xhtFrAKgZ/rPWfAWzuYo3M7SKxUGzDO+UU++wzkSlDK9OOVQsyGxdqh4y p4kExMjsUuAssuv8U1oofjLt9PfQEC4o0xcPzbD2Mei1/ze82Cm+w0T0SScb8zttdsvs 6GvFWupEYJ/M+nZM3bssBDRktoOBCZEW6yzB01v0J553IjufAhD8aPFBNVSaE61IPGif DPYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=2/mQLGUGA0xLtkVNsiIpOFEvVb4XC3yo3Cor9uoHpHg=; b=I5ntNkNPzHNgWwxDjzZsq3di3Kd46YCnrAy5WuFRklGzram7prwGqkeWWIkOzUYvzO aRgCfOrBdavdOZ0tRoq7lE4kvXmdTssXwTqVs4HCnL7b/NHgiCAlMY51F0+AWit3Z1B5 KzeXAEKLIpF15KlxV+kSI5ExfID2TYrk44UEtWH/rCDu96yPjzV3K+GYUijDBXwWwQa7 LEV+1TTh0M2+BEqzAK5P+N4qQy9GgJUBCPNlQTSx30dd7wVBJaeGTxKTg4LHqv7E1UqC TPffA5ZGzZ4mNnWxneFKU6se3AL/j81dfiIoh2lmAogiHYEGZ61VW/YaLW2XiOEKcxeJ 2Jqw== X-Gm-Message-State: AEkoouubrunXDFO1SOPGMa8bgUHSWkEjilpJoqtXSM+7x6CM/bMGv+nfc5ZUiEG5Le4PfBq5ILUM1UmICEQe7Q== X-Received: by 10.194.175.106 with SMTP id bz10mr42852112wjc.42.1471425069456; Wed, 17 Aug 2016 02:11:09 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.54.202 with HTTP; Wed, 17 Aug 2016 02:11:08 -0700 (PDT) In-Reply-To: <1AE36A3B-A2BA-47D2-A872-1E7E9EFA201D@sarenet.es> References: <6035AB85-8E62-4F0A-9FA8-125B31A7A387@gmail.com> <20160703192945.GE41276@mordor.lan> <20160703214723.GF41276@mordor.lan> <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <1AE36A3B-A2BA-47D2-A872-1E7E9EFA201D@sarenet.es> From: krad Date: Wed, 17 Aug 2016 10:11:08 +0100 Message-ID: Subject: Re: HAST + ZFS + NFS + CARP To: Borja Marcos Cc: juergen.gotteswinter@internetx.com, FreeBSD FS Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 09:11:11 -0000 I totally agree here i would used some batch replication in general. Yes it doesnt provide the ha you require, but then if you need that maybe a different approach like a distributed file system is a better solution. Even then though I would still have my standard replication to a box not part of the distributed filesystem via rsync or something, just for ass covering. Admittedly this gets problematic when the datasets have large deltas and/or objects. On 17 August 2016 at 09:53, Borja Marcos wrote: > > > On 17 Aug 2016, at 09:25, InterNetX - Juergen Gotteswinter < > juergen.gotteswinter@internetx.com> wrote: > > try dual split import :D i mean, zpool -f import on 2 machines hooked u= p > > to the same disk chassis. > > > > kaboom, really ugly kaboom. thats what is very likely to happen sooner > > or later especially when it comes to homegrown automatism solutions. > > even the commercial parts where much more time/work goes into such > > solutions fail in a regular manner > > Well, don=E2=80=99t expect to father children after shooting your balls! = ;) > > I am not a big fan of such closely coupled solutions. There are quite > some failure modes that can break such a configuration, not just a > brainless > =E2=80=9Cdual split import=E2=80=9D as you say :) > > Misbehaving software (read, a ZFS bug) can render the pool unusable and, > no matter how many > redundant servers you have connected to your chassis, you are toast. Usin= g > incremental replication > over a network is much more robust, and it offers a lot of fault > isolation. Moreover, you can place the > servers in different buildings, etc. > > Networks even offer a more than reasonable protection from electrical > problems. Especially if you get > paranoid and use fiber, in which case protection is absolute. > > > > Borja. 
> > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Wed Aug 17 09:15:52 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E02C2BBC8EB for ; Wed, 17 Aug 2016 09:15:52 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from cu1176c.smtpx.saremail.com (cu1176c.smtpx.saremail.com [195.16.148.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9F6EC1A88 for ; Wed, 17 Aug 2016 09:15:52 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from [172.16.8.36] (izaro.sarenet.es [192.148.167.11]) by proxypop02.sare.net (Postfix) with ESMTPSA id 35D0B9DC642; Wed, 17 Aug 2016 11:15:49 +0200 (CEST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: HAST + ZFS + NFS + CARP From: Borja Marcos In-Reply-To: Date: Wed, 17 Aug 2016 11:15:48 +0200 Cc: juergen.gotteswinter@internetx.com, FreeBSD FS Content-Transfer-Encoding: quoted-printable Message-Id: <7EECBD48-5980-4387-8AAE-91D89F576DA1@sarenet.es> References: <6035AB85-8E62-4F0A-9FA8-125B31A7A387@gmail.com> <20160703192945.GE41276@mordor.lan> <20160703214723.GF41276@mordor.lan> <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <1AE36A3B-A2BA-47D2-A872-1E7E9EFA201D@sarenet.es> To: krad X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 09:15:53 -0000 > On 17 Aug 2016, at 11:11, krad wrote: >=20 > I totally agree here i would used some batch replication in general. = Yes it doesnt provide the ha you require, but then if you need that = maybe a different approach like a distributed file system is a better = solution. Even then though I would still have my standard replication to = a box not part of the distributed filesystem via rsync or something, = just for ass covering. Admittedly this gets problematic when the = datasets have large deltas and/or objects. If your deltas are large you need a network with enough bandwidth to = support it anyway. And rsync can be a nightmare depending on the number of files you keep and their sizes. That=E2=80=99s an = advantage of ZFS. In simple terms, an incremental send just copies a = portion of a transaction log together with its associated data blocks. The = number of files does not hurt performance so much as it does with rsync, which can be unusable. And if you have real time requirements for replication (databases) using = the built-in mechanisms in your DBMS will be generally more robust. Borja. 
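To make that concrete, the batch replication being discussed boils down to something like the sketch below: snapshot, send only the delta since the previous snapshot, and keep the replica unmounted and readonly so a receive can never clobber live data. The dataset names, the ssh target and the snapshot labels are placeholders, and the initial full send is assumed to have been done already:

    #!/bin/sh
    # Incremental replication of tank/data to backuphost (names are placeholders).
    # Assumes yesterday's @repl-YYYYMMDD snapshot already exists on both sides.
    set -e
    today=$(date +%Y%m%d)
    prev=$(date -v-1d +%Y%m%d)

    zfs snapshot -r "tank/data@repl-${today}"
    zfs send -R -i "tank/data@repl-${prev}" "tank/data@repl-${today}" | \
        ssh backuphost zfs receive -du backuppool

    # Keep the replica datasets out of harm's way, as suggested above.
    ssh backuphost zfs set readonly=on backuppool/data

Unlike rsync, the cost of the incremental send scales with the size of the delta rather than the number of files, which is exactly the point made above.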
From owner-freebsd-fs@freebsd.org Wed Aug 17 10:05:23 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 05C67BBDC85 for ; Wed, 17 Aug 2016 10:05:23 +0000 (UTC) (envelope-from julien@perdition.city) Received: from relay-b02.edpnet.be (relay-b02.edpnet.be [212.71.1.222]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "edpnet.email", Issuer "Go Daddy Secure Certificate Authority - G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A051015BA for ; Wed, 17 Aug 2016 10:05:22 +0000 (UTC) (envelope-from julien@perdition.city) X-ASG-Debug-ID: 1471427542-0a7b8d2a6d1db9eb0001-3nHGF7 Received: from mordor.lan (213.219.167.114.bro01.dyn.edpnet.net [213.219.167.114]) by relay-b02.edpnet.be with ESMTP id rmxDiDGxUMqjvQIQ (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 17 Aug 2016 11:52:24 +0200 (CEST) X-Barracuda-Envelope-From: julien@perdition.city X-Barracuda-Effective-Source-IP: 213.219.167.114.bro01.dyn.edpnet.net[213.219.167.114] X-Barracuda-Apparent-Source-IP: 213.219.167.114 Date: Wed, 17 Aug 2016 11:52:22 +0200 From: Julien Cigar To: InterNetX - Juergen Gotteswinter Cc: Borja Marcos , freebsd-fs@freebsd.org Subject: Re: HAST + ZFS + NFS + CARP Message-ID: <20160817095222.GG22506@mordor.lan> X-ASG-Orig-Subj: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="I3tAPq1Rm2pUxvsp" Content-Disposition: inline In-Reply-To: <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> User-Agent: Mutt/1.6.1 (2016-04-27) X-Barracuda-Connect: 213.219.167.114.bro01.dyn.edpnet.net[213.219.167.114] X-Barracuda-Start-Time: 1471427543 X-Barracuda-Encrypted: ECDHE-RSA-AES256-GCM-SHA384 X-Barracuda-URL: https://212.71.1.222:443/cgi-mod/mark.cgi X-Barracuda-Scan-Msg-Size: 5396 X-Virus-Scanned: by bsmtpd at edpnet.be X-Barracuda-BRTS-Status: 1 X-Barracuda-Bayes: INNOCENT GLOBAL 0.5000 1.0000 0.0100 X-Barracuda-Spam-Score: 0.01 X-Barracuda-Spam-Status: No, SCORE=0.01 using global scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=6.0 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.32085 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 10:05:23 -0000 --I3tAPq1Rm2pUxvsp Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen Gotteswinter = wrote: >=20 >=20 > Am 17.08.2016 um 10:54 schrieb Julien Cigar: > > On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gotteswin= ter wrote: > >> > >> > >> Am 11.08.2016 um 11:24 schrieb Borja Marcos: > >>> > >>>> On 11 Aug 2016, at 11:10, Julien Cigar wrote: > >>>> > >>>> As 
I said in a previous post I tested the zfs send/receive approach = (with > >>>> zrep) and it works (more or less) perfectly.. so I concur in all wha= t you > >>>> said, especially about off-site replicate and synchronous replicatio= n. > >>>> > >>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment= ,=20 > >>>> I'm in the early tests, haven't done any heavy writes yet, but ATM i= t=20 > >>>> works as expected, I havent' managed to corrupt the zpool. > >>> > >>> I must be too old school, but I don=E2=80=99t quite like the idea of = using an essentially unreliable transport > >>> (Ethernet) for low-level filesystem operations. > >>> > >>> In case something went wrong, that approach could risk corrupting a p= ool. Although, frankly, > >>> ZFS is extremely resilient. One of mine even survived a SAS HBA probl= em that caused some > >>> silent corruption. > >> > >> try dual split import :D i mean, zpool -f import on 2 machines hooked = up > >> to the same disk chassis. > >=20 > > Yes this is the first thing on the list to avoid .. :) > >=20 > > I'm still busy to test the whole setup here, including the=20 > > MASTER -> BACKUP failover script (CARP), but I think you can prevent > > that thanks to: > >=20 > > - As long as ctld is running on the BACKUP the disks are locked=20 > > and you can't import the pool (even with -f) for ex (filer2 is the > > BACKUP): > > https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > >=20 > > - The shared pool should not be mounted at boot, and you should ensure > > that the failover script is not executed during boot time too: this is > > to handle the case wherein both machines turn off and/or re-ignite at > > the same time. Indeed, the CARP interface can "flip" it's status if both > > machines are powered on at the same time, for ex: > > https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and > > you will have a split-brain scenario > >=20 > > - Sometimes you'll need to reboot the MASTER for some $reasons > > (freebsd-update, etc) and the MASTER -> BACKUP switch should not > > happen, this can be handled with a trigger file or something like that > >=20 > > - I've still have to check if the order is OK, but I think that as long > > as you shutdown the replication interface and that you adapt the > > advskew (including the config file) of the CARP interface before the=20 > > zpool import -f in the failover script you can be relatively confident= =20 > > that nothing will be written on the iSCSI targets > >=20 > > - A zpool scrub should be run at regular intervals > >=20 > > This is my MASTER -> BACKUP CARP script ATM > > https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > >=20 > > Julien > >=20 >=20 > 100=E2=82=AC question without detailed looking at that script. yes from a= first > view its super simple, but: why are solutions like rsf-1 such more > powerful / featurerich. Theres a reason for, which is that they try to > cover every possible situation (which makes more than sense for this). I've never used "rsf-1" so I can't say much more about it, but I have=20 no doubts about it's ability to handle "complex situations", where=20 multiple nodes / networks are involved. >=20 > That script works for sure, within very limited cases imho >=20 > >> > >> kaboom, really ugly kaboom. thats what is very likely to happen sooner > >> or later especially when it comes to homegrown automatism solutions. 
> >> even the commercial parts where much more time/work goes into such > >> solutions fail in a regular manner > >> > >>> > >>> The advantage of ZFS send/receive of datasets is, however, that you c= an consider it > >>> essentially atomic. A transport corruption should not cause trouble (= apart from a failed > >>> "zfs receive") and with snapshot retention you can even roll back. Yo= u can=E2=80=99t roll back > >>> zpool replications :) > >>> > >>> ZFS receive does a lot of sanity checks as well. As long as your zfs = receive doesn=E2=80=99t involve a rollback > >>> to the latest snapshot, it won=E2=80=99t destroy anything by mistake.= Just make sure that your replica datasets > >>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain. > >>> > >>> > >>> Cheers, > >>> > >>> > >>> > >>> > >>> Borja. > >>> > >>> > >>> > >>> _______________________________________________ > >>> freebsd-fs@freebsd.org mailing list > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > >>> > >> _______________________________________________ > >> freebsd-fs@freebsd.org mailing list > >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > >=20 --=20 Julien Cigar Belgian Biodiversity Platform (http://www.biodiversity.be) PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0 No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced. --I3tAPq1Rm2pUxvsp Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCgAGBQJXtDPSAAoJELK7NxCiBCPAQWQP/RBRHxh6kwjEjfVRPQd3y9ky omHqCV+ej068aB0J0D44wXdFKYWrIPNX28Mfg5muaIWZvRmwUH2zLKNgxLFKpzNS y8XyY0SktMzsBYZVHicu6US/l+5+BTfNes2HTdB0592etvtPuSW/E6xZCwwe4mga XZmc4vNByAViWqnH6+B7cQTviLx3K8ZQU2JRZMrrkLKOqjoOH5K6xrc4rq67jU0z j9t2kQ90X8cdMEMdWuz8o4NCZtM3T70sjswHPvd/8GwBKdsVlJlQuhQNECIPYsGz bvh4t37HK3SkL2k91JgPysWdqNxoUuF8Q4wg91Vn+0riWvdVxyJpWODu+y1qLXk9 eUNYU/bWAXz2iPuKw41JwclvQfFhG5+ND1Q9WyqR3I5QMxZub5T/64mgRNu2wTZ+ bXeKgjq6bhM55L2GzHyl5LGZOkxWK+HTpgBuPATE27Ya0Ass3EEB86aXBsylkMqD dnNfht3QAv1xKsXzteoaiJ2t0Hcyzu2vqdScE9oJY8/k8aiHl9JXMoCo932MogYU mZGkydJrT2BxqvAbSo83e+fg+IwVLsiKU1zFATTztT9fIXlYmlAjMaoC4h9yrYBb pMo5X8ThyY8wduglq7V+zikRWWBohRn/jInDMKRWsExzAQvFAFyWyHafxOxMk8E2 bwPvxdjqwagH4b1S7a5D =bUas -----END PGP SIGNATURE----- --I3tAPq1Rm2pUxvsp-- From owner-freebsd-fs@freebsd.org Wed Aug 17 10:55:36 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5CCEEBBCEB0 for ; Wed, 17 Aug 2016 10:55:36 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4C6AD1959 for ; Wed, 17 Aug 2016 10:55:36 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7HAtZNv002911 for ; Wed, 17 Aug 2016 10:55:36 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211013] Write error to UFS filesystem with softupdates panics machine Date: Wed, 17 Aug 2016 10:55:35 
+0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-BETA1 X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: bdrewery@FreeBSD.org X-Bugzilla-Status: In Progress X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 10:55:36 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211013 Bryan Drewery changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bdrewery@FreeBSD.org --- Comment #6 from Bryan Drewery --- Depending on how long you wanted to let this be tested in head, it may very well make the timeline to MFC into releng/11.0. Even if an unrare case, it seems worth it to me to merge this if it fits into the timeline. How long were you wanting to let this bake in head? --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 11:33:45 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 06FFEBBC125 for ; Wed, 17 Aug 2016 11:33:45 +0000 (UTC) (envelope-from julien@perdition.city) Received: from relay-b01.edpnet.be (relay-b01.edpnet.be [212.71.1.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "edpnet.email", Issuer "Go Daddy Secure Certificate Authority - G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id ADFD314AE for ; Wed, 17 Aug 2016 11:33:44 +0000 (UTC) (envelope-from julien@perdition.city) X-ASG-Debug-ID: 1471433619-0a7ff52c9e23b260001-3nHGF7 Received: from mordor.lan (213.219.167.114.bro01.dyn.edpnet.net [213.219.167.114]) by relay-b01.edpnet.be with ESMTP id Ml8DFxGODwBFIhWx (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 17 Aug 2016 13:33:41 +0200 (CEST) X-Barracuda-Envelope-From: julien@perdition.city X-Barracuda-Effective-Source-IP: 213.219.167.114.bro01.dyn.edpnet.net[213.219.167.114] X-Barracuda-Apparent-Source-IP: 213.219.167.114 Date: Wed, 17 Aug 2016 13:33:39 +0200 From: Julien Cigar To: InterNetX - Juergen Gotteswinter Cc: freebsd-fs@freebsd.org Subject: Re: HAST + ZFS + NFS + CARP Message-ID: <20160817113339.GH22506@mordor.lan> X-ASG-Orig-Subj: Re: HAST + ZFS + NFS + CARP References: <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="qVHblb/y9DPlgkHs" Content-Disposition: inline In-Reply-To: <20160817095222.GG22506@mordor.lan> User-Agent: Mutt/1.6.1 (2016-04-27) 
X-Barracuda-Connect: 213.219.167.114.bro01.dyn.edpnet.net[213.219.167.114] X-Barracuda-Start-Time: 1471433620 X-Barracuda-Encrypted: ECDHE-RSA-AES256-GCM-SHA384 X-Barracuda-URL: https://212.71.1.221:443/cgi-mod/mark.cgi X-Barracuda-Scan-Msg-Size: 6138 X-Virus-Scanned: by bsmtpd at edpnet.be X-Barracuda-BRTS-Status: 1 X-Barracuda-Bayes: INNOCENT GLOBAL 0.5000 1.0000 0.0100 X-Barracuda-Spam-Score: 0.01 X-Barracuda-Spam-Status: No, SCORE=0.01 using global scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=6.0 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.32086 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 11:33:45 -0000 --qVHblb/y9DPlgkHs Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Aug 17, 2016 at 11:52:22AM +0200, Julien Cigar wrote: > On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen Gotteswinte= r wrote: > >=20 > >=20 > > Am 17.08.2016 um 10:54 schrieb Julien Cigar: > > > On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gottesw= inter wrote: > > >> > > >> > > >> Am 11.08.2016 um 11:24 schrieb Borja Marcos: > > >>> > > >>>> On 11 Aug 2016, at 11:10, Julien Cigar wro= te: > > >>>> > > >>>> As I said in a previous post I tested the zfs send/receive approac= h (with > > >>>> zrep) and it works (more or less) perfectly.. so I concur in all w= hat you > > >>>> said, especially about off-site replicate and synchronous replicat= ion. > > >>>> > > >>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the mome= nt,=20 > > >>>> I'm in the early tests, haven't done any heavy writes yet, but ATM= it=20 > > >>>> works as expected, I havent' managed to corrupt the zpool. > > >>> > > >>> I must be too old school, but I don=E2=80=99t quite like the idea o= f using an essentially unreliable transport > > >>> (Ethernet) for low-level filesystem operations. > > >>> > > >>> In case something went wrong, that approach could risk corrupting a= pool. Although, frankly, > > >>> ZFS is extremely resilient. One of mine even survived a SAS HBA pro= blem that caused some > > >>> silent corruption. > > >> > > >> try dual split import :D i mean, zpool -f import on 2 machines hooke= d up > > >> to the same disk chassis. > > >=20 > > > Yes this is the first thing on the list to avoid .. :) > > >=20 > > > I'm still busy to test the whole setup here, including the=20 > > > MASTER -> BACKUP failover script (CARP), but I think you can prevent > > > that thanks to: > > >=20 > > > - As long as ctld is running on the BACKUP the disks are locked=20 > > > and you can't import the pool (even with -f) for ex (filer2 is the > > > BACKUP): > > > https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > > >=20 > > > - The shared pool should not be mounted at boot, and you should ensure > > > that the failover script is not executed during boot time too: this is > > > to handle the case wherein both machines turn off and/or re-ignite at > > > the same time. 
Indeed, the CARP interface can "flip" it's status if b= oth > > > machines are powered on at the same time, for ex: > > > https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and > > > you will have a split-brain scenario > > >=20 > > > - Sometimes you'll need to reboot the MASTER for some $reasons > > > (freebsd-update, etc) and the MASTER -> BACKUP switch should not > > > happen, this can be handled with a trigger file or something like that > > >=20 > > > - I've still have to check if the order is OK, but I think that as lo= ng > > > as you shutdown the replication interface and that you adapt the > > > advskew (including the config file) of the CARP interface before the= =20 > > > zpool import -f in the failover script you can be relatively confiden= t=20 > > > that nothing will be written on the iSCSI targets > > >=20 > > > - A zpool scrub should be run at regular intervals > > >=20 > > > This is my MASTER -> BACKUP CARP script ATM > > > https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > > >=20 > > > Julien > > >=20 > >=20 > > 100=E2=82=AC question without detailed looking at that script. yes from= a first > > view its super simple, but: why are solutions like rsf-1 such more > > powerful / featurerich. Theres a reason for, which is that they try to > > cover every possible situation (which makes more than sense for this). >=20 > I've never used "rsf-1" so I can't say much more about it, but I have=20 > no doubts about it's ability to handle "complex situations", where=20 > multiple nodes / networks are involved. BTW for simple cases (two nodes, same network, one active node, ...) we could use both: ZFS + iSCSI + CARP on the two nodes, and=20 zfs send|zfs receive on a third one >=20 > >=20 > > That script works for sure, within very limited cases imho > >=20 > > >> > > >> kaboom, really ugly kaboom. thats what is very likely to happen soon= er > > >> or later especially when it comes to homegrown automatism solutions. > > >> even the commercial parts where much more time/work goes into such > > >> solutions fail in a regular manner > > >> > > >>> > > >>> The advantage of ZFS send/receive of datasets is, however, that you= can consider it > > >>> essentially atomic. A transport corruption should not cause trouble= (apart from a failed > > >>> "zfs receive") and with snapshot retention you can even roll back. = You can=E2=80=99t roll back > > >>> zpool replications :) > > >>> > > >>> ZFS receive does a lot of sanity checks as well. As long as your zf= s receive doesn=E2=80=99t involve a rollback > > >>> to the latest snapshot, it won=E2=80=99t destroy anything by mistak= e. Just make sure that your replica datasets > > >>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain. > > >>> > > >>> > > >>> Cheers, > > >>> > > >>> > > >>> > > >>> > > >>> Borja. 
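A minimal sketch of the failover step described in the checklist above (the real scripts are the linked gists; the pool name "tank", vhid 1, the interface names and the trigger-file path below are assumed placeholders, not values taken from this thread):

    #!/bin/sh
    # failover sketch: run on the BACKUP when it is promoted to MASTER
    POOL=tank                       # assumed pool name
    CARP_IF=em0                     # interface carrying the CARP vhid (assumed)
    VHID=1                          # assumed vhid
    REPL_IF=em1                     # dedicated iSCSI/replication interface (assumed)
    TRIGGER=/var/run/no_failover    # touch this before a planned MASTER reboot

    # planned maintenance on the MASTER: do not fail over
    [ -e "${TRIGGER}" ] && exit 0

    # stop talking to the old MASTER before touching the pool
    ifconfig ${REPL_IF} down

    # make the promotion sticky: lower advskew and force MASTER state
    ifconfig ${CARP_IF} vhid ${VHID} advskew 0 state master

    # the shared pool is kept out of zpool.cache (zpool set cachefile=none)
    # so it is never imported at boot; import it explicitly now
    zpool import -f -o cachefile=none ${POOL}

The config-file advskew would still have to be updated separately, as noted above, so that the new role survives a reboot.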
> > >>> > > >>> > > >>> > > >>> _______________________________________________ > > >>> freebsd-fs@freebsd.org mailing list > > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.or= g" > > >>> > > >> _______________________________________________ > > >> freebsd-fs@freebsd.org mailing list > > >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > >=20 >=20 > --=20 > Julien Cigar > Belgian Biodiversity Platform (http://www.biodiversity.be) > PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0 > No trees were killed in the creation of this message. > However, many electrons were terribly inconvenienced. --=20 Julien Cigar Belgian Biodiversity Platform (http://www.biodiversity.be) PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0 No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced. --qVHblb/y9DPlgkHs Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCgAGBQJXtEuQAAoJELK7NxCiBCPA09UP/0Z7hUd/IzKJJRZ038i0Js1c 4tWlU8vuN3wP3ASg1hX4+UZzGnT8U5IvW0jIWKj5BN4e4fS5MnMnz1CoM31eUJdM /CZUOw6RMX/uRnKCQwgkGyvtOr3kSXmbJ6lFu9dBNbtj4YQosqR9GsaJOl4zJoXT XQN7gbzVefFlO0FXeu9OtJwv1GYb0oFNcvOqqujM+nXrNfW0Y9jQF85QSZZmDnz9 LJrjy2JhZPQmqiM8QGSytl/XMKNbdlKijm5dLmBSUNMoSnPFW24zf7ORMzBgk25v M/h2tnsg/pKY1iNDJAlbQ/Qa+4VSWw4sjdIiVyLjUUD6x9GbEJ48m0Bx9tIZJzH6 LzX0Q6cNtmluvPSQt2UGEqVGgdogSCkP8HNbaeYeRw38P172Muc5yZ535ej0Z8CJ /pPxruN/yIZPCS0FLIFJyt8O7J/lNnKOzt7K5YDXPadLfXe23EatKAI3EerjY2vc JdtTah2GKPp16Qag1sSK2wpdRIxXJUbuz5kRk6ZdgC/RmdsT63Q8h9X6RM7lx7W1 hW6Wlk/ApEpx2BRlUNWWKhfRdKvyDqQ0DW6tRQCDsXPk8usaaerUByUPjIEsAPD8 s8gh4CyJp5hbK47uGMthURRCmE6xzAbyefGy7TLVmDohgW6JAfAnB1JvcWbHjpMz 39snWvMT/9HH0UojgpqX =Urre -----END PGP SIGNATURE----- --qVHblb/y9DPlgkHs-- From owner-freebsd-fs@freebsd.org Wed Aug 17 13:33:59 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3D3B7BBD1FF for ; Wed, 17 Aug 2016 13:33:59 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-wm0-x22d.google.com (mail-wm0-x22d.google.com [IPv6:2a00:1450:400c:c09::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B36A2168B for ; Wed, 17 Aug 2016 13:33:58 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: by mail-wm0-x22d.google.com with SMTP id q128so199330985wma.1 for ; Wed, 17 Aug 2016 06:33:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=1pQ9J0u2kWdyDKgpDBmQZlA7DwF9tH9t6h79mTx4eb0=; b=jFh9FT/lawwdWOLWFE7m2Vqv4oeAgj2gKYn3IIrTn+v3XnP1fFkA8WnbdNfbzGUAS2 fbrrrEUmshl+GZQ8KtFqBWOx8jP17+n/T2v9t0vD4fJTRBVcEbSSZbU/uz3XANKN2znL CFyjBXKoKi2lWmVOCV+BhTb5IlyC2X2eh/9UQgv1LbOwHp2FEDCnHc6SGiXw0/kEOvRa JEX5f+wuP1t9dwuY67Qpgtsn/qq6pHdsCIUn95TnxmLRi6By9N/rfWvY+UqR64ofNaV6 GZ3CV3OG619PsIEKPYj5OeeqaPTIe0C/+1FDDSEghiUXuqZj+8ITnwboi8QeoLJRSCEC Xkkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; 
bh=1pQ9J0u2kWdyDKgpDBmQZlA7DwF9tH9t6h79mTx4eb0=; b=A9EQzV/S0XUZq+fqV5LIEHe3iRfX5YASinHbpX59Bp/jhz8ljNDCSKXeMx1/7ahpNl SZeDWOwfLkdeeZpalkO1b/jL0JPw+FsBfTayfIthIFle+j6OYYBdFymsNVMZlnedh7ri budfha7DWbXO/igSuNlti5/EzV0bMdCRUK10jZuPe/PNvG7VaTgYsRDd/MHZKzYJXtcr pLabTbDiagb/PCDEuvOSU23iWmJuwwppVThoyKQTEpNkZOmfebHQW7WCp6ejszKEniIq 6Yz8WaPQrywi6trnvvqMWYIp+7inQbIVcM5wcf8lOkV+/3BzZnFGipxiNEc1Ki+dMtsx gNXA== X-Gm-Message-State: AEkooutVFoX6aS9XVRwkrAdVdIRXg007jldAUEsYh4sLqaXDoQq2lhOhsnnr0HrgDAc7tEayDUJajKB3fAKm4g== X-Received: by 10.194.175.106 with SMTP id bz10mr44195283wjc.42.1471440837173; Wed, 17 Aug 2016 06:33:57 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.54.202 with HTTP; Wed, 17 Aug 2016 06:33:56 -0700 (PDT) In-Reply-To: <20160817113339.GH22506@mordor.lan> References: <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <20160817113339.GH22506@mordor.lan> From: krad Date: Wed, 17 Aug 2016 14:33:56 +0100 Message-ID: Subject: Re: HAST + ZFS + NFS + CARP To: Julien Cigar Cc: InterNetX - Juergen Gotteswinter , FreeBSD FS Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 13:33:59 -0000 What are peoples experiences on running something like moosfs on top of zfs? It looks really compelling on certain levels, but i'm not sure about the reality in a production network yet. On 17 August 2016 at 12:33, Julien Cigar wrote: > On Wed, Aug 17, 2016 at 11:52:22AM +0200, Julien Cigar wrote: > > On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen > Gotteswinter wrote: > > > > > > > > > Am 17.08.2016 um 10:54 schrieb Julien Cigar: > > > > On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen > Gotteswinter wrote: > > > >> > > > >> > > > >> Am 11.08.2016 um 11:24 schrieb Borja Marcos: > > > >>> > > > >>>> On 11 Aug 2016, at 11:10, Julien Cigar > wrote: > > > >>>> > > > >>>> As I said in a previous post I tested the zfs send/receive > approach (with > > > >>>> zrep) and it works (more or less) perfectly.. so I concur in all > what you > > > >>>> said, especially about off-site replicate and synchronous > replication. > > > >>>> > > > >>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the > moment, > > > >>>> I'm in the early tests, haven't done any heavy writes yet, but > ATM it > > > >>>> works as expected, I havent' managed to corrupt the zpool. > > > >>> > > > >>> I must be too old school, but I don=E2=80=99t quite like the idea= of using > an essentially unreliable transport > > > >>> (Ethernet) for low-level filesystem operations. > > > >>> > > > >>> In case something went wrong, that approach could risk corrupting > a pool. Although, frankly, > > > >>> ZFS is extremely resilient. One of mine even survived a SAS HBA > problem that caused some > > > >>> silent corruption. > > > >> > > > >> try dual split import :D i mean, zpool -f import on 2 machines > hooked up > > > >> to the same disk chassis. > > > > > > > > Yes this is the first thing on the list to avoid .. 
:) > > > > > > > > I'm still busy to test the whole setup here, including the > > > > MASTER -> BACKUP failover script (CARP), but I think you can preven= t > > > > that thanks to: > > > > > > > > - As long as ctld is running on the BACKUP the disks are locked > > > > and you can't import the pool (even with -f) for ex (filer2 is the > > > > BACKUP): > > > > https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > > > > > > > > - The shared pool should not be mounted at boot, and you should > ensure > > > > that the failover script is not executed during boot time too: this > is > > > > to handle the case wherein both machines turn off and/or re-ignite = at > > > > the same time. Indeed, the CARP interface can "flip" it's status if > both > > > > machines are powered on at the same time, for ex: > > > > https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf > and > > > > you will have a split-brain scenario > > > > > > > > - Sometimes you'll need to reboot the MASTER for some $reasons > > > > (freebsd-update, etc) and the MASTER -> BACKUP switch should not > > > > happen, this can be handled with a trigger file or something like > that > > > > > > > > - I've still have to check if the order is OK, but I think that as > long > > > > as you shutdown the replication interface and that you adapt the > > > > advskew (including the config file) of the CARP interface before th= e > > > > zpool import -f in the failover script you can be relatively > confident > > > > that nothing will be written on the iSCSI targets > > > > > > > > - A zpool scrub should be run at regular intervals > > > > > > > > This is my MASTER -> BACKUP CARP script ATM > > > > https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > > > > > > > > Julien > > > > > > > > > > 100=E2=82=AC question without detailed looking at that script. yes fr= om a first > > > view its super simple, but: why are solutions like rsf-1 such more > > > powerful / featurerich. Theres a reason for, which is that they try t= o > > > cover every possible situation (which makes more than sense for this)= . > > > > I've never used "rsf-1" so I can't say much more about it, but I have > > no doubts about it's ability to handle "complex situations", where > > multiple nodes / networks are involved. > > BTW for simple cases (two nodes, same network, one active node, ...) we > could use both: ZFS + iSCSI + CARP on the two nodes, and > zfs send|zfs receive on a third one > > > > > > > > > That script works for sure, within very limited cases imho > > > > > > >> > > > >> kaboom, really ugly kaboom. thats what is very likely to happen > sooner > > > >> or later especially when it comes to homegrown automatism solution= s. > > > >> even the commercial parts where much more time/work goes into such > > > >> solutions fail in a regular manner > > > >> > > > >>> > > > >>> The advantage of ZFS send/receive of datasets is, however, that > you can consider it > > > >>> essentially atomic. A transport corruption should not cause > trouble (apart from a failed > > > >>> "zfs receive") and with snapshot retention you can even roll back= . > You can=E2=80=99t roll back > > > >>> zpool replications :) > > > >>> > > > >>> ZFS receive does a lot of sanity checks as well. As long as your > zfs receive doesn=E2=80=99t involve a rollback > > > >>> to the latest snapshot, it won=E2=80=99t destroy anything by mist= ake. Just > make sure that your replica datasets > > > >>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain. 
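On the receive side, the "replica datasets aren't mounted" advice quoted just above can be enforced up front rather than remembered; a sketch, with dataset, host and snapshot names as placeholders:

    # create the replica container unmounted and read-only
    zfs create -o canmount=noauto -o readonly=on backup/replica

    # apply streams with -u so nothing is mounted after the receive
    ssh filer1 zfs send -i @prev tank/data@now | zfs receive -u -d backup/replica

With readonly=on the replica can never diverge locally, so the receive never needs the -F rollback that the quoted warning is about.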
> > > >>> > > > >>> > > > >>> Cheers, > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> Borja. > > > >>> > > > >>> > > > >>> > > > >>> _______________________________________________ > > > >>> freebsd-fs@freebsd.org mailing list > > > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@ > freebsd.org" > > > >>> > > > >> _______________________________________________ > > > >> freebsd-fs@freebsd.org mailing list > > > >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@ > freebsd.org" > > > > > > > > -- > > Julien Cigar > > Belgian Biodiversity Platform (http://www.biodiversity.be) > > PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0 > > No trees were killed in the creation of this message. > > However, many electrons were terribly inconvenienced. > > > > -- > Julien Cigar > Belgian Biodiversity Platform (http://www.biodiversity.be) > PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0 > No trees were killed in the creation of this message. > However, many electrons were terribly inconvenienced. > From owner-freebsd-fs@freebsd.org Wed Aug 17 15:38:06 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B8C69BBC661 for ; Wed, 17 Aug 2016 15:38:06 +0000 (UTC) (envelope-from joe@getsomewhere.net) Received: from prak.gameowls.com (prak.gameowls.com [IPv6:2001:19f0:5c00:950b:5400:ff:fe14:46b7]) by mx1.freebsd.org (Postfix) with ESMTP id 84A3D15EE for ; Wed, 17 Aug 2016 15:38:06 +0000 (UTC) (envelope-from joe@getsomewhere.net) Received: from [IPv6:2001:470:c412:beef:135:c8df:2d0e:4ea6] (unknown [IPv6:2001:470:c412:beef:135:c8df:2d0e:4ea6]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by prak.gameowls.com (Postfix) with ESMTPSA id B6C001863E; Wed, 17 Aug 2016 10:37:58 -0500 (CDT) From: Joe Love Message-Id: Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: MooseFS on FreeBSD (was: HAST + ZFS + NFS + CARP) Date: Wed, 17 Aug 2016 10:37:57 -0500 References: <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <20160817113339.GH22506@mordor.lan> To: krad , FreeBSD FS In-Reply-To: X-Mailer: Apple Mail (2.3124) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 15:38:06 -0000 On Aug 17, 2016, at 8:33 AM, krad wrote: >=20 > What are peoples experiences on running something like moosfs on top = of > zfs? It looks really compelling on certain levels, but i'm not sure = about > the reality in a production network yet. >=20 I did some experimenting with MooseFS on a test cluster (using ZFS as = the local storage on the nodes). 
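A chunkserver node backed by ZFS along those lines might be set up roughly as below; every dataset name, path and rc knob here is an assumption for illustration, not something taken from the test cluster being described:

    # a dedicated dataset for the chunkserver (names are placeholders)
    zfs create -o mountpoint=/var/mfs/chunk -o compression=lz4 tank/mfschunk
    chown mfs:mfs /var/mfs/chunk        # assumes the port creates an "mfs" user

    # mfshdd.cfg lists the directories the chunkserver stores chunks in;
    # the path assumes the port installs its configs under /usr/local/etc/mfs
    echo /var/mfs/chunk >> /usr/local/etc/mfs/mfshdd.cfg

    # rc script name assumed from the MooseFS chunkserver port
    sysrc mfschunkserver_enable=YES
    service mfschunkserver start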
That was nearly a year ago, when I = decided it wasn=E2=80=99t a good fit as a storage backend to vmware = (primarily because of the overhead involved with traversing from vmware = over nfs to moosefs). They did a bunch of tweaking back then as I kept prodding a bit for = better operations on FreeBSD, but they ultimately ran into a stumbling = block with I/O caching in FreeBSD. Here is their analysis of what they ran into, and the solution they came = up with for higher throughput: https://sourceforge.net/p/moosefs/mailman/message/34483159/ = I=E2=80=99d love to get around to testing MooseFS as a backing store for = bhyve, but I currently lack the power & network ports to connect up my = test nodes again. -Joe From owner-freebsd-fs@freebsd.org Wed Aug 17 15:50:48 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 085AEBBCD64 for ; Wed, 17 Aug 2016 15:50:48 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id EC144143F for ; Wed, 17 Aug 2016 15:50:47 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7HFolt7078797 for ; Wed, 17 Aug 2016 15:50:47 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211491] System hangs after "Uptime" on reboot with ZFS Date: Wed, 17 Aug 2016 15:50:47 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-BETA3 X-Bugzilla-Keywords: needs-qa X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: vangyzen@FreeBSD.org X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: mfc-stable10? mfc-stable11? X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 15:50:48 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211491 --- Comment #18 from Eric van Gyzen --- I can still reproduce this on head at r304162. 
--=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 16:13:29 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C667ABBD6D8 for ; Wed, 17 Aug 2016 16:13:29 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B62D117E5 for ; Wed, 17 Aug 2016 16:13:29 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7HGDT5b067181 for ; Wed, 17 Aug 2016 16:13:29 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211013] Write error to UFS filesystem with softupdates panics machine Date: Wed, 17 Aug 2016 16:13:29 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-BETA1 X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: mckusick@FreeBSD.org X-Bugzilla-Status: In Progress X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 16:13:29 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211013 --- Comment #7 from Kirk McKusick --- I would like a week or two in head just to be sure that it does not break a= ny existing code usage. Note that 304230 has to be merged at the same time as this one (304239) since this one uses the new LIST_CONCAT added to queue.h = in 304230. 
--=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 16:18:44 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 852BABBD79E for ; Wed, 17 Aug 2016 16:18:44 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: from mail-it0-x230.google.com (mail-it0-x230.google.com [IPv6:2607:f8b0:4001:c0b::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5157619A0 for ; Wed, 17 Aug 2016 16:18:44 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: by mail-it0-x230.google.com with SMTP id e63so3988181ith.1 for ; Wed, 17 Aug 2016 09:18:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kateley-com.20150623.gappssmtp.com; s=20150623; h=reply-to:subject:references:to:from:organization:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=+AZDDlwiY4qKm3640B1HdOHHyaLkjajh2cy6kanlXJ0=; b=LnlukCi7ufpHeIBYKvfIpo0LfADSfRMid0gkDoyRQk5cvrvpXvpFX1EtZ7ZM8UAupH +LBUOceNnSZZ9cXD+PgHYJ74VNBqAvCxR4eAdnMZ9k0MwZ8BQ0yXgyLx2JKahLnTWGUs dQYmIMUWgd3C4lKtALB5Cmrp0+KR5CvfKlH4rrMXuLIPcmm6LEmB+4PtGzJEVpIKxeIU 9RuS3HaGkZa5koskaj6JiIzr1yRNvkFqKf1qUBpPgtbINYl5atwk+7JjCK/OGV35Xfef CBnY1uEvqkKzDFu4Z485aDWYWA8Uho1VsrX0XvBuCY2eqsFKe5mA1doWEdD4RUdl9GY8 X+JA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:reply-to:subject:references:to:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-transfer-encoding; bh=+AZDDlwiY4qKm3640B1HdOHHyaLkjajh2cy6kanlXJ0=; b=EgFlbtuhg6Rttxose3EGZUM1wiu6r4t+N0woTxnp2irauuFgnIFWKz29XNnX8gZd0X KBySFV8YzziBXkVYRYj3n2IV46vnJqvWpRMxz4J85bXXpd3UYE31nbP1Nybv4zidK+S0 aqvOrfVtepH/Wp+eVg9p30ul8H4zZa56q9RuhL3yHJPBAQDkvVsR3BKccl7kC0wDcTrk +Zu+NnnhbPeUkk6bDNACXebBRpXYYfi415PFjwL/bxv1KcBepeOwmStals/FjxC+uHIC rggSFgR0y3LDbk0UipNGd9AoJr2BVJGe5SzfkgWFR9hQ4Gqlsdv59Yh1c4Su677+8ewB 8ivg== X-Gm-Message-State: AEkoouvY+XjHGHLT0pIJEJfFVVUE+AOHTx1DXEdhJXk4fe8sA3dO5bHa9z949aN3A+Y88w== X-Received: by 10.36.53.83 with SMTP id k80mr28919210ita.59.1471450723555; Wed, 17 Aug 2016 09:18:43 -0700 (PDT) Received: from Kateleyco-iMac.local (c-50-188-36-30.hsd1.mn.comcast.net. 
[50.188.36.30]) by smtp.googlemail.com with ESMTPSA id o16sm265032itg.15.2016.08.17.09.18.42 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Aug 2016 09:18:42 -0700 (PDT) Reply-To: linda@kateley.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> To: freebsd-fs@freebsd.org From: Linda Kateley Organization: Kateley Company Message-ID: <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> Date: Wed, 17 Aug 2016 11:18:42 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160817095222.GG22506@mordor.lan> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 16:18:44 -0000 The question I always ask, as an architect, is "can you lose 1 minute worth of data?" If you can, then batched replication is perfect. If you can't.. then HA. Every place I have positioned it, rsf-1 has worked extremely well. If i remember right, it works at the dmu. I would suggest try it. They have been trying to have a full freebsd solution, I have several customers running it well. linda On 8/17/16 4:52 AM, Julien Cigar wrote: > On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen Gotteswinter wrote: >> >> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gotteswinter wrote: >>>> >>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>> On 11 Aug 2016, at 11:10, Julien Cigar wrote: >>>>>> >>>>>> As I said in a previous post I tested the zfs send/receive approach (with >>>>>> zrep) and it works (more or less) perfectly.. so I concur in all what you >>>>>> said, especially about off-site replicate and synchronous replication. >>>>>> >>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment, >>>>>> I'm in the early tests, haven't done any heavy writes yet, but ATM it >>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>> I must be too old school, but I don’t quite like the idea of using an essentially unreliable transport >>>>> (Ethernet) for low-level filesystem operations. >>>>> >>>>> In case something went wrong, that approach could risk corrupting a pool. Although, frankly, >>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA problem that caused some >>>>> silent corruption. >>>> try dual split import :D i mean, zpool -f import on 2 machines hooked up >>>> to the same disk chassis. >>> Yes this is the first thing on the list to avoid .. 
:) >>> >>> I'm still busy to test the whole setup here, including the >>> MASTER -> BACKUP failover script (CARP), but I think you can prevent >>> that thanks to: >>> >>> - As long as ctld is running on the BACKUP the disks are locked >>> and you can't import the pool (even with -f) for ex (filer2 is the >>> BACKUP): >>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>> >>> - The shared pool should not be mounted at boot, and you should ensure >>> that the failover script is not executed during boot time too: this is >>> to handle the case wherein both machines turn off and/or re-ignite at >>> the same time. Indeed, the CARP interface can "flip" it's status if both >>> machines are powered on at the same time, for ex: >>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and >>> you will have a split-brain scenario >>> >>> - Sometimes you'll need to reboot the MASTER for some $reasons >>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>> happen, this can be handled with a trigger file or something like that >>> >>> - I've still have to check if the order is OK, but I think that as long >>> as you shutdown the replication interface and that you adapt the >>> advskew (including the config file) of the CARP interface before the >>> zpool import -f in the failover script you can be relatively confident >>> that nothing will be written on the iSCSI targets >>> >>> - A zpool scrub should be run at regular intervals >>> >>> This is my MASTER -> BACKUP CARP script ATM >>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>> >>> Julien >>> >> 100€ question without detailed looking at that script. yes from a first >> view its super simple, but: why are solutions like rsf-1 such more >> powerful / featurerich. Theres a reason for, which is that they try to >> cover every possible situation (which makes more than sense for this). > I've never used "rsf-1" so I can't say much more about it, but I have > no doubts about it's ability to handle "complex situations", where > multiple nodes / networks are involved. > >> That script works for sure, within very limited cases imho >> >>>> kaboom, really ugly kaboom. thats what is very likely to happen sooner >>>> or later especially when it comes to homegrown automatism solutions. >>>> even the commercial parts where much more time/work goes into such >>>> solutions fail in a regular manner >>>> >>>>> The advantage of ZFS send/receive of datasets is, however, that you can consider it >>>>> essentially atomic. A transport corruption should not cause trouble (apart from a failed >>>>> "zfs receive") and with snapshot retention you can even roll back. You can’t roll back >>>>> zpool replications :) >>>>> >>>>> ZFS receive does a lot of sanity checks as well. As long as your zfs receive doesn’t involve a rollback >>>>> to the latest snapshot, it won’t destroy anything by mistake. Just make sure that your replica datasets >>>>> aren’t mounted and zfs receive won’t complain. >>>>> >>>>> >>>>> Cheers, >>>>> >>>>> >>>>> >>>>> >>>>> Borja. 
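For the "zpool scrub should be run at regular intervals" item in the checklist quoted above, the stock periodic(8) machinery is usually enough; knob names should be checked against /etc/defaults/periodic.conf, and the pool name is assumed:

    # /etc/periodic.conf
    daily_scrub_zfs_enable="YES"
    daily_scrub_zfs_pools="tank"            # leave empty to scrub every imported pool
    daily_scrub_zfs_default_threshold="7"   # minimum days between scrubs

A plain cron entry such as "0 3 * * 0 root zpool scrub tank" in /etc/crontab does the job as well if more control over timing is wanted.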
>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> freebsd-fs@freebsd.org mailing list >>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>>>> >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Wed Aug 17 16:29:49 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 61C42BBDC16 for ; Wed, 17 Aug 2016 16:29:49 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5165911D2 for ; Wed, 17 Aug 2016 16:29:49 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7HGTnDB096239 for ; Wed, 17 Aug 2016 16:29:49 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211939] ZFS does not correctly import cache and spares by label Date: Wed, 17 Aug 2016 16:29:49 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.3-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: linimon@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 16:29:49 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211939 Mark Linimon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-bugs@FreeBSD.org |freebsd-fs@FreeBSD.org --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Wed Aug 17 16:55:32 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4A0F1BBD5C1 for ; Wed, 17 Aug 2016 16:55:32 +0000 (UTC) (envelope-from bsdunix44@gmail.com) Received: from mail-pf0-x22f.google.com (mail-pf0-x22f.google.com [IPv6:2607:f8b0:400e:c00::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 134711583 for ; Wed, 17 Aug 2016 16:55:32 +0000 (UTC) (envelope-from bsdunix44@gmail.com) Received: by mail-pf0-x22f.google.com with SMTP id 
x72so39167264pfd.2 for ; Wed, 17 Aug 2016 09:55:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=references:in-reply-to:mime-version:content-transfer-encoding :message-id:cc:from:subject:date:to; bh=xE4l3qDTiVBBpnuU2Ob3pRGs/CtQZQjzm8vKAY1p06U=; b=z4OTC7Ly3fCMDuOjEHpc9Cig/nogUrIXGgi3hQ41Lyptz1M6e54h3LT7md/vRPsfkP a68UZky+rNWcPEFmyXzettoZwmgSSrOe/0U9wQi/B2UWo5cYF9rI870ynSPxbF/2urmP e0eyAcizx58wfw/qdW049pB3Wn5B2ilC3pcxI6uqgl6Ftd03ELlDIrKLhtDD+CVE2La7 ZaByEor91+XNvlxYf2wBvj7N7iyFONJKvCpSGP+C1ojGV0dNFAsHEAzkRkkodSSuYrpQ 3WLskiVc6jV9Mtc+1R2gz6uVP3LVIgkq+HsXzJHH7J8YAHygqgGJhdG9AvNnKidbHhlB GByw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:references:in-reply-to:mime-version :content-transfer-encoding:message-id:cc:from:subject:date:to; bh=xE4l3qDTiVBBpnuU2Ob3pRGs/CtQZQjzm8vKAY1p06U=; b=hKgujPjfDZ0Qd665Ire/ugtEoD6HiCYUxNyNT/KQjFUaEK9MP5cn7YtG2x3XgPx8D3 Dli3P1+zk4W895BUdiKG8lXd0PeSVYR/RLuSCC3tfO1ZHnul7bhjyOFs9PsTjeq4E1pu 3Qx9grrci1IhlGKZLSQafuqg5U9f94lBE23xjS6jL2r+e1eWec360Pdi8mGvfE1qjEK3 LZV8biCM/Ll2eO5H7Q8LiMihZ+XEacwUDTWN5agNiGqnCV5nzxAdWQ6kGkEn5se6/4me dmBAGge7zFTheBkuJHDsAERw7VRi6E8rbhFYtKerADWWhAyTMX+BmmqzcwNCEs0RJOzw NT1A== X-Gm-Message-State: AEkoousl3AcukQv28quFXrYiJzK3oRSy8W7JbXT9P227jJ6m3xtxVbZiC5Q4sXk/XAD9zA== X-Received: by 10.98.147.14 with SMTP id b14mr24930241pfe.103.1471452931348; Wed, 17 Aug 2016 09:55:31 -0700 (PDT) Received: from [192.168.0.10] (cpe-70-118-225-173.kc.res.rr.com. [70.118.225.173]) by smtp.gmail.com with ESMTPSA id n10sm11441pap.16.2016.08.17.09.55.29 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 17 Aug 2016 09:55:30 -0700 (PDT) References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> In-Reply-To: <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> Mime-Version: 1.0 (1.0) Message-Id: <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> Cc: freebsd-fs@freebsd.org X-Mailer: iPhone Mail (14A5341a) From: Chris Watson Subject: Re: HAST + ZFS + NFS + CARP Date: Wed, 17 Aug 2016 11:55:27 -0500 To: linda@kateley.com Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 16:55:32 -0000 Of course, if you are willing to accept some amount of data loss that opens u= p a lot more options. :) Some may find that acceptable though. Like turning off fsync with PostgreSQL= to get much higher throughput. As little no as you are made *very* aware of= the risks.=20 It's good to have input in this thread from one with more experience with RS= F-1 than the rest of us. You confirm what others have that said about RSF-1,= that it's stable and works well. What were you deploying it on? Chris Sent from my iPhone 5 > On Aug 17, 2016, at 11:18 AM, Linda Kateley wrote: >=20 > The question I always ask, as an architect, is "can you lose 1 minute wort= h of data?" 
If you can, then batched replication is perfect. If you can't.. t= hen HA. Every place I have positioned it, rsf-1 has worked extremely well. I= f i remember right, it works at the dmu. I would suggest try it. They have b= een trying to have a full freebsd solution, I have several customers running= it well. >=20 > linda >=20 >=20 >> On 8/17/16 4:52 AM, Julien Cigar wrote: >>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen Gotteswint= er wrote: >>>=20 >>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gotteswi= nter wrote: >>>>>=20 >>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar wrote= : >>>>>>>=20 >>>>>>> As I said in a previous post I tested the zfs send/receive approach (= with >>>>>>> zrep) and it works (more or less) perfectly.. so I concur in all wha= t you >>>>>>> said, especially about off-site replicate and synchronous replicatio= n. >>>>>>>=20 >>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment= , >>>>>>> I'm in the early tests, haven't done any heavy writes yet, but ATM i= t >>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>> I must be too old school, but I don=E2=80=99t quite like the idea of u= sing an essentially unreliable transport >>>>>> (Ethernet) for low-level filesystem operations. >>>>>>=20 >>>>>> In case something went wrong, that approach could risk corrupting a p= ool. Although, frankly, >>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA probl= em that caused some >>>>>> silent corruption. >>>>> try dual split import :D i mean, zpool -f import on 2 machines hooked u= p >>>>> to the same disk chassis. >>>> Yes this is the first thing on the list to avoid .. :) >>>>=20 >>>> I'm still busy to test the whole setup here, including the >>>> MASTER -> BACKUP failover script (CARP), but I think you can prevent >>>> that thanks to: >>>>=20 >>>> - As long as ctld is running on the BACKUP the disks are locked >>>> and you can't import the pool (even with -f) for ex (filer2 is the >>>> BACKUP): >>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>>=20 >>>> - The shared pool should not be mounted at boot, and you should ensure >>>> that the failover script is not executed during boot time too: this is >>>> to handle the case wherein both machines turn off and/or re-ignite at >>>> the same time. 
Indeed, the CARP interface can "flip" it's status if bot= h >>>> machines are powered on at the same time, for ex: >>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and >>>> you will have a split-brain scenario >>>>=20 >>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>> happen, this can be handled with a trigger file or something like that >>>>=20 >>>> - I've still have to check if the order is OK, but I think that as long= >>>> as you shutdown the replication interface and that you adapt the >>>> advskew (including the config file) of the CARP interface before the >>>> zpool import -f in the failover script you can be relatively confident >>>> that nothing will be written on the iSCSI targets >>>>=20 >>>> - A zpool scrub should be run at regular intervals >>>>=20 >>>> This is my MASTER -> BACKUP CARP script ATM >>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>>=20 >>>> Julien >>>>=20 >>> 100=E2=82=AC question without detailed looking at that script. yes from a= first >>> view its super simple, but: why are solutions like rsf-1 such more >>> powerful / featurerich. Theres a reason for, which is that they try to >>> cover every possible situation (which makes more than sense for this). >> I've never used "rsf-1" so I can't say much more about it, but I have >> no doubts about it's ability to handle "complex situations", where >> multiple nodes / networks are involved. >>=20 >>> That script works for sure, within very limited cases imho >>>=20 >>>>> kaboom, really ugly kaboom. thats what is very likely to happen sooner= >>>>> or later especially when it comes to homegrown automatism solutions. >>>>> even the commercial parts where much more time/work goes into such >>>>> solutions fail in a regular manner >>>>>=20 >>>>>> The advantage of ZFS send/receive of datasets is, however, that you c= an consider it >>>>>> essentially atomic. A transport corruption should not cause trouble (= apart from a failed >>>>>> "zfs receive") and with snapshot retention you can even roll back. Yo= u can=E2=80=99t roll back >>>>>> zpool replications :) >>>>>>=20 >>>>>> ZFS receive does a lot of sanity checks as well. As long as your zfs r= eceive doesn=E2=80=99t involve a rollback >>>>>> to the latest snapshot, it won=E2=80=99t destroy anything by mistake.= Just make sure that your replica datasets >>>>>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain. >>>>>>=20 >>>>>>=20 >>>>>> Cheers, >>>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>> Borja. 
>>>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>> _______________________________________________ >>>>>> freebsd-fs@freebsd.org mailing list >>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"= >>>>>>=20 >>>>> _______________________________________________ >>>>> freebsd-fs@freebsd.org mailing list >>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Wed Aug 17 18:03:22 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5D116BBDB71 for ; Wed, 17 Aug 2016 18:03:22 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: from mail-io0-x235.google.com (mail-io0-x235.google.com [IPv6:2607:f8b0:4001:c06::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 238BD1E0C for ; Wed, 17 Aug 2016 18:03:22 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: by mail-io0-x235.google.com with SMTP id b62so142309336iod.3 for ; Wed, 17 Aug 2016 11:03:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kateley-com.20150623.gappssmtp.com; s=20150623; h=reply-to:subject:references:to:cc:from:organization:message-id:date :user-agent:mime-version:in-reply-to; bh=W3CZ6snaEsSDrvxIVpZVNCrQBRMZ/J6cyLqa9EV80C8=; b=m4odc/p8dKvuk9pI2qKktnCxG0TXSMmJ9J4HDk9woDch65sNz3aorvH8mYlw/nQn4j dRGpZ/Sz6gQDfnkCwFnpsU5xGy+6NDETgP5TMikl8GMTfgs7Y1vh1tBdozE+d4ytMdHy /ig8gabKXaf7TPEgLjmcCSmurPzEPyCKpKcgknYktUvb7tP9mrhgTLqjUBrOO5C9H9Q5 JE9W5zD9jJx8Obc1Co1aqtdZnUzv7z2yyyMAYg08m1JSyPfbUY19aHJvoa6veP40pUlZ VOW5oX3IRN8AwZhov/uyaBttWILOZ4qPRQarlgw11QCa/Zk6p7YCXX40me8kL5iffOKS cThw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:reply-to:subject:references:to:cc:from :organization:message-id:date:user-agent:mime-version:in-reply-to; bh=W3CZ6snaEsSDrvxIVpZVNCrQBRMZ/J6cyLqa9EV80C8=; b=Ishima+vkpdLgxxt4yajXhpNZyheJT9WhlTVm97a75crfu1hg279S1/gEFiAigW+NZ n9DReiIiqGrX0Bo4R9lIyUjCVFnKaNcpytkaYSt7DIecpjk2xF4k4UevAZZfyD0+OO4J QYjslJE52i0i+mzCgVUp+LXcbsgjyAHg1TqajeEUhAoYewiNch39ia5aMSTK50AaSfHU n+zzl1qcH/qbfGuaAgHesI5TZoMNkZDmmCsZ/y5v1cOPlzSbkkBv9IiE5Hzl+SsbTxDe NG96sT4C4GmGN5V/EFkkHlTlQClttHGMYVfj3nVsUNGklXBoc64jYs9/q10Ui8Cv9xrh bqZA== X-Gm-Message-State: AEkoous5PE/JZw9Dj7MXD2Raoa76xi1bRt1RwCbMY3V6+rPdgu/oosucVd8rpQLVmzBSxw== X-Received: by 10.107.152.201 with SMTP id a192mr54177775ioe.24.1471457001313; Wed, 17 Aug 2016 11:03:21 -0700 (PDT) Received: from Kateleyco-iMac.local (c-50-188-36-30.hsd1.mn.comcast.net. 
[50.188.36.30]) by smtp.googlemail.com with ESMTPSA id v195sm438837itc.8.2016.08.17.11.03.19 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 17 Aug 2016 11:03:20 -0700 (PDT) Reply-To: linda@kateley.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> To: Chris Watson , linda@kateley.com Cc: freebsd-fs@freebsd.org From: Linda Kateley Organization: Kateley Company Message-ID: <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> Date: Wed, 17 Aug 2016 13:03:19 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 18:03:22 -0000 I just do consulting so I don't always get to see the end of the project. Although we are starting to do more ongoing support so we can see the progress.. I have worked with some of the guys from high-availability.com for maybe 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work beautifully with omni/illumos. The one customer I have running it in prod is an isp in south america running openstack and zfs on freebsd as iscsi. Big boxes, 90+ drives per frame. If someone would like try it, i have some contacts there. Ping me offlist. You do risk losing data if you batch zfs send. It is very hard to run that real time. You have to take the snap then send the snap. Most people run in cron, even if it's not in cron, you would want one to finish before you started the next. If you lose the sending host before the receive is complete you won't have a full copy. With zfs though you will probably still have the data on the sending host, however long it takes to bring it back up. RSF-1 runs in the zfs stack and send the writes to the second system. It's kind of pricey, but actually much less expensive than commercial alternatives. Anytime you run anything sync it adds latency but makes things safer.. There is also a cool tool I like, called zerto for vmware that sits in the hypervisor and sends a sync copy of a write locally and then an async remotely. It's pretty cool. Although I haven't run it myself, have a bunch of customers running it. I believe it works with proxmox too. Most people I run into (these days) don't mind losing 5 or even 30 minutes of data. Small shops. They usually have a copy somewhere else. Or the cost of 5-30 minutes isn't that great. I used work as a datacenter architect for sun/oracle with only fortune 500. There losing 1 sec could put large companies out of business. I worked with banks and exchanges. They couldn't ever lose a single transaction. 
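A sketch of the cron-driven snap-then-send loop described above, with lockf(1) keeping one run from starting before the previous one finishes; dataset, host and lock-file names are assumptions, and a real deployment would rather use zrep or similar, which also tracks which snapshot the other side actually has:

    #!/bin/sh
    # replicate.sh -- batched zfs send/receive, run from cron, e.g.:
    #   */5 * * * * root lockf -t 0 /var/run/replicate.lock /root/replicate.sh
    SRC=tank/data        # assumed source dataset
    DST=backup/data      # assumed target dataset on the standby (parent must exist)
    REMOTE=filer2        # assumed standby host

    NOW=$(date -u +%Y%m%d%H%M%S)
    # newest existing snapshot of SRC, if any (assumed to exist on the remote too)
    PREV=$(zfs list -H -t snapshot -o name -s creation -d 1 ${SRC} | tail -1 | cut -d@ -f2)

    zfs snapshot ${SRC}@${NOW}

    if [ -n "${PREV}" ]; then
        # no -F on the receive: if the replica has diverged, fail loudly
        # instead of rolling it back
        zfs send -i @${PREV} ${SRC}@${NOW} | ssh ${REMOTE} zfs receive -u ${DST}
    else
        zfs send ${SRC}@${NOW} | ssh ${REMOTE} zfs receive -u ${DST}
    fi

If the sending host dies mid-transfer the remote simply keeps its previous snapshot, which is exactly the window of loss being discussed here.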
Most people nowadays do the replication/availability in the application though and don't care about underlying hardware, especially disk. On 8/17/16 11:55 AM, Chris Watson wrote: > Of course, if you are willing to accept some amount of data loss that > opens up a lot more options. :) > > Some may find that acceptable though. Like turning off fsync with > PostgreSQL to get much higher throughput. As little no as you are made > *very* aware of the risks. > > It's good to have input in this thread from one with more experience > with RSF-1 than the rest of us. You confirm what others have that said > about RSF-1, that it's stable and works well. What were you deploying > it on? > > Chris > > Sent from my iPhone 5 > > On Aug 17, 2016, at 11:18 AM, Linda Kateley > wrote: > >> The question I always ask, as an architect, is "can you lose 1 minute >> worth of data?" If you can, then batched replication is perfect. If >> you can't.. then HA. Every place I have positioned it, rsf-1 has >> worked extremely well. If i remember right, it works at the dmu. I >> would suggest try it. They have been trying to have a full freebsd >> solution, I have several customers running it well. >> >> linda >> >> >> On 8/17/16 4:52 AM, Julien Cigar wrote: >>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen >>> Gotteswinter wrote: >>>> >>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen >>>>> Gotteswinter wrote: >>>>>> >>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>> > wrote: >>>>>>>> >>>>>>>> As I said in a previous post I tested the zfs send/receive >>>>>>>> approach (with >>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in >>>>>>>> all what you >>>>>>>> said, especially about off-site replicate and synchronous >>>>>>>> replication. >>>>>>>> >>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the >>>>>>>> moment, >>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but >>>>>>>> ATM it >>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>> I must be too old school, but I don’t quite like the idea of >>>>>>> using an essentially unreliable transport >>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>> >>>>>>> In case something went wrong, that approach could risk >>>>>>> corrupting a pool. Although, frankly, >>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA >>>>>>> problem that caused some >>>>>>> silent corruption. >>>>>> try dual split import :D i mean, zpool -f import on 2 machines >>>>>> hooked up >>>>>> to the same disk chassis. >>>>> Yes this is the first thing on the list to avoid .. :) >>>>> >>>>> I'm still busy to test the whole setup here, including the >>>>> MASTER -> BACKUP failover script (CARP), but I think you can prevent >>>>> that thanks to: >>>>> >>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>> and you can't import the pool (even with -f) for ex (filer2 is the >>>>> BACKUP): >>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>>> >>>>> - The shared pool should not be mounted at boot, and you should ensure >>>>> that the failover script is not executed during boot time too: this is >>>>> to handle the case wherein both machines turn off and/or re-ignite at >>>>> the same time. 
Indeed, the CARP interface can "flip" it's status >>>>> if both >>>>> machines are powered on at the same time, for ex: >>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and >>>>> you will have a split-brain scenario >>>>> >>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>> happen, this can be handled with a trigger file or something like that >>>>> >>>>> - I've still have to check if the order is OK, but I think that as >>>>> long >>>>> as you shutdown the replication interface and that you adapt the >>>>> advskew (including the config file) of the CARP interface before the >>>>> zpool import -f in the failover script you can be relatively confident >>>>> that nothing will be written on the iSCSI targets >>>>> >>>>> - A zpool scrub should be run at regular intervals >>>>> >>>>> This is my MASTER -> BACKUP CARP script ATM >>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>>> >>>>> Julien >>>>> >>>> 100€ question without detailed looking at that script. yes from a first >>>> view its super simple, but: why are solutions like rsf-1 such more >>>> powerful / featurerich. Theres a reason for, which is that they try to >>>> cover every possible situation (which makes more than sense for this). >>> I've never used "rsf-1" so I can't say much more about it, but I have >>> no doubts about it's ability to handle "complex situations", where >>> multiple nodes / networks are involved. >>> >>>> That script works for sure, within very limited cases imho >>>> >>>>>> kaboom, really ugly kaboom. thats what is very likely to happen >>>>>> sooner >>>>>> or later especially when it comes to homegrown automatism solutions. >>>>>> even the commercial parts where much more time/work goes into such >>>>>> solutions fail in a regular manner >>>>>> >>>>>>> The advantage of ZFS send/receive of datasets is, however, that >>>>>>> you can consider it >>>>>>> essentially atomic. A transport corruption should not cause >>>>>>> trouble (apart from a failed >>>>>>> "zfs receive") and with snapshot retention you can even roll >>>>>>> back. You can’t roll back >>>>>>> zpool replications :) >>>>>>> >>>>>>> ZFS receive does a lot of sanity checks as well. As long as your >>>>>>> zfs receive doesn’t involve a rollback >>>>>>> to the latest snapshot, it won’t destroy anything by mistake. >>>>>>> Just make sure that your replica datasets >>>>>>> aren’t mounted and zfs receive won’t complain. >>>>>>> >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Borja. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>> To unsubscribe, send any mail to >>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>>>>> " >>>>>>> >>>>>> _______________________________________________ >>>>>> freebsd-fs@freebsd.org mailing list >>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>> To unsubscribe, send any mail to >>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>>>> " >> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org >> " From owner-freebsd-fs@freebsd.org Wed Aug 17 21:14:38 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D109CBBDD43 for ; Wed, 17 Aug 2016 21:14:38 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: from mail-wm0-x22d.google.com (mail-wm0-x22d.google.com [IPv6:2a00:1450:400c:c09::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5E3DC1BFD for ; Wed, 17 Aug 2016 21:14:38 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: by mail-wm0-x22d.google.com with SMTP id q128so332750wma.1 for ; Wed, 17 Aug 2016 14:14:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=iRwMJh7xj9RZqVd/43NQql03DdsVtfxdT39eMpReWfQ=; b=S0mt3nR4YsAdeBuBq2W6ZdhRmCCQb9ooZ3IEx7DTMs0Fwa6GB4lAY6GpakaFPsLLxw ZVaJE7zx60eThmKeTjzVQwVL5E4XK5nsPm5DOrz5FaqRjYZZkoDvi/RnCfEz8cqprWgN HeXPdJhBmORZkyZkrOm4OxJJLNh5Mo7q2ImvepBI7DE/JzmTxsNX+F2GQfqeoEsvF/k2 1yGSyHUEow7Cn+mkZ2Qpg5a2soLvVARIXoMBm56jbYpvY6o2RZVDHluhkrvbxG5HKoyR jrQ1b7OZdUXllXyi6Fb/Zqq1V6HUV2LdOGyIXgeZ41k6+HFoq6i/F2agG5GRWe0+Jg5a 33Pw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=iRwMJh7xj9RZqVd/43NQql03DdsVtfxdT39eMpReWfQ=; b=RyLmWGaJCDV0qtu5VonIxVqDDCRgje6/IgolOQeieJ9kxRGTTLaksGCO0lL2DnVjM9 DOFbWgp9t+xu7DG++CW2G499eVjBq7YIcc6BTosNYM9IecNjGooU5tRDjXwpJIcZRhJx uj5HbgPCj3RLoWghYBcl5zm5lD1c+aDzIDGXLb3Xn0JtW3i5J6AXzvi2/ZM51mOnGwGE P1Q3KaJBSK570bwtcBcRctJbr22YbMxwHjrPR4Kyf+0wGO9jyOeedzWIN9MSmHjDda0R OPYZkZjQsf72N1hMfhsZ1Gg9pbN7sT4N/mffv9sJTvDKdLfVEA8/2UEz0yujRPB35OdX RlrQ== X-Gm-Message-State: AEkoouvkPPofkyRyWZDEJ1yJYPcRnnYA5sZ3j9s3nvJ1mHn4PNY8Fei8ONPHAu3OUdBpKA== X-Received: by 10.194.221.134 with SMTP id qe6mr45663864wjc.165.1471468476428; Wed, 17 Aug 2016 14:14:36 -0700 (PDT) Received: from macbook-air-de-benjamin.home (LFbn-1-7077-85.w90-116.abo.wanadoo.fr. 
[90.116.246.85]) by smtp.gmail.com with ESMTPSA id a9sm240189wjf.16.2016.08.17.14.14.35 for (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 17 Aug 2016 14:14:35 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: HAST + ZFS + NFS + CARP From: Ben RUBSON In-Reply-To: <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> Date: Wed, 17 Aug 2016 23:14:34 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <02F2828E-AB88-4F25-AB73-5EF041BAD36E@gmail.com> References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> To: freebsd-fs@freebsd.org X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2016 21:14:38 -0000 > On 17 Aug 2016, at 20:03, Linda Kateley wrote: > > RSF-1 runs in the zfs stack and send the writes to the second system. Linda, do you have any link to documentation about this RSF-1 operation mode ? According to what I read about RSF-1, storage is shared between nodes, and RSF-1 manages the failover, we do not have 2 different storages. (so I don't really understand how writes are sent to the "second system") In addition, RSF-1 does not seem to help with long-distance replication to a different storage. But I may be wrong ? This is where ZFS send/receive helps. Or even a nicer solution I proposed a few weeks ago : https://www.illumos.org/issues/7166 (but a lot of work to achieve). Ben > On 8/17/16 11:55 AM, Chris Watson wrote: >> Of course, if you are willing to accept some amount of data loss that opens up a lot more options. :) >> >> Some may find that acceptable though. Like turning off fsync with PostgreSQL to get much higher throughput. As little no as you are made *very* aware of the risks. >> >> It's good to have input in this thread from one with more experience with RSF-1 than the rest of us. You confirm what others have that said about RSF-1, that it's stable and works well. What were you deploying it on? >> >> Chris >> >> Sent from my iPhone 5 >> >> On Aug 17, 2016, at 11:18 AM, Linda Kateley > wrote: >> >>> The question I always ask, as an architect, is "can you lose 1 minute worth of data?" If you can, then batched replication is perfect. If you can't.. then HA. Every place I have positioned it, rsf-1 has worked extremely well. If i remember right, it works at the dmu. I would suggest try it. They have been trying to have a full freebsd solution, I have several customers running it well.
>>>=20 >>> linda >>>=20 >>>=20 >>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen = Gotteswinter wrote: >>>>>=20 >>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen = Gotteswinter wrote: >>>>>>>=20 >>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar > wrote: >>>>>>>>>=20 >>>>>>>>> As I said in a previous post I tested the zfs send/receive = approach (with >>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in = all what you >>>>>>>>> said, especially about off-site replicate and synchronous = replication. >>>>>>>>>=20 >>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the = moment, >>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but = ATM it >>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>> I must be too old school, but I don=E2=80=99t quite like the = idea of using an essentially unreliable transport >>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>>=20 >>>>>>>> In case something went wrong, that approach could risk = corrupting a pool. Although, frankly, >>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA = problem that caused some >>>>>>>> silent corruption. >>>>>>> try dual split import :D i mean, zpool -f import on 2 machines = hooked up >>>>>>> to the same disk chassis. >>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>>=20 >>>>>> I'm still busy to test the whole setup here, including the >>>>>> MASTER -> BACKUP failover script (CARP), but I think you can = prevent >>>>>> that thanks to: >>>>>>=20 >>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>> and you can't import the pool (even with -f) for ex (filer2 is = the >>>>>> BACKUP): >>>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>>>>=20 >>>>>> - The shared pool should not be mounted at boot, and you should = ensure >>>>>> that the failover script is not executed during boot time too: = this is >>>>>> to handle the case wherein both machines turn off and/or = re-ignite at >>>>>> the same time. Indeed, the CARP interface can "flip" it's status = if both >>>>>> machines are powered on at the same time, for ex: >>>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf = and >>>>>> you will have a split-brain scenario >>>>>>=20 >>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>>> happen, this can be handled with a trigger file or something like = that >>>>>>=20 >>>>>> - I've still have to check if the order is OK, but I think that = as long >>>>>> as you shutdown the replication interface and that you adapt the >>>>>> advskew (including the config file) of the CARP interface before = the >>>>>> zpool import -f in the failover script you can be relatively = confident >>>>>> that nothing will be written on the iSCSI targets >>>>>>=20 >>>>>> - A zpool scrub should be run at regular intervals >>>>>>=20 >>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>>>>=20 >>>>>> Julien >>>>>>=20 >>>>> 100=E2=82=AC question without detailed looking at that script. yes = from a first >>>>> view its super simple, but: why are solutions like rsf-1 such more >>>>> powerful / featurerich. 
Theres a reason for, which is that they = try to >>>>> cover every possible situation (which makes more than sense for = this). >>>> I've never used "rsf-1" so I can't say much more about it, but I = have >>>> no doubts about it's ability to handle "complex situations", where >>>> multiple nodes / networks are involved. >>>>=20 >>>>> That script works for sure, within very limited cases imho >>>>>=20 >>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen = sooner >>>>>>> or later especially when it comes to homegrown automatism = solutions. >>>>>>> even the commercial parts where much more time/work goes into = such >>>>>>> solutions fail in a regular manner >>>>>>>=20 >>>>>>>> The advantage of ZFS send/receive of datasets is, however, that = you can consider it >>>>>>>> essentially atomic. A transport corruption should not cause = trouble (apart from a failed >>>>>>>> "zfs receive") and with snapshot retention you can even roll = back. You can=E2=80=99t roll back >>>>>>>> zpool replications :) >>>>>>>>=20 >>>>>>>> ZFS receive does a lot of sanity checks as well. As long as = your zfs receive doesn=E2=80=99t involve a rollback >>>>>>>> to the latest snapshot, it won=E2=80=99t destroy anything by = mistake. Just make sure that your replica datasets >>>>>>>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain. >>>>>>>>=20 >>>>>>>>=20 >>>>>>>> Cheers, >>>>>>>>=20 >>>>>>>>=20 >>>>>>>>=20 >>>>>>>>=20 >>>>>>>> Borja. >>>>>>>>=20 >>>>>>>>=20 >>>>>>>>=20 >>>>>>>> _______________________________________________ >>>>>>>> freebsd-fs@freebsd.org mailing = list >>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>> To unsubscribe, send any mail to = "freebsd-fs-unsubscribe@freebsd.org = " >>>>>>>>=20 >>>>>>> _______________________________________________ >>>>>>> freebsd-fs@freebsd.org mailing = list >>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>> To unsubscribe, send any mail to = "freebsd-fs-unsubscribe@freebsd.org = " >>>=20 >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org = " >=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Thu Aug 18 07:32:32 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 42D7ABBE15A for ; Thu, 18 Aug 2016 07:32:32 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C02C51A25 for ; Thu, 18 Aug 2016 07:32:31 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 359D145FC0FB; Thu, 18 Aug 2016 09:32:23 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fV-FbX44V-04; Thu, 18 Aug 2016 09:32:16 +0200 (CEST) 
Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id E59964C4C688; Thu, 18 Aug 2016 09:32:16 +0200 (CEST) Reply-To: juergen.gotteswinter@internetx.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> To: linda@kateley.com, Chris Watson Cc: freebsd-fs@freebsd.org From: InterNetX - Juergen Gotteswinter Organization: InterNetX GmbH Message-ID: <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> Date: Thu, 18 Aug 2016 09:32:14 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 07:32:32 -0000 Am 17.08.2016 um 20:03 schrieb Linda Kateley: > I just do consulting so I don't always get to see the end of the > project. Although we are starting to do more ongoing support so we can > see the progress.. > > I have worked with some of the guys from high-availability.com for maybe > 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work > beautifully with omni/illumos. The one customer I have running it in > prod is an isp in south america running openstack and zfs on freebsd as > iscsi. Big boxes, 90+ drives per frame. If someone would like try it, i > have some contacts there. Ping me offlist. no offense, but it sounds a bit like marketing. here: running a nexenta ha setup for several years now, with one catastrophic failure due to split brain > > You do risk losing data if you batch zfs send. It is very hard to run > that real time. depends on how much data changes aka delta size > You have to take the snap then send the snap. Most > people run in cron, even if it's not in cron, you would want one to > finish before you started the next. that's the reason why lock files were invented; tools like zrep handle that themselves via additional zfs properties or, if one does not trust a single layer
-- snip --
#!/bin/sh
# poor man's lock: skip this run if a previous replication is still going
if [ ! -f /var/run/replic ] ; then
    touch /var/run/replic
    /blah/path/zrep sync all >> /var/log/zfsrepli.log
    rm -f /var/run/replic
fi
-- snip --
something like this, simple > If you lose the sending host before > the receive is complete you won't have a full copy. if rsf fails and you end up in split brain you lose way more. been there, seen that. > With zfs though you > will probably still have the data on the sending host, however long it > takes to bring it back up. RSF-1 runs in the zfs stack and send the > writes to the second system. It's kind of pricey, but actually much less > expensive than commercial alternatives. > > Anytime you run anything sync it adds latency but makes things safer..
not surprising, it all depends on the usecase > There is also a cool tool I like, called zerto for vmware that sits in > the hypervisor and sends a sync copy of a write locally and then an > async remotely. It's pretty cool. Although I haven't run it myself, have > a bunch of customers running it. I believe it works with proxmox too. > > Most people I run into (these days) don't mind losing 5 or even 30 > minutes of data. Small shops. you talk about minutes, what delta size are we talking here about? why not using zrep in a loop for example They usually have a copy somewhere else. > Or the cost of 5-30 minutes isn't that great. I used work as a > datacenter architect for sun/oracle with only fortune 500. There losing > 1 sec could put large companies out of business. I worked with banks and > exchanges. again, usecase. i bet 99% on this list are not operating fortune 500 bank filers They couldn't ever lose a single transaction. Most people > nowadays do the replication/availability in the application though and > don't care about underlying hardware, especially disk. > > > On 8/17/16 11:55 AM, Chris Watson wrote: >> Of course, if you are willing to accept some amount of data loss that >> opens up a lot more options. :) >> >> Some may find that acceptable though. Like turning off fsync with >> PostgreSQL to get much higher throughput. As little no as you are made >> *very* aware of the risks. >> >> It's good to have input in this thread from one with more experience >> with RSF-1 than the rest of us. You confirm what others have that said >> about RSF-1, that it's stable and works well. What were you deploying >> it on? >> >> Chris >> >> Sent from my iPhone 5 >> >> On Aug 17, 2016, at 11:18 AM, Linda Kateley > > wrote: >> >>> The question I always ask, as an architect, is "can you lose 1 minute >>> worth of data?" If you can, then batched replication is perfect. If >>> you can't.. then HA. Every place I have positioned it, rsf-1 has >>> worked extremely well. If i remember right, it works at the dmu. I >>> would suggest try it. They have been trying to have a full freebsd >>> solution, I have several customers running it well. >>> >>> linda >>> >>> >>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen >>>> Gotteswinter wrote: >>>>> >>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen >>>>>> Gotteswinter wrote: >>>>>>> >>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>> > wrote: >>>>>>>>> >>>>>>>>> As I said in a previous post I tested the zfs send/receive >>>>>>>>> approach (with >>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in >>>>>>>>> all what you >>>>>>>>> said, especially about off-site replicate and synchronous >>>>>>>>> replication. >>>>>>>>> >>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the >>>>>>>>> moment, >>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but >>>>>>>>> ATM it >>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>> I must be too old school, but I don’t quite like the idea of >>>>>>>> using an essentially unreliable transport >>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>> >>>>>>>> In case something went wrong, that approach could risk >>>>>>>> corrupting a pool. Although, frankly, >>>>>>>> ZFS is extremely resilient. 
One of mine even survived a SAS HBA >>>>>>>> problem that caused some >>>>>>>> silent corruption. >>>>>>> try dual split import :D i mean, zpool -f import on 2 machines >>>>>>> hooked up >>>>>>> to the same disk chassis. >>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>> >>>>>> I'm still busy to test the whole setup here, including the >>>>>> MASTER -> BACKUP failover script (CARP), but I think you can prevent >>>>>> that thanks to: >>>>>> >>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>> and you can't import the pool (even with -f) for ex (filer2 is the >>>>>> BACKUP): >>>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>>>> >>>>>> - The shared pool should not be mounted at boot, and you should >>>>>> ensure >>>>>> that the failover script is not executed during boot time too: >>>>>> this is >>>>>> to handle the case wherein both machines turn off and/or re-ignite at >>>>>> the same time. Indeed, the CARP interface can "flip" it's status >>>>>> if both >>>>>> machines are powered on at the same time, for ex: >>>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and >>>>>> you will have a split-brain scenario >>>>>> >>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>>> happen, this can be handled with a trigger file or something like >>>>>> that >>>>>> >>>>>> - I've still have to check if the order is OK, but I think that as >>>>>> long >>>>>> as you shutdown the replication interface and that you adapt the >>>>>> advskew (including the config file) of the CARP interface before the >>>>>> zpool import -f in the failover script you can be relatively >>>>>> confident >>>>>> that nothing will be written on the iSCSI targets >>>>>> >>>>>> - A zpool scrub should be run at regular intervals >>>>>> >>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>>>> >>>>>> Julien >>>>>> >>>>> 100€ question without detailed looking at that script. yes from a >>>>> first >>>>> view its super simple, but: why are solutions like rsf-1 such more >>>>> powerful / featurerich. Theres a reason for, which is that they try to >>>>> cover every possible situation (which makes more than sense for this). >>>> I've never used "rsf-1" so I can't say much more about it, but I have >>>> no doubts about it's ability to handle "complex situations", where >>>> multiple nodes / networks are involved. >>>> >>>>> That script works for sure, within very limited cases imho >>>>> >>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen >>>>>>> sooner >>>>>>> or later especially when it comes to homegrown automatism solutions. >>>>>>> even the commercial parts where much more time/work goes into such >>>>>>> solutions fail in a regular manner >>>>>>> >>>>>>>> The advantage of ZFS send/receive of datasets is, however, that >>>>>>>> you can consider it >>>>>>>> essentially atomic. A transport corruption should not cause >>>>>>>> trouble (apart from a failed >>>>>>>> "zfs receive") and with snapshot retention you can even roll >>>>>>>> back. You can’t roll back >>>>>>>> zpool replications :) >>>>>>>> >>>>>>>> ZFS receive does a lot of sanity checks as well. As long as your >>>>>>>> zfs receive doesn’t involve a rollback >>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. 
>>>>>>>> Just make sure that your replica datasets >>>>>>>> aren’t mounted and zfs receive won’t complain. >>>>>>>> >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Borja. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>> To unsubscribe, send any mail to >>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>>>>>> " >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>> To unsubscribe, send any mail to >>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>>>>> " >>> >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org >>> " > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Thu Aug 18 07:34:17 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7BE46BBE1F3 for ; Thu, 18 Aug 2016 07:34:17 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) by mx1.freebsd.org (Postfix) with ESMTP id 2796C1B1C for ; Thu, 18 Aug 2016 07:34:16 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 12A5D4C4C804; Thu, 18 Aug 2016 09:34:15 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id uYYVsmXkrWdz; Thu, 18 Aug 2016 09:34:12 +0200 (CEST) Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id 2E51A4C4C688; Thu, 18 Aug 2016 09:34:12 +0200 (CEST) Reply-To: juergen.gotteswinter@internetx.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <02F2828E-AB88-4F25-AB73-5EF041BAD36E@gmail.com> To: Ben RUBSON , freebsd-fs@freebsd.org From: InterNetX - Juergen Gotteswinter Organization: InterNetX GmbH Message-ID: Date: Thu, 18 Aug 2016 09:34:10 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <02F2828E-AB88-4F25-AB73-5EF041BAD36E@gmail.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: 
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 07:34:17 -0000 Am 17.08.2016 um 23:14 schrieb Ben RUBSON: > >> On 17 Aug 2016, at 20:03, Linda Kateley wrote: >> >> RSF-1 runs in the zfs stack and send the writes to the second system. > > Linda, do you have any link to a documentation about this RSF-1 operation mode ? > > According to what I red about RSF-1, storage is shared between nodes, and RSF-1 manages the failover, we do not have 2 different storages. > (so I don't really understand how writes are sent to the "second system") yes this is how i know rsf-1, too. external cross cabled sas jbods hooked up to two headnodes. it all works (or fails) if activating / disabling of sas channels works like expected. > > In addition, RSF-1 does not seem to help with long-distance replication to a different storage. i think theres something called metro replication, but did not dig further into that. might be part of nexenta > But I may be wrong ? > This is where ZFS send/receive helps. > Or even a nicer solution I proposed a few weeks ago : https://www.illumos.org/issues/7166 (but a lot of work to achieve). > > Ben > >> On 8/17/16 11:55 AM, Chris Watson wrote: >>> Of course, if you are willing to accept some amount of data loss that opens up a lot more options. :) >>> >>> Some may find that acceptable though. Like turning off fsync with PostgreSQL to get much higher throughput. As little no as you are made *very* aware of the risks. >>> >>> It's good to have input in this thread from one with more experience with RSF-1 than the rest of us. You confirm what others have that said about RSF-1, that it's stable and works well. What were you deploying it on? >>> >>> Chris >>> >>> Sent from my iPhone 5 >>> >>> On Aug 17, 2016, at 11:18 AM, Linda Kateley > wrote: >>> >>>> The question I always ask, as an architect, is "can you lose 1 minute worth of data?" If you can, then batched replication is perfect. If you can't.. then HA. Every place I have positioned it, rsf-1 has worked extremely well. If i remember right, it works at the dmu. I would suggest try it. They have been trying to have a full freebsd solution, I have several customers running it well. >>>> >>>> linda >>>> >>>> >>>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen Gotteswinter wrote: >>>>>> >>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gotteswinter wrote: >>>>>>>> >>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar > wrote: >>>>>>>>>> >>>>>>>>>> As I said in a previous post I tested the zfs send/receive approach (with >>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in all what you >>>>>>>>>> said, especially about off-site replicate and synchronous replication. >>>>>>>>>> >>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment, >>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but ATM it >>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>>> I must be too old school, but I don’t quite like the idea of using an essentially unreliable transport >>>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>>> >>>>>>>>> In case something went wrong, that approach could risk corrupting a pool. 
Although, frankly, >>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA problem that caused some >>>>>>>>> silent corruption. >>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines hooked up >>>>>>>> to the same disk chassis. >>>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>>> >>>>>>> I'm still busy to test the whole setup here, including the >>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can prevent >>>>>>> that thanks to: >>>>>>> >>>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>>> and you can't import the pool (even with -f) for ex (filer2 is the >>>>>>> BACKUP): >>>>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>>>>> >>>>>>> - The shared pool should not be mounted at boot, and you should ensure >>>>>>> that the failover script is not executed during boot time too: this is >>>>>>> to handle the case wherein both machines turn off and/or re-ignite at >>>>>>> the same time. Indeed, the CARP interface can "flip" it's status if both >>>>>>> machines are powered on at the same time, for ex: >>>>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and >>>>>>> you will have a split-brain scenario >>>>>>> >>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>>>> happen, this can be handled with a trigger file or something like that >>>>>>> >>>>>>> - I've still have to check if the order is OK, but I think that as long >>>>>>> as you shutdown the replication interface and that you adapt the >>>>>>> advskew (including the config file) of the CARP interface before the >>>>>>> zpool import -f in the failover script you can be relatively confident >>>>>>> that nothing will be written on the iSCSI targets >>>>>>> >>>>>>> - A zpool scrub should be run at regular intervals >>>>>>> >>>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>>>>> >>>>>>> Julien >>>>>>> >>>>>> 100€ question without detailed looking at that script. yes from a first >>>>>> view its super simple, but: why are solutions like rsf-1 such more >>>>>> powerful / featurerich. Theres a reason for, which is that they try to >>>>>> cover every possible situation (which makes more than sense for this). >>>>> I've never used "rsf-1" so I can't say much more about it, but I have >>>>> no doubts about it's ability to handle "complex situations", where >>>>> multiple nodes / networks are involved. >>>>> >>>>>> That script works for sure, within very limited cases imho >>>>>> >>>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen sooner >>>>>>>> or later especially when it comes to homegrown automatism solutions. >>>>>>>> even the commercial parts where much more time/work goes into such >>>>>>>> solutions fail in a regular manner >>>>>>>> >>>>>>>>> The advantage of ZFS send/receive of datasets is, however, that you can consider it >>>>>>>>> essentially atomic. A transport corruption should not cause trouble (apart from a failed >>>>>>>>> "zfs receive") and with snapshot retention you can even roll back. You can’t roll back >>>>>>>>> zpool replications :) >>>>>>>>> >>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as your zfs receive doesn’t involve a rollback >>>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. 
Just make sure that your replica datasets >>>>>>>>> aren’t mounted and zfs receive won’t complain. >>>>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Borja. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org " >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org " >>>> >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org " >> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Thu Aug 18 07:36:28 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AB55DBBE275 for ; Thu, 18 Aug 2016 07:36:28 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-wm0-x232.google.com (mail-wm0-x232.google.com [IPv6:2a00:1450:400c:c09::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 15E3A1C24 for ; Thu, 18 Aug 2016 07:36:28 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: by mail-wm0-x232.google.com with SMTP id f65so227964671wmi.0 for ; Thu, 18 Aug 2016 00:36:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=NcF8jKkpWLdAdzRFFppe2SYWjpmHImGWrvZiVxsktUc=; b=a3ZYPJG8FFMs5RwYItLWeRGIrh/E/VEoB/k3gapwvsnpn8a/OFgD7ENh+W5M27HJeZ b/fTIJukHY8RRrRIr5VrH2aSuivNG0pxcWhNz030HbiTVz9SmEwfd6a8u+saJpVxfgVq RfvxwtzoAsIODMZbNVPqOSGNCL6mU36BN+Z6ux1Jw51BJC1qQeEpuqEvuHLUT5C/Pee/ RUSsK4OtRHPxtswhqEmBIegBt4QInX1mAPGKvEoTN60voRbRT/kZkVAu43sJ8e0uTNHa NU7emyxpjhlHa3GwJvMYTin6bhlEWdcNZPn+UkKeluiaKYfl2T11qbmZZq5ZMcgZN1Ou IQSA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=NcF8jKkpWLdAdzRFFppe2SYWjpmHImGWrvZiVxsktUc=; b=lWced+U/egwcJoSht+CHRN2eVawX66XKpEQE1FaLWu7XLSsDCY2A72habXx/BNpDDT Fp0eKr5UsMqgdvqdH8V/jkbjfqZ/Sg6ChSZEFFBdSPdWP/Gckg0Dl8QLKtjm851GYweX iQD4W02f9gyzsrJd5NZxvYOjFI+u7e16IQpz+U+ntqnSoXH4SU+ma+qj4EyRgvCkyYAe IC7Nfl0m1AiOhEYShaJI5Ig5+z5nUi3HRmfbxyKtVOQHtFuWLOVzvMfTy+gQDC9+ngEd 7vQFrYLIv+WZqJp4w6yr5R/vv+LNCBj6/WxxbHCicMfRBE2z8uXqm6IN3z4ejmm+CId7 YuWQ== X-Gm-Message-State: AEkooutrj5TjLikhaF8aG7M1kcwLc/O7Z1JFFcVDvrBrcpy+R0QKly5gTk3gtddy0WORkJvjPcXE0ENMsFf/TA== X-Received: by 10.28.139.144 with SMTP id 
n138mr1116835wmd.71.1471505785461; Thu, 18 Aug 2016 00:36:25 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.54.202 with HTTP; Thu, 18 Aug 2016 00:36:24 -0700 (PDT) In-Reply-To: <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> From: krad Date: Thu, 18 Aug 2016 08:36:24 +0100 Message-ID: Subject: Re: HAST + ZFS + NFS + CARP To: InterNetX - Juergen Gotteswinter Cc: linda@kateley.com, Chris Watson , FreeBSD FS Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 07:36:28 -0000 I didnt think touch was atomic, mkdir is though On 18 August 2016 at 08:32, InterNetX - Juergen Gotteswinter < juergen.gotteswinter@internetx.com> wrote: > > > Am 17.08.2016 um 20:03 schrieb Linda Kateley: > > I just do consulting so I don't always get to see the end of the > > project. Although we are starting to do more ongoing support so we can > > see the progress.. > > > > I have worked with some of the guys from high-availability.com for mayb= e > > 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work > > beautifully with omni/illumos. The one customer I have running it in > > prod is an isp in south america running openstack and zfs on freebsd as > > iscsi. Big boxes, 90+ drives per frame. If someone would like try it, = i > > have some contacts there. Ping me offlist. > > no offense, but it sounds a bit like marketing. > > here: running nexenta ha setup since several years with one catastrophic > failure due to split brain > > > > > You do risk losing data if you batch zfs send. It is very hard to run > > that real time. > > depends on how much data changes aka delta size > > > You have to take the snap then send the snap. Most > > people run in cron, even if it's not in cron, you would want one to > > finish before you started the next. > > thats the reason why lock files where invented, tools like zrep handle > that themself via additional zfs properties > > or, if one does not trust a single layer > > -- snip -- > #!/bin/sh > if [ ! -f /var/run/replic ] ; then > touch /var/run/replic > /blah/path/zrep sync all >> /var/log/zfsrepli.log > rm -f /var/run/replic > fi > -- snip -- > > something like this, simple > > If you lose the sending host before > > the receive is complete you won't have a full copy. > > if rsf fails, and you end up in split brain you loose way more. been > there, seen that. > > With zfs though you > > will probably still have the data on the sending host, however long it > > takes to bring it back up. RSF-1 runs in the zfs stack and send the > > writes to the second system. It's kind of pricey, but actually much les= s > > expensive than commercial alternatives. 
> > > > Anytime you run anything sync it adds latency but makes things safer.. > > not surprising, it all depends on the usecase > > > There is also a cool tool I like, called zerto for vmware that sits in > > the hypervisor and sends a sync copy of a write locally and then an > > async remotely. It's pretty cool. Although I haven't run it myself, hav= e > > a bunch of customers running it. I believe it works with proxmox too. > > > > Most people I run into (these days) don't mind losing 5 or even 30 > > minutes of data. Small shops. > > you talk about minutes, what delta size are we talking here about? why > not using zrep in a loop for example > > They usually have a copy somewhere else. > > Or the cost of 5-30 minutes isn't that great. I used work as a > > datacenter architect for sun/oracle with only fortune 500. There losing > > 1 sec could put large companies out of business. I worked with banks an= d > > exchanges. > > again, usecase. i bet 99% on this list are not operating fortune 500 > bank filers > > They couldn't ever lose a single transaction. Most people > > nowadays do the replication/availability in the application though and > > don't care about underlying hardware, especially disk. > > > > > > On 8/17/16 11:55 AM, Chris Watson wrote: > >> Of course, if you are willing to accept some amount of data loss that > >> opens up a lot more options. :) > >> > >> Some may find that acceptable though. Like turning off fsync with > >> PostgreSQL to get much higher throughput. As little no as you are made > >> *very* aware of the risks. > >> > >> It's good to have input in this thread from one with more experience > >> with RSF-1 than the rest of us. You confirm what others have that said > >> about RSF-1, that it's stable and works well. What were you deploying > >> it on? > >> > >> Chris > >> > >> Sent from my iPhone 5 > >> > >> On Aug 17, 2016, at 11:18 AM, Linda Kateley >> > wrote: > >> > >>> The question I always ask, as an architect, is "can you lose 1 minute > >>> worth of data?" If you can, then batched replication is perfect. If > >>> you can't.. then HA. Every place I have positioned it, rsf-1 has > >>> worked extremely well. If i remember right, it works at the dmu. I > >>> would suggest try it. They have been trying to have a full freebsd > >>> solution, I have several customers running it well. > >>> > >>> linda > >>> > >>> > >>> On 8/17/16 4:52 AM, Julien Cigar wrote: > >>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen > >>>> Gotteswinter wrote: > >>>>> > >>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: > >>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen > >>>>>> Gotteswinter wrote: > >>>>>>> > >>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: > >>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>>> > wrote: > >>>>>>>>> > >>>>>>>>> As I said in a previous post I tested the zfs send/receive > >>>>>>>>> approach (with > >>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in > >>>>>>>>> all what you > >>>>>>>>> said, especially about off-site replicate and synchronous > >>>>>>>>> replication. > >>>>>>>>> > >>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the > >>>>>>>>> moment, > >>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but > >>>>>>>>> ATM it > >>>>>>>>> works as expected, I havent' managed to corrupt the zpool. 
> >>>>>>>> I must be too old school, but I don=E2=80=99t quite like the ide= a of > >>>>>>>> using an essentially unreliable transport > >>>>>>>> (Ethernet) for low-level filesystem operations. > >>>>>>>> > >>>>>>>> In case something went wrong, that approach could risk > >>>>>>>> corrupting a pool. Although, frankly, > >>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA > >>>>>>>> problem that caused some > >>>>>>>> silent corruption. > >>>>>>> try dual split import :D i mean, zpool -f import on 2 machines > >>>>>>> hooked up > >>>>>>> to the same disk chassis. > >>>>>> Yes this is the first thing on the list to avoid .. :) > >>>>>> > >>>>>> I'm still busy to test the whole setup here, including the > >>>>>> MASTER -> BACKUP failover script (CARP), but I think you can preve= nt > >>>>>> that thanks to: > >>>>>> > >>>>>> - As long as ctld is running on the BACKUP the disks are locked > >>>>>> and you can't import the pool (even with -f) for ex (filer2 is the > >>>>>> BACKUP): > >>>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > >>>>>> > >>>>>> - The shared pool should not be mounted at boot, and you should > >>>>>> ensure > >>>>>> that the failover script is not executed during boot time too: > >>>>>> this is > >>>>>> to handle the case wherein both machines turn off and/or re-ignite > at > >>>>>> the same time. Indeed, the CARP interface can "flip" it's status > >>>>>> if both > >>>>>> machines are powered on at the same time, for ex: > >>>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf > and > >>>>>> you will have a split-brain scenario > >>>>>> > >>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons > >>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not > >>>>>> happen, this can be handled with a trigger file or something like > >>>>>> that > >>>>>> > >>>>>> - I've still have to check if the order is OK, but I think that as > >>>>>> long > >>>>>> as you shutdown the replication interface and that you adapt the > >>>>>> advskew (including the config file) of the CARP interface before t= he > >>>>>> zpool import -f in the failover script you can be relatively > >>>>>> confident > >>>>>> that nothing will be written on the iSCSI targets > >>>>>> > >>>>>> - A zpool scrub should be run at regular intervals > >>>>>> > >>>>>> This is my MASTER -> BACKUP CARP script ATM > >>>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > >>>>>> > >>>>>> Julien > >>>>>> > >>>>> 100=E2=82=AC question without detailed looking at that script. yes = from a > >>>>> first > >>>>> view its super simple, but: why are solutions like rsf-1 such more > >>>>> powerful / featurerich. Theres a reason for, which is that they try > to > >>>>> cover every possible situation (which makes more than sense for > this). > >>>> I've never used "rsf-1" so I can't say much more about it, but I hav= e > >>>> no doubts about it's ability to handle "complex situations", where > >>>> multiple nodes / networks are involved. > >>>> > >>>>> That script works for sure, within very limited cases imho > >>>>> > >>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen > >>>>>>> sooner > >>>>>>> or later especially when it comes to homegrown automatism > solutions. 
> >>>>>>> even the commercial parts where much more time/work goes into suc= h > >>>>>>> solutions fail in a regular manner > >>>>>>> > >>>>>>>> The advantage of ZFS send/receive of datasets is, however, that > >>>>>>>> you can consider it > >>>>>>>> essentially atomic. A transport corruption should not cause > >>>>>>>> trouble (apart from a failed > >>>>>>>> "zfs receive") and with snapshot retention you can even roll > >>>>>>>> back. You can=E2=80=99t roll back > >>>>>>>> zpool replications :) > >>>>>>>> > >>>>>>>> ZFS receive does a lot of sanity checks as well. As long as your > >>>>>>>> zfs receive doesn=E2=80=99t involve a rollback > >>>>>>>> to the latest snapshot, it won=E2=80=99t destroy anything by mis= take. > >>>>>>>> Just make sure that your replica datasets > >>>>>>>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain. > >>>>>>>> > >>>>>>>> > >>>>>>>> Cheers, > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Borja. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> freebsd-fs@freebsd.org mailing > list > >>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>>>>>>> To unsubscribe, send any mail to > >>>>>>>> "freebsd-fs-unsubscribe@freebsd.org > >>>>>>>> " > >>>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> freebsd-fs@freebsd.org mailing > list > >>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>>>>>> To unsubscribe, send any mail to > >>>>>>> "freebsd-fs-unsubscribe@freebsd.org > >>>>>>> " > >>> > >>> _______________________________________________ > >>> freebsd-fs@freebsd.org mailing list > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org > >>> " > > > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Thu Aug 18 07:38:28 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 86807BBE2E9 for ; Thu, 18 Aug 2016 07:38:28 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 085D11CE8 for ; Thu, 18 Aug 2016 07:38:27 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 70F6849FC2B9; Thu, 18 Aug 2016 09:38:25 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id gp8C3KOyuofX; Thu, 18 Aug 2016 09:38:18 +0200 (CEST) Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id DBA524C4C688; Thu, 18 
Aug 2016 09:38:18 +0200 (CEST) Reply-To: juergen.gotteswinter@internetx.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> To: krad Cc: linda@kateley.com, Chris Watson , FreeBSD FS From: InterNetX - Juergen Gotteswinter Organization: InterNetX GmbH Message-ID: <409301a7-ce03-aaa3-c4dc-fa9f9ba66e01@internetx.com> Date: Thu, 18 Aug 2016 09:38:16 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 07:38:28 -0000 uhm, dont really investigated if it is or not. add a "sync" after that? or replace it? but anyway, thanks for the hint. will dig into this! Am 18.08.2016 um 09:36 schrieb krad: > I didnt think touch was atomic, mkdir is though > > On 18 August 2016 at 08:32, InterNetX - Juergen Gotteswinter > > wrote: > > > > Am 17.08.2016 um 20:03 schrieb Linda Kateley: > > I just do consulting so I don't always get to see the end of the > > project. Although we are starting to do more ongoing support so we can > > see the progress.. > > > > I have worked with some of the guys from high-availability.com for maybe > > 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work > > beautifully with omni/illumos. The one customer I have running it in > > prod is an isp in south america running openstack and zfs on freebsd as > > iscsi. Big boxes, 90+ drives per frame. If someone would like try it, i > > have some contacts there. Ping me offlist. > > no offense, but it sounds a bit like marketing. > > here: running nexenta ha setup since several years with one catastrophic > failure due to split brain > > > > > You do risk losing data if you batch zfs send. It is very hard to run > > that real time. > > depends on how much data changes aka delta size > > > You have to take the snap then send the snap. Most > > people run in cron, even if it's not in cron, you would want one to > > finish before you started the next. > > thats the reason why lock files where invented, tools like zrep handle > that themself via additional zfs properties > > or, if one does not trust a single layer > > -- snip -- > #!/bin/sh > if [ ! -f /var/run/replic ] ; then > touch /var/run/replic > /blah/path/zrep sync all >> /var/log/zfsrepli.log > rm -f /var/run/replic > fi > -- snip -- > > something like this, simple > > If you lose the sending host before > > the receive is complete you won't have a full copy. > > if rsf fails, and you end up in split brain you loose way more. been > there, seen that. > > With zfs though you > > will probably still have the data on the sending host, however long it > > takes to bring it back up. 
RSF-1 runs in the zfs stack and send the > > writes to the second system. It's kind of pricey, but actually much less > > expensive than commercial alternatives. > > > > Anytime you run anything sync it adds latency but makes things safer.. > > not surprising, it all depends on the usecase > > > There is also a cool tool I like, called zerto for vmware that sits in > > the hypervisor and sends a sync copy of a write locally and then an > > async remotely. It's pretty cool. Although I haven't run it myself, have > > a bunch of customers running it. I believe it works with proxmox too. > > > > Most people I run into (these days) don't mind losing 5 or even 30 > > minutes of data. Small shops. > > you talk about minutes, what delta size are we talking here about? why > not using zrep in a loop for example > > They usually have a copy somewhere else. > > Or the cost of 5-30 minutes isn't that great. I used work as a > > datacenter architect for sun/oracle with only fortune 500. There losing > > 1 sec could put large companies out of business. I worked with banks and > > exchanges. > > again, usecase. i bet 99% on this list are not operating fortune 500 > bank filers > > They couldn't ever lose a single transaction. Most people > > nowadays do the replication/availability in the application though and > > don't care about underlying hardware, especially disk. > > > > > > On 8/17/16 11:55 AM, Chris Watson wrote: > >> Of course, if you are willing to accept some amount of data loss that > >> opens up a lot more options. :) > >> > >> Some may find that acceptable though. Like turning off fsync with > >> PostgreSQL to get much higher throughput. As little no as you are > made > >> *very* aware of the risks. > >> > >> It's good to have input in this thread from one with more experience > >> with RSF-1 than the rest of us. You confirm what others have that > said > >> about RSF-1, that it's stable and works well. What were you deploying > >> it on? > >> > >> Chris > >> > >> Sent from my iPhone 5 > >> > >> On Aug 17, 2016, at 11:18 AM, Linda Kateley > >> >> wrote: > >> > >>> The question I always ask, as an architect, is "can you lose 1 > minute > >>> worth of data?" If you can, then batched replication is perfect. If > >>> you can't.. then HA. Every place I have positioned it, rsf-1 has > >>> worked extremely well. If i remember right, it works at the dmu. I > >>> would suggest try it. They have been trying to have a full freebsd > >>> solution, I have several customers running it well. > >>> > >>> linda > >>> > >>> > >>> On 8/17/16 4:52 AM, Julien Cigar wrote: > >>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen > >>>> Gotteswinter wrote: > >>>>> > >>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: > >>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen > >>>>>> Gotteswinter wrote: > >>>>>>> > >>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: > >>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>>> >> wrote: > >>>>>>>>> > >>>>>>>>> As I said in a previous post I tested the zfs send/receive > >>>>>>>>> approach (with > >>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in > >>>>>>>>> all what you > >>>>>>>>> said, especially about off-site replicate and synchronous > >>>>>>>>> replication. 
> >>>>>>>>> > >>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the > >>>>>>>>> moment, > >>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but > >>>>>>>>> ATM it > >>>>>>>>> works as expected, I havent' managed to corrupt the zpool. > >>>>>>>> I must be too old school, but I don’t quite like the idea of > >>>>>>>> using an essentially unreliable transport > >>>>>>>> (Ethernet) for low-level filesystem operations. > >>>>>>>> > >>>>>>>> In case something went wrong, that approach could risk > >>>>>>>> corrupting a pool. Although, frankly, > >>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA > >>>>>>>> problem that caused some > >>>>>>>> silent corruption. > >>>>>>> try dual split import :D i mean, zpool -f import on 2 machines > >>>>>>> hooked up > >>>>>>> to the same disk chassis. > >>>>>> Yes this is the first thing on the list to avoid .. :) > >>>>>> > >>>>>> I'm still busy to test the whole setup here, including the > >>>>>> MASTER -> BACKUP failover script (CARP), but I think you can > prevent > >>>>>> that thanks to: > >>>>>> > >>>>>> - As long as ctld is running on the BACKUP the disks are locked > >>>>>> and you can't import the pool (even with -f) for ex (filer2 > is the > >>>>>> BACKUP): > >>>>>> > https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > > >>>>>> > >>>>>> - The shared pool should not be mounted at boot, and you should > >>>>>> ensure > >>>>>> that the failover script is not executed during boot time too: > >>>>>> this is > >>>>>> to handle the case wherein both machines turn off and/or > re-ignite at > >>>>>> the same time. Indeed, the CARP interface can "flip" it's status > >>>>>> if both > >>>>>> machines are powered on at the same time, for ex: > >>>>>> > https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf > and > >>>>>> you will have a split-brain scenario > >>>>>> > >>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons > >>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not > >>>>>> happen, this can be handled with a trigger file or something like > >>>>>> that > >>>>>> > >>>>>> - I've still have to check if the order is OK, but I think > that as > >>>>>> long > >>>>>> as you shutdown the replication interface and that you adapt the > >>>>>> advskew (including the config file) of the CARP interface > before the > >>>>>> zpool import -f in the failover script you can be relatively > >>>>>> confident > >>>>>> that nothing will be written on the iSCSI targets > >>>>>> > >>>>>> - A zpool scrub should be run at regular intervals > >>>>>> > >>>>>> This is my MASTER -> BACKUP CARP script ATM > >>>>>> > https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > > >>>>>> > >>>>>> Julien > >>>>>> > >>>>> 100€ question without detailed looking at that script. yes from a > >>>>> first > >>>>> view its super simple, but: why are solutions like rsf-1 such more > >>>>> powerful / featurerich. Theres a reason for, which is that > they try to > >>>>> cover every possible situation (which makes more than sense > for this). > >>>> I've never used "rsf-1" so I can't say much more about it, but > I have > >>>> no doubts about it's ability to handle "complex situations", where > >>>> multiple nodes / networks are involved. > >>>> > >>>>> That script works for sure, within very limited cases imho > >>>>> > >>>>>>> kaboom, really ugly kaboom. 
thats what is very likely to happen > >>>>>>> sooner > >>>>>>> or later especially when it comes to homegrown automatism > solutions. > >>>>>>> even the commercial parts where much more time/work goes > into such > >>>>>>> solutions fail in a regular manner > >>>>>>> > >>>>>>>> The advantage of ZFS send/receive of datasets is, however, that > >>>>>>>> you can consider it > >>>>>>>> essentially atomic. A transport corruption should not cause > >>>>>>>> trouble (apart from a failed > >>>>>>>> "zfs receive") and with snapshot retention you can even roll > >>>>>>>> back. You can’t roll back > >>>>>>>> zpool replications :) > >>>>>>>> > >>>>>>>> ZFS receive does a lot of sanity checks as well. As long as > your > >>>>>>>> zfs receive doesn’t involve a rollback > >>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. > >>>>>>>> Just make sure that your replica datasets > >>>>>>>> aren’t mounted and zfs receive won’t complain. > >>>>>>>> > >>>>>>>> > >>>>>>>> Cheers, > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Borja. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> freebsd-fs@freebsd.org > > > mailing list > >>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > >>>>>>>> To unsubscribe, send any mail to > >>>>>>>> "freebsd-fs-unsubscribe@freebsd.org > > >>>>>>>> >" > >>>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> freebsd-fs@freebsd.org > > > mailing list > >>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > >>>>>>> To unsubscribe, send any mail to > >>>>>>> "freebsd-fs-unsubscribe@freebsd.org > > >>>>>>> >" > >>> > >>> _______________________________________________ > >>> freebsd-fs@freebsd.org > > > mailing list > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > >>> To unsubscribe, send any mail to > "freebsd-fs-unsubscribe@freebsd.org > > >>> >" > > > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > To unsubscribe, send any mail to > "freebsd-fs-unsubscribe@freebsd.org > " > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org > " > > From owner-freebsd-fs@freebsd.org Thu Aug 18 07:40:55 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3A361BBE374 for ; Thu, 18 Aug 2016 07:40:55 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: from mail-wm0-x234.google.com (mail-wm0-x234.google.com [IPv6:2a00:1450:400c:c09::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B61BC1DF8 for ; Thu, 18 Aug 2016 07:40:54 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: by mail-wm0-x234.google.com with SMTP id f65so228085641wmi.0 for ; Thu, 18 Aug 2016 00:40:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=LZ9hgXyuFbwJYTi8xbl0jQXPmRYJBO/VHIlzpHKBLTo=; b=I2uh/TPpePQ+UKWE9AHHeLMgwhk+porBbOPM0IuUc4WdV71l9Sbqr1giaEoa93GY+C 
Wt8GU9/5ldAPMf3cxXV4PysATfAjcRStENhjFrDYyTRmcUoHcmk+bZSW/Jv4WN7w4XQx PpQUZTMdIpK5IjufUQsmix8cBvuXtyDLOXayGQR8gGSIKW0p4vAZzfSfvpJNXx6afD9y r1NIECKuXQXW1Zyu0SIyIOJUpbkoF1MHZjZ7ReJvhzX3Inkh7D+g0W0WR6iSHXSzT4Ut FJfw4u1xhPzF7xVXzS9nF7CoCfe5VfjMonrS6HtWsEgtC3B2WmMuiMWBR+4Oe56B8ilg gXgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=LZ9hgXyuFbwJYTi8xbl0jQXPmRYJBO/VHIlzpHKBLTo=; b=I8fBDbKxurWDOSwyfMttenEGh1R4djiu7o+db1zOu4S3chwPqmnYNlEqXOcMuEOojZ jq+luo6uIOloby34PnDLK0blyvHaDWAKCk3kGUzVGNyQJWfRe7L4a566OkWbqSx1QAFu xJGy7EMGcL3HNZXwblFDePlCiWVMuAp2j3NGG15EH5Q4pUx3K0tbJA29CPb7XskIK3EQ ZeOWGTZGHFLN+P4EUM+25ChpgdgY0ObPLg8unssWb/rZz57KfAot8TtSVit8KPs3oLat P1UG+Sx8ncqBbKfmKXYCMPAktPzcEJINYLYKtbf5/uMaihne3prwG9y4KT7oFhfQ7EIi O4kA== X-Gm-Message-State: AEkooutERVixPeZVl47pS/1GBNfmZiYZVfRnINe9iZZ8ADMXWrbJ7YzJR/ASuP8WjVqgxA== X-Received: by 10.194.77.97 with SMTP id r1mr817033wjw.83.1471506052538; Thu, 18 Aug 2016 00:40:52 -0700 (PDT) Received: from macbook-air-de-benjamin-1.home (LFbn-1-7077-85.w90-116.abo.wanadoo.fr. [90.116.246.85]) by smtp.gmail.com with ESMTPSA id i80sm1276096wmf.11.2016.08.18.00.40.51 for (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Thu, 18 Aug 2016 00:40:51 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: HAST + ZFS + NFS + CARP From: Ben RUBSON In-Reply-To: <409301a7-ce03-aaa3-c4dc-fa9f9ba66e01@internetx.com> Date: Thu, 18 Aug 2016 09:40:50 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> <409301a7-ce03-aaa3-c4dc-fa9f9ba66e01@internetx.com> To: FreeBSD FS X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 07:40:55 -0000 Yep this is better : if mkdir then do_your_job rm -rf fi > On 18 Aug 2016, at 09:38, InterNetX - Juergen Gotteswinter = wrote: >=20 > uhm, dont really investigated if it is or not. add a "sync" after = that? > or replace it? >=20 > but anyway, thanks for the hint. will dig into this! >=20 > Am 18.08.2016 um 09:36 schrieb krad: >> I didnt think touch was atomic, mkdir is though >>=20 >> On 18 August 2016 at 08:32, InterNetX - Juergen Gotteswinter >> > > wrote: >>=20 >>=20 >>=20 >> Am 17.08.2016 um 20:03 schrieb Linda Kateley: >>> I just do consulting so I don't always get to see the end of the >>> project. Although we are starting to do more ongoing support so we = can >>> see the progress.. >>>=20 >>> I have worked with some of the guys from high-availability.com = for maybe >>> 20 years. RSF-1 is the cluster that is bundled with nexenta. Does = work >>> beautifully with omni/illumos. 
The one customer I have running it in >>> prod is an isp in south america running openstack and zfs on freebsd = as >>> iscsi. Big boxes, 90+ drives per frame. If someone would like try = it, i >>> have some contacts there. Ping me offlist. >>=20 >> no offense, but it sounds a bit like marketing. >>=20 >> here: running nexenta ha setup since several years with one = catastrophic >> failure due to split brain >>=20 >>>=20 >>> You do risk losing data if you batch zfs send. It is very hard to = run >>> that real time. >>=20 >> depends on how much data changes aka delta size >>=20 >>=20 >> You have to take the snap then send the snap. Most >>> people run in cron, even if it's not in cron, you would want one to >>> finish before you started the next. >>=20 >> thats the reason why lock files where invented, tools like zrep = handle >> that themself via additional zfs properties >>=20 >> or, if one does not trust a single layer >>=20 >> -- snip -- >> #!/bin/sh >> if [ ! -f /var/run/replic ] ; then >> touch /var/run/replic >> /blah/path/zrep sync all >> /var/log/zfsrepli.log >> rm -f /var/run/replic >> fi >> -- snip -- >>=20 >> something like this, simple >>=20 >> If you lose the sending host before >>> the receive is complete you won't have a full copy. >>=20 >> if rsf fails, and you end up in split brain you loose way more. = been >> there, seen that. >>=20 >> With zfs though you >>> will probably still have the data on the sending host, however long = it >>> takes to bring it back up. RSF-1 runs in the zfs stack and send the >>> writes to the second system. It's kind of pricey, but actually much = less >>> expensive than commercial alternatives. >>>=20 >>> Anytime you run anything sync it adds latency but makes things = safer.. >>=20 >> not surprising, it all depends on the usecase >>=20 >>> There is also a cool tool I like, called zerto for vmware that sits = in >>> the hypervisor and sends a sync copy of a write locally and then an >>> async remotely. It's pretty cool. Although I haven't run it myself, = have >>> a bunch of customers running it. I believe it works with proxmox = too. >>>=20 >>> Most people I run into (these days) don't mind losing 5 or even 30 >>> minutes of data. Small shops. >>=20 >> you talk about minutes, what delta size are we talking here about? = why >> not using zrep in a loop for example >>=20 >> They usually have a copy somewhere else. >>> Or the cost of 5-30 minutes isn't that great. I used work as a >>> datacenter architect for sun/oracle with only fortune 500. There = losing >>> 1 sec could put large companies out of business. I worked with banks = and >>> exchanges. >>=20 >> again, usecase. i bet 99% on this list are not operating fortune = 500 >> bank filers >>=20 >> They couldn't ever lose a single transaction. Most people >>> nowadays do the replication/availability in the application though = and >>> don't care about underlying hardware, especially disk. >>>=20 >>>=20 >>> On 8/17/16 11:55 AM, Chris Watson wrote: >>>> Of course, if you are willing to accept some amount of data loss = that >>>> opens up a lot more options. :) >>>>=20 >>>> Some may find that acceptable though. Like turning off fsync with >>>> PostgreSQL to get much higher throughput. As little no as you are >> made >>>> *very* aware of the risks. >>>>=20 >>>> It's good to have input in this thread from one with more = experience >>>> with RSF-1 than the rest of us. You confirm what others have that >> said >>>> about RSF-1, that it's stable and works well. What were you = deploying >>>> it on? 
>>>>=20 >>>> Chris >>>>=20 >>>> Sent from my iPhone 5 >>>>=20 >>>> On Aug 17, 2016, at 11:18 AM, Linda Kateley > >>>> >> wrote: >>>>=20 >>>>> The question I always ask, as an architect, is "can you lose 1 >> minute >>>>> worth of data?" If you can, then batched replication is perfect. = If >>>>> you can't.. then HA. Every place I have positioned it, rsf-1 has >>>>> worked extremely well. If i remember right, it works at the dmu. I >>>>> would suggest try it. They have been trying to have a full freebsd >>>>> solution, I have several customers running it well. >>>>>=20 >>>>> linda >>>>>=20 >>>>>=20 >>>>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen >>>>>> Gotteswinter wrote: >>>>>>>=20 >>>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen >>>>>>>> Gotteswinter wrote: >>>>>>>>>=20 >>>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar = >>>>>>>>>> > >> wrote: >>>>>>>>>>>=20 >>>>>>>>>>> As I said in a previous post I tested the zfs send/receive >>>>>>>>>>> approach (with >>>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in >>>>>>>>>>> all what you >>>>>>>>>>> said, especially about off-site replicate and synchronous >>>>>>>>>>> replication. >>>>>>>>>>>=20 >>>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at = the >>>>>>>>>>> moment, >>>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, = but >>>>>>>>>>> ATM it >>>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>>>> I must be too old school, but I don=E2=80=99t quite like the = idea of >>>>>>>>>> using an essentially unreliable transport >>>>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>>>>=20 >>>>>>>>>> In case something went wrong, that approach could risk >>>>>>>>>> corrupting a pool. Although, frankly, >>>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS = HBA >>>>>>>>>> problem that caused some >>>>>>>>>> silent corruption. >>>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines >>>>>>>>> hooked up >>>>>>>>> to the same disk chassis. >>>>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>>>>=20 >>>>>>>> I'm still busy to test the whole setup here, including the >>>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can >> prevent >>>>>>>> that thanks to: >>>>>>>>=20 >>>>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>>>> and you can't import the pool (even with -f) for ex (filer2 >> is the >>>>>>>> BACKUP): >>>>>>>>=20 >> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >> = >>>>>>>>=20 >>>>>>>> - The shared pool should not be mounted at boot, and you should >>>>>>>> ensure >>>>>>>> that the failover script is not executed during boot time too: >>>>>>>> this is >>>>>>>> to handle the case wherein both machines turn off and/or >> re-ignite at >>>>>>>> the same time. 
Indeed, the CARP interface can "flip" it's = status >>>>>>>> if both >>>>>>>> machines are powered on at the same time, for ex: >>>>>>>>=20 >> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf >> = and >>>>>>>> you will have a split-brain scenario >>>>>>>>=20 >>>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should = not >>>>>>>> happen, this can be handled with a trigger file or something = like >>>>>>>> that >>>>>>>>=20 >>>>>>>> - I've still have to check if the order is OK, but I think >> that as >>>>>>>> long >>>>>>>> as you shutdown the replication interface and that you adapt = the >>>>>>>> advskew (including the config file) of the CARP interface >> before the >>>>>>>> zpool import -f in the failover script you can be relatively >>>>>>>> confident >>>>>>>> that nothing will be written on the iSCSI targets >>>>>>>>=20 >>>>>>>> - A zpool scrub should be run at regular intervals >>>>>>>>=20 >>>>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>>>>=20 >> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >> = >>>>>>>>=20 >>>>>>>> Julien >>>>>>>>=20 >>>>>>> 100=E2=82=AC question without detailed looking at that script. = yes from a >>>>>>> first >>>>>>> view its super simple, but: why are solutions like rsf-1 such = more >>>>>>> powerful / featurerich. Theres a reason for, which is that >> they try to >>>>>>> cover every possible situation (which makes more than sense >> for this). >>>>>> I've never used "rsf-1" so I can't say much more about it, but >> I have >>>>>> no doubts about it's ability to handle "complex situations", = where >>>>>> multiple nodes / networks are involved. >>>>>>=20 >>>>>>> That script works for sure, within very limited cases imho >>>>>>>=20 >>>>>>>>> kaboom, really ugly kaboom. thats what is very likely to = happen >>>>>>>>> sooner >>>>>>>>> or later especially when it comes to homegrown automatism >> solutions. >>>>>>>>> even the commercial parts where much more time/work goes >> into such >>>>>>>>> solutions fail in a regular manner >>>>>>>>>=20 >>>>>>>>>> The advantage of ZFS send/receive of datasets is, however, = that >>>>>>>>>> you can consider it >>>>>>>>>> essentially atomic. A transport corruption should not cause >>>>>>>>>> trouble (apart from a failed >>>>>>>>>> "zfs receive") and with snapshot retention you can even roll >>>>>>>>>> back. You can=E2=80=99t roll back >>>>>>>>>> zpool replications :) >>>>>>>>>>=20 >>>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as >> your >>>>>>>>>> zfs receive doesn=E2=80=99t involve a rollback >>>>>>>>>> to the latest snapshot, it won=E2=80=99t destroy anything by = mistake. >>>>>>>>>> Just make sure that your replica datasets >>>>>>>>>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t = complain. >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>> Cheers, >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>> Borja. 
>>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>> _______________________________________________ >>>>>>>>>> freebsd-fs@freebsd.org >> > >> mailing list >>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> >>>>>>>>>> To unsubscribe, send any mail to >>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >> >>>>>>>>>> > >" >>>>>>>>>>=20 >>>>>>>>> _______________________________________________ >>>>>>>>> freebsd-fs@freebsd.org >> > >> mailing list >>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> >>>>>>>>> To unsubscribe, send any mail to >>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >> >>>>>>>>> > >" >>>>>=20 >>>>> _______________________________________________ >>>>> freebsd-fs@freebsd.org >> > >> mailing list >>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> >>>>> To unsubscribe, send any mail to >> "freebsd-fs-unsubscribe@freebsd.org >> >>>>> > >" >>>=20 >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> >>> To unsubscribe, send any mail to >> "freebsd-fs-unsubscribe@freebsd.org >> " >> _______________________________________________ >> freebsd-fs@freebsd.org mailing = list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> >> To unsubscribe, send any mail to = "freebsd-fs-unsubscribe@freebsd.org >> " >>=20 >>=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Thu Aug 18 08:02:55 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7529DBBEE45 for ; Thu, 18 Aug 2016 08:02:55 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) by mx1.freebsd.org (Postfix) with ESMTP id 8328D11BB for ; Thu, 18 Aug 2016 08:02:54 +0000 (UTC) (envelope-from juergen.gotteswinter@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 5815E45FC0FB; Thu, 18 Aug 2016 10:02:52 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id MT7WSVCplP01; Thu, 18 Aug 2016 10:02:47 +0200 (CEST) Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id 8736B4C4C698; Thu, 18 Aug 2016 10:02:47 +0200 (CEST) Reply-To: juergen.gotteswinter@internetx.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> 
<409301a7-ce03-aaa3-c4dc-fa9f9ba66e01@internetx.com> To: Ben RUBSON , FreeBSD FS From: InterNetX - Juergen Gotteswinter Organization: InterNetX GmbH Message-ID: <69234c7d-cda9-2d56-b5e0-bb5e3961cc19@internetx.com> Date: Thu, 18 Aug 2016 10:02:45 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 08:02:55 -0000 new day, new things learned :) thanks! but like said, zrep does its on locking in zfs properties. so even this is fine while true; do zrep sync all; done see http://www.bolthole.com/solaris/zrep/ the properties look like this tank/vmail redundant_metadata all default tank/vmail zrep:savecount 5 local tank/vmail zrep:lock-time 20160620101703 local tank/vmail zrep:master yes local tank/vmail zrep:src-fs tank/vmail local tank/vmail zrep:dest-host stor1 local tank/vmail zrep:src-host stor2 local tank/vmail zrep:dest-fs tank/vmail local tank/vmail zrep:lock-pid 10887 local it also takes care of the replication partner, the replicated datasets are read only until you tell zrep "go go go, become master" Simple usage summary: zrep (init|-i) ZFS/fs remotehost remoteZFSpool/fs zrep (sync|-S) [-q seconds] ZFS/fs zrep (sync|-S) [-q seconds] all zrep (sync|-S) ZFS/fs@snapshot -- temporary retroactive sync zrep (status|-s) [-v] [(-a|ZFS/fs)] zrep refresh ZFS/fs -- pull version of sync zrep (list|-l) [-Lv] zrep (expire|-e) [-L] (ZFS/fs ...)|(all)|() zrep (changeconfig|-C) [-f] ZFS/fs remotehost remoteZFSpool/fs zrep (changeconfig|-C) [-f] [-d] ZFS/fs srchost srcZFSpool/fs zrep failover [-L] ZFS/fs zrep takeover [-L] ZFS/fs zrep failover pool/ds -> master sets pool read only, connects to slave, sets pool on slave rw should be easy to combine with carp/devd, but this is the land of vodoo automagic again which i dont trust that much. Am 18.08.2016 um 09:40 schrieb Ben RUBSON: > Yep this is better : > > if mkdir > then > do_your_job > rm -rf > fi > > > >> On 18 Aug 2016, at 09:38, InterNetX - Juergen Gotteswinter wrote: >> >> uhm, dont really investigated if it is or not. add a "sync" after that? >> or replace it? >> >> but anyway, thanks for the hint. will dig into this! >> >> Am 18.08.2016 um 09:36 schrieb krad: >>> I didnt think touch was atomic, mkdir is though >>> >>> On 18 August 2016 at 08:32, InterNetX - Juergen Gotteswinter >>> >> > wrote: >>> >>> >>> >>> Am 17.08.2016 um 20:03 schrieb Linda Kateley: >>>> I just do consulting so I don't always get to see the end of the >>>> project. Although we are starting to do more ongoing support so we can >>>> see the progress.. >>>> >>>> I have worked with some of the guys from high-availability.com for maybe >>>> 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work >>>> beautifully with omni/illumos. The one customer I have running it in >>>> prod is an isp in south america running openstack and zfs on freebsd as >>>> iscsi. Big boxes, 90+ drives per frame. If someone would like try it, i >>>> have some contacts there. Ping me offlist. >>> >>> no offense, but it sounds a bit like marketing. >>> >>> here: running nexenta ha setup since several years with one catastrophic >>> failure due to split brain >>> >>>> >>>> You do risk losing data if you batch zfs send. 
It is very hard to run >>>> that real time. >>> >>> depends on how much data changes aka delta size >>> >>> >>> You have to take the snap then send the snap. Most >>>> people run in cron, even if it's not in cron, you would want one to >>>> finish before you started the next. >>> >>> thats the reason why lock files where invented, tools like zrep handle >>> that themself via additional zfs properties >>> >>> or, if one does not trust a single layer >>> >>> -- snip -- >>> #!/bin/sh >>> if [ ! -f /var/run/replic ] ; then >>> touch /var/run/replic >>> /blah/path/zrep sync all >> /var/log/zfsrepli.log >>> rm -f /var/run/replic >>> fi >>> -- snip -- >>> >>> something like this, simple >>> >>> If you lose the sending host before >>>> the receive is complete you won't have a full copy. >>> >>> if rsf fails, and you end up in split brain you loose way more. been >>> there, seen that. >>> >>> With zfs though you >>>> will probably still have the data on the sending host, however long it >>>> takes to bring it back up. RSF-1 runs in the zfs stack and send the >>>> writes to the second system. It's kind of pricey, but actually much less >>>> expensive than commercial alternatives. >>>> >>>> Anytime you run anything sync it adds latency but makes things safer.. >>> >>> not surprising, it all depends on the usecase >>> >>>> There is also a cool tool I like, called zerto for vmware that sits in >>>> the hypervisor and sends a sync copy of a write locally and then an >>>> async remotely. It's pretty cool. Although I haven't run it myself, have >>>> a bunch of customers running it. I believe it works with proxmox too. >>>> >>>> Most people I run into (these days) don't mind losing 5 or even 30 >>>> minutes of data. Small shops. >>> >>> you talk about minutes, what delta size are we talking here about? why >>> not using zrep in a loop for example >>> >>> They usually have a copy somewhere else. >>>> Or the cost of 5-30 minutes isn't that great. I used work as a >>>> datacenter architect for sun/oracle with only fortune 500. There losing >>>> 1 sec could put large companies out of business. I worked with banks and >>>> exchanges. >>> >>> again, usecase. i bet 99% on this list are not operating fortune 500 >>> bank filers >>> >>> They couldn't ever lose a single transaction. Most people >>>> nowadays do the replication/availability in the application though and >>>> don't care about underlying hardware, especially disk. >>>> >>>> >>>> On 8/17/16 11:55 AM, Chris Watson wrote: >>>>> Of course, if you are willing to accept some amount of data loss that >>>>> opens up a lot more options. :) >>>>> >>>>> Some may find that acceptable though. Like turning off fsync with >>>>> PostgreSQL to get much higher throughput. As little no as you are >>> made >>>>> *very* aware of the risks. >>>>> >>>>> It's good to have input in this thread from one with more experience >>>>> with RSF-1 than the rest of us. You confirm what others have that >>> said >>>>> about RSF-1, that it's stable and works well. What were you deploying >>>>> it on? >>>>> >>>>> Chris >>>>> >>>>> Sent from my iPhone 5 >>>>> >>>>> On Aug 17, 2016, at 11:18 AM, Linda Kateley >> >>>>> >> wrote: >>>>> >>>>>> The question I always ask, as an architect, is "can you lose 1 >>> minute >>>>>> worth of data?" If you can, then batched replication is perfect. If >>>>>> you can't.. then HA. Every place I have positioned it, rsf-1 has >>>>>> worked extremely well. If i remember right, it works at the dmu. I >>>>>> would suggest try it. 
They have been trying to have a full freebsd >>>>>> solution, I have several customers running it well. >>>>>> >>>>>> linda >>>>>> >>>>>> >>>>>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen >>>>>>> Gotteswinter wrote: >>>>>>>> >>>>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen >>>>>>>>> Gotteswinter wrote: >>>>>>>>>> >>>>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>>>>> >> >> wrote: >>>>>>>>>>>> >>>>>>>>>>>> As I said in a previous post I tested the zfs send/receive >>>>>>>>>>>> approach (with >>>>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in >>>>>>>>>>>> all what you >>>>>>>>>>>> said, especially about off-site replicate and synchronous >>>>>>>>>>>> replication. >>>>>>>>>>>> >>>>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the >>>>>>>>>>>> moment, >>>>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but >>>>>>>>>>>> ATM it >>>>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>>>>> I must be too old school, but I don’t quite like the idea of >>>>>>>>>>> using an essentially unreliable transport >>>>>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>>>>> >>>>>>>>>>> In case something went wrong, that approach could risk >>>>>>>>>>> corrupting a pool. Although, frankly, >>>>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA >>>>>>>>>>> problem that caused some >>>>>>>>>>> silent corruption. >>>>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines >>>>>>>>>> hooked up >>>>>>>>>> to the same disk chassis. >>>>>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>>>>> >>>>>>>>> I'm still busy to test the whole setup here, including the >>>>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can >>> prevent >>>>>>>>> that thanks to: >>>>>>>>> >>>>>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>>>>> and you can't import the pool (even with -f) for ex (filer2 >>> is the >>>>>>>>> BACKUP): >>>>>>>>> >>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>> >>>>>>>>> >>>>>>>>> - The shared pool should not be mounted at boot, and you should >>>>>>>>> ensure >>>>>>>>> that the failover script is not executed during boot time too: >>>>>>>>> this is >>>>>>>>> to handle the case wherein both machines turn off and/or >>> re-ignite at >>>>>>>>> the same time. 
Indeed, the CARP interface can "flip" it's status >>>>>>>>> if both >>>>>>>>> machines are powered on at the same time, for ex: >>>>>>>>> >>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf >>> and >>>>>>>>> you will have a split-brain scenario >>>>>>>>> >>>>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>>>>>> happen, this can be handled with a trigger file or something like >>>>>>>>> that >>>>>>>>> >>>>>>>>> - I've still have to check if the order is OK, but I think >>> that as >>>>>>>>> long >>>>>>>>> as you shutdown the replication interface and that you adapt the >>>>>>>>> advskew (including the config file) of the CARP interface >>> before the >>>>>>>>> zpool import -f in the failover script you can be relatively >>>>>>>>> confident >>>>>>>>> that nothing will be written on the iSCSI targets >>>>>>>>> >>>>>>>>> - A zpool scrub should be run at regular intervals >>>>>>>>> >>>>>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>>>>> >>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>> >>>>>>>>> >>>>>>>>> Julien >>>>>>>>> >>>>>>>> 100€ question without detailed looking at that script. yes from a >>>>>>>> first >>>>>>>> view its super simple, but: why are solutions like rsf-1 such more >>>>>>>> powerful / featurerich. Theres a reason for, which is that >>> they try to >>>>>>>> cover every possible situation (which makes more than sense >>> for this). >>>>>>> I've never used "rsf-1" so I can't say much more about it, but >>> I have >>>>>>> no doubts about it's ability to handle "complex situations", where >>>>>>> multiple nodes / networks are involved. >>>>>>> >>>>>>>> That script works for sure, within very limited cases imho >>>>>>>> >>>>>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen >>>>>>>>>> sooner >>>>>>>>>> or later especially when it comes to homegrown automatism >>> solutions. >>>>>>>>>> even the commercial parts where much more time/work goes >>> into such >>>>>>>>>> solutions fail in a regular manner >>>>>>>>>> >>>>>>>>>>> The advantage of ZFS send/receive of datasets is, however, that >>>>>>>>>>> you can consider it >>>>>>>>>>> essentially atomic. A transport corruption should not cause >>>>>>>>>>> trouble (apart from a failed >>>>>>>>>>> "zfs receive") and with snapshot retention you can even roll >>>>>>>>>>> back. You can’t roll back >>>>>>>>>>> zpool replications :) >>>>>>>>>>> >>>>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as >>> your >>>>>>>>>>> zfs receive doesn’t involve a rollback >>>>>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. >>>>>>>>>>> Just make sure that your replica datasets >>>>>>>>>>> aren’t mounted and zfs receive won’t complain. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Borja. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> freebsd-fs@freebsd.org >>> > >>> mailing list >>>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> >>>>>>>>>>> To unsubscribe, send any mail to >>>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>> >>>>>>>>>>> >> >" >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> freebsd-fs@freebsd.org >>> > >>> mailing list >>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> >>>>>>>>>> To unsubscribe, send any mail to >>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>> >>>>>>>>>> >> >" >>>>>> >>>>>> _______________________________________________ >>>>>> freebsd-fs@freebsd.org >>> > >>> mailing list >>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> >>>>>> To unsubscribe, send any mail to >>> "freebsd-fs-unsubscribe@freebsd.org >>> >>>>>> >> >" >>>> >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> >>>> To unsubscribe, send any mail to >>> "freebsd-fs-unsubscribe@freebsd.org >>> " >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org >>> " >>> >>> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Thu Aug 18 10:38:27 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DE733BBEE1F for ; Thu, 18 Aug 2016 10:38:27 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-wm0-x22d.google.com (mail-wm0-x22d.google.com [IPv6:2a00:1450:400c:c09::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3DFA31F95 for ; Thu, 18 Aug 2016 10:38:27 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: by mail-wm0-x22d.google.com with SMTP id o80so25465911wme.1 for ; Thu, 18 Aug 2016 03:38:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=+3AjPVPLnOhyawDcIISBb8aMrf1BvLFfNZiEPdPdpP4=; b=CxbHxOaE7ZeV3bXOb3qex96WD7LPYUHbOiYEdw/hOmdCdTReviRLvHNv7v1yBg1EPg 72F+4KhaIhYhBWE84wxJc/gNiodwcEF54cK9558GQnUf4kRVzBH/WhhCz/QGtcmBlpEl +w0fAZjcTQnjqMQ5VUteKB+BExuEdq4Qt024qPZd33GAlNYPn08exo18rC4XzNYAv0N2 A+3rm0NQZLbgK1AkPpXxPnzHTFIpVwtjqeok59uQ9DXOkPXpPt1oKPqTRvblUwV9GpnB afSwE0/6m86Dy4Yhuy+7fYLn5kFTnP4NoIAL4/wJm/UgTeU9RL4FbLKpCnZQsGWzpgZe scuw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=+3AjPVPLnOhyawDcIISBb8aMrf1BvLFfNZiEPdPdpP4=; b=BvNC4q9jAAy4AmXHZme0RPMnZD2PqszmnQuU95J6S9/c51948rCie6K+s4cMleg/bU 
3GNRKVVq7y5kcDtpsEBMKRW0iqDpU2de7FV0ZgzNOJ3uOtgBu+TilHjLJJ5IHZBoW3kn K7pJd7F9/99D08Wz0PaLgJa2F25S7zc4eEkFbXJ0cduq+q2PFEq1vObzdIqlYYYIP5T2 sYT/KAZgsjwTHTj0jwJ6kox1NpD3FwjKFWhL+y+T4CGJWcMw5aRUTrPURIDKSBs6y3A0 urqS3TXTwBlm1EKVvHPq5uS2lWIL1DpfxTpjSXJqpu4YbIWQNacbD60jU8SfT52RTdAx P//w== X-Gm-Message-State: AEkoout2TAmVhEQ3/9qPQ6dF+4flAjMR0ffLr93Cyn1TbplpFgj3F6pNzJ3WVmNvetyGqWwdG379ZGVWUVEURw== X-Received: by 10.194.127.37 with SMTP id nd5mr1504881wjb.156.1471516705391; Thu, 18 Aug 2016 03:38:25 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.54.202 with HTTP; Thu, 18 Aug 2016 03:38:24 -0700 (PDT) In-Reply-To: <69234c7d-cda9-2d56-b5e0-bb5e3961cc19@internetx.com> References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> <409301a7-ce03-aaa3-c4dc-fa9f9ba66e01@internetx.com> <69234c7d-cda9-2d56-b5e0-bb5e3961cc19@internetx.com> From: krad Date: Thu, 18 Aug 2016 11:38:24 +0100 Message-ID: Subject: Re: HAST + ZFS + NFS + CARP To: InterNetX - Juergen Gotteswinter Cc: Ben RUBSON , FreeBSD FS Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 10:38:28 -0000 "new day, new things learned :)" job done for today then, it must be beer o clock? On 18 August 2016 at 09:02, InterNetX - Juergen Gotteswinter < juergen.gotteswinter@internetx.com> wrote: > new day, new things learned :) > > thanks! > > but like said, zrep does its on locking in zfs properties. 
so even this > is fine > > while true; do zrep sync all; done > > > see > > http://www.bolthole.com/solaris/zrep/ > > the properties look like this > > tank/vmail redundant_metadata all default > tank/vmail zrep:savecount 5 local > tank/vmail zrep:lock-time 20160620101703 local > tank/vmail zrep:master yes local > tank/vmail zrep:src-fs tank/vmail local > tank/vmail zrep:dest-host stor1 local > tank/vmail zrep:src-host stor2 local > tank/vmail zrep:dest-fs tank/vmail local > tank/vmail zrep:lock-pid 10887 local > > > it also takes care of the replication partner, the replicated datasets > are read only until you tell zrep "go go go, become master" > > Simple usage summary: > zrep (init|-i) ZFS/fs remotehost remoteZFSpool/fs > zrep (sync|-S) [-q seconds] ZFS/fs > zrep (sync|-S) [-q seconds] all > zrep (sync|-S) ZFS/fs@snapshot -- temporary retroactive sync > zrep (status|-s) [-v] [(-a|ZFS/fs)] > zrep refresh ZFS/fs -- pull version of sync > zrep (list|-l) [-Lv] > zrep (expire|-e) [-L] (ZFS/fs ...)|(all)|() > zrep (changeconfig|-C) [-f] ZFS/fs remotehost remoteZFSpool/fs > zrep (changeconfig|-C) [-f] [-d] ZFS/fs srchost srcZFSpool/fs > zrep failover [-L] ZFS/fs > zrep takeover [-L] ZFS/fs > > > zrep failover pool/ds -> master sets pool read only, connects to slave, > sets pool on slave rw > > should be easy to combine with carp/devd, but this is the land of vodoo > automagic again which i dont trust that much. > > > Am 18.08.2016 um 09:40 schrieb Ben RUBSON: > > Yep this is better : > > > > if mkdir > > then > > do_your_job > > rm -rf > > fi > > > > > > > >> On 18 Aug 2016, at 09:38, InterNetX - Juergen Gotteswinter < > juergen.gotteswinter@internetx.com> wrote: > >> > >> uhm, dont really investigated if it is or not. add a "sync" after that= ? > >> or replace it? > >> > >> but anyway, thanks for the hint. will dig into this! > >> > >> Am 18.08.2016 um 09:36 schrieb krad: > >>> I didnt think touch was atomic, mkdir is though > >>> > >>> On 18 August 2016 at 08:32, InterNetX - Juergen Gotteswinter > >>> >>> > wrote: > >>> > >>> > >>> > >>> Am 17.08.2016 um 20:03 schrieb Linda Kateley: > >>>> I just do consulting so I don't always get to see the end of the > >>>> project. Although we are starting to do more ongoing support so we c= an > >>>> see the progress.. > >>>> > >>>> I have worked with some of the guys from high-availability.com < > http://high-availability.com> for maybe > >>>> 20 years. RSF-1 is the cluster that is bundled with nexenta. Does wo= rk > >>>> beautifully with omni/illumos. The one customer I have running it in > >>>> prod is an isp in south america running openstack and zfs on freebsd > as > >>>> iscsi. Big boxes, 90+ drives per frame. If someone would like try > it, i > >>>> have some contacts there. Ping me offlist. > >>> > >>> no offense, but it sounds a bit like marketing. > >>> > >>> here: running nexenta ha setup since several years with one > catastrophic > >>> failure due to split brain > >>> > >>>> > >>>> You do risk losing data if you batch zfs send. It is very hard to ru= n > >>>> that real time. > >>> > >>> depends on how much data changes aka delta size > >>> > >>> > >>> You have to take the snap then send the snap. Most > >>>> people run in cron, even if it's not in cron, you would want one to > >>>> finish before you started the next. 
> >>> > >>> thats the reason why lock files where invented, tools like zrep > handle > >>> that themself via additional zfs properties > >>> > >>> or, if one does not trust a single layer > >>> > >>> -- snip -- > >>> #!/bin/sh > >>> if [ ! -f /var/run/replic ] ; then > >>> touch /var/run/replic > >>> /blah/path/zrep sync all >> /var/log/zfsrepli.log > >>> rm -f /var/run/replic > >>> fi > >>> -- snip -- > >>> > >>> something like this, simple > >>> > >>> If you lose the sending host before > >>>> the receive is complete you won't have a full copy. > >>> > >>> if rsf fails, and you end up in split brain you loose way more. be= en > >>> there, seen that. > >>> > >>> With zfs though you > >>>> will probably still have the data on the sending host, however long = it > >>>> takes to bring it back up. RSF-1 runs in the zfs stack and send the > >>>> writes to the second system. It's kind of pricey, but actually much > less > >>>> expensive than commercial alternatives. > >>>> > >>>> Anytime you run anything sync it adds latency but makes things safer= .. > >>> > >>> not surprising, it all depends on the usecase > >>> > >>>> There is also a cool tool I like, called zerto for vmware that sits = in > >>>> the hypervisor and sends a sync copy of a write locally and then an > >>>> async remotely. It's pretty cool. Although I haven't run it myself, > have > >>>> a bunch of customers running it. I believe it works with proxmox too= . > >>>> > >>>> Most people I run into (these days) don't mind losing 5 or even 30 > >>>> minutes of data. Small shops. > >>> > >>> you talk about minutes, what delta size are we talking here about? > why > >>> not using zrep in a loop for example > >>> > >>> They usually have a copy somewhere else. > >>>> Or the cost of 5-30 minutes isn't that great. I used work as a > >>>> datacenter architect for sun/oracle with only fortune 500. There > losing > >>>> 1 sec could put large companies out of business. I worked with banks > and > >>>> exchanges. > >>> > >>> again, usecase. i bet 99% on this list are not operating fortune 5= 00 > >>> bank filers > >>> > >>> They couldn't ever lose a single transaction. Most people > >>>> nowadays do the replication/availability in the application though a= nd > >>>> don't care about underlying hardware, especially disk. > >>>> > >>>> > >>>> On 8/17/16 11:55 AM, Chris Watson wrote: > >>>>> Of course, if you are willing to accept some amount of data loss th= at > >>>>> opens up a lot more options. :) > >>>>> > >>>>> Some may find that acceptable though. Like turning off fsync with > >>>>> PostgreSQL to get much higher throughput. As little no as you are > >>> made > >>>>> *very* aware of the risks. > >>>>> > >>>>> It's good to have input in this thread from one with more experienc= e > >>>>> with RSF-1 than the rest of us. You confirm what others have that > >>> said > >>>>> about RSF-1, that it's stable and works well. What were you deployi= ng > >>>>> it on? > >>>>> > >>>>> Chris > >>>>> > >>>>> Sent from my iPhone 5 > >>>>> > >>>>> On Aug 17, 2016, at 11:18 AM, Linda Kateley >>> > >>>>> >> wrote: > >>>>> > >>>>>> The question I always ask, as an architect, is "can you lose 1 > >>> minute > >>>>>> worth of data?" If you can, then batched replication is perfect. I= f > >>>>>> you can't.. then HA. Every place I have positioned it, rsf-1 has > >>>>>> worked extremely well. If i remember right, it works at the dmu. I > >>>>>> would suggest try it. 
They have been trying to have a full freebsd > >>>>>> solution, I have several customers running it well. > >>>>>> > >>>>>> linda > >>>>>> > >>>>>> > >>>>>> On 8/17/16 4:52 AM, Julien Cigar wrote: > >>>>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen > >>>>>>> Gotteswinter wrote: > >>>>>>>> > >>>>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: > >>>>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen > >>>>>>>>> Gotteswinter wrote: > >>>>>>>>>> > >>>>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: > >>>>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>>>>>> >>> >> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> As I said in a previous post I tested the zfs send/receive > >>>>>>>>>>>> approach (with > >>>>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in > >>>>>>>>>>>> all what you > >>>>>>>>>>>> said, especially about off-site replicate and synchronous > >>>>>>>>>>>> replication. > >>>>>>>>>>>> > >>>>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at th= e > >>>>>>>>>>>> moment, > >>>>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, b= ut > >>>>>>>>>>>> ATM it > >>>>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. > >>>>>>>>>>> I must be too old school, but I don=E2=80=99t quite like the = idea of > >>>>>>>>>>> using an essentially unreliable transport > >>>>>>>>>>> (Ethernet) for low-level filesystem operations. > >>>>>>>>>>> > >>>>>>>>>>> In case something went wrong, that approach could risk > >>>>>>>>>>> corrupting a pool. Although, frankly, > >>>>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS H= BA > >>>>>>>>>>> problem that caused some > >>>>>>>>>>> silent corruption. > >>>>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines > >>>>>>>>>> hooked up > >>>>>>>>>> to the same disk chassis. > >>>>>>>>> Yes this is the first thing on the list to avoid .. :) > >>>>>>>>> > >>>>>>>>> I'm still busy to test the whole setup here, including the > >>>>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can > >>> prevent > >>>>>>>>> that thanks to: > >>>>>>>>> > >>>>>>>>> - As long as ctld is running on the BACKUP the disks are locked > >>>>>>>>> and you can't import the pool (even with -f) for ex (filer2 > >>> is the > >>>>>>>>> BACKUP): > >>>>>>>>> > >>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > >>> > >>>>>>>>> > >>>>>>>>> - The shared pool should not be mounted at boot, and you should > >>>>>>>>> ensure > >>>>>>>>> that the failover script is not executed during boot time too: > >>>>>>>>> this is > >>>>>>>>> to handle the case wherein both machines turn off and/or > >>> re-ignite at > >>>>>>>>> the same time. 
Indeed, the CARP interface can "flip" it's statu= s > >>>>>>>>> if both > >>>>>>>>> machines are powered on at the same time, for ex: > >>>>>>>>> > >>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf > >>> > and > >>>>>>>>> you will have a split-brain scenario > >>>>>>>>> > >>>>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons > >>>>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should no= t > >>>>>>>>> happen, this can be handled with a trigger file or something li= ke > >>>>>>>>> that > >>>>>>>>> > >>>>>>>>> - I've still have to check if the order is OK, but I think > >>> that as > >>>>>>>>> long > >>>>>>>>> as you shutdown the replication interface and that you adapt th= e > >>>>>>>>> advskew (including the config file) of the CARP interface > >>> before the > >>>>>>>>> zpool import -f in the failover script you can be relatively > >>>>>>>>> confident > >>>>>>>>> that nothing will be written on the iSCSI targets > >>>>>>>>> > >>>>>>>>> - A zpool scrub should be run at regular intervals > >>>>>>>>> > >>>>>>>>> This is my MASTER -> BACKUP CARP script ATM > >>>>>>>>> > >>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > >>> > >>>>>>>>> > >>>>>>>>> Julien > >>>>>>>>> > >>>>>>>> 100=E2=82=AC question without detailed looking at that script. y= es from a > >>>>>>>> first > >>>>>>>> view its super simple, but: why are solutions like rsf-1 such mo= re > >>>>>>>> powerful / featurerich. Theres a reason for, which is that > >>> they try to > >>>>>>>> cover every possible situation (which makes more than sense > >>> for this). > >>>>>>> I've never used "rsf-1" so I can't say much more about it, but > >>> I have > >>>>>>> no doubts about it's ability to handle "complex situations", wher= e > >>>>>>> multiple nodes / networks are involved. > >>>>>>> > >>>>>>>> That script works for sure, within very limited cases imho > >>>>>>>> > >>>>>>>>>> kaboom, really ugly kaboom. thats what is very likely to happe= n > >>>>>>>>>> sooner > >>>>>>>>>> or later especially when it comes to homegrown automatism > >>> solutions. > >>>>>>>>>> even the commercial parts where much more time/work goes > >>> into such > >>>>>>>>>> solutions fail in a regular manner > >>>>>>>>>> > >>>>>>>>>>> The advantage of ZFS send/receive of datasets is, however, th= at > >>>>>>>>>>> you can consider it > >>>>>>>>>>> essentially atomic. A transport corruption should not cause > >>>>>>>>>>> trouble (apart from a failed > >>>>>>>>>>> "zfs receive") and with snapshot retention you can even roll > >>>>>>>>>>> back. You can=E2=80=99t roll back > >>>>>>>>>>> zpool replications :) > >>>>>>>>>>> > >>>>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as > >>> your > >>>>>>>>>>> zfs receive doesn=E2=80=99t involve a rollback > >>>>>>>>>>> to the latest snapshot, it won=E2=80=99t destroy anything by = mistake. > >>>>>>>>>>> Just make sure that your replica datasets > >>>>>>>>>>> aren=E2=80=99t mounted and zfs receive won=E2=80=99t complain= . > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Cheers, > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Borja. 
> >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _______________________________________________ > >>>>>>>>>>> freebsd-fs@freebsd.org > >>> > > >>> mailing list > >>>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > >>>>>>>>>>> To unsubscribe, send any mail to > >>>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org > >>> > >>>>>>>>>>> >>> >" > >>>>>>>>>>> > >>>>>>>>>> _______________________________________________ > >>>>>>>>>> freebsd-fs@freebsd.org > >>> > > >>> mailing list > >>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > >>>>>>>>>> To unsubscribe, send any mail to > >>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org > >>> > >>>>>>>>>> >>> >" > >>>>>> > >>>>>> _______________________________________________ > >>>>>> freebsd-fs@freebsd.org > >>> > > >>> mailing list > >>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > >>>>>> To unsubscribe, send any mail to > >>> "freebsd-fs-unsubscribe@freebsd.org > >>> > >>>>>> >>> >" > >>>> > >>>> _______________________________________________ > >>>> freebsd-fs@freebsd.org mailing list > >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > >>>> To unsubscribe, send any mail to > >>> "freebsd-fs-unsubscribe@freebsd.org > >>> " > >>> _______________________________________________ > >>> freebsd-fs@freebsd.org mailing lis= t > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@ > freebsd.org > >>> " > >>> > >>> > >> _______________________________________________ > >> freebsd-fs@freebsd.org mailing list > >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Thu Aug 18 10:50:16 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 97AC9BBE52A for ; Thu, 18 Aug 2016 10:50:16 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from cu01176b.smtpx.saremail.com (cu01176b.smtpx.saremail.com [195.16.151.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4EEAC1A8C for ; Thu, 18 Aug 2016 10:50:15 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from [172.16.8.36] (izaro.sarenet.es [192.148.167.11]) by proxypop01.sare.net (Postfix) with ESMTPSA id 2AC549DD3B3; Thu, 18 Aug 2016 12:50:07 +0200 (CEST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: HAST + ZFS + NFS + CARP From: Borja Marcos In-Reply-To: <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> Date: Thu, 18 Aug 2016 12:50:06 +0200 Cc: Chris Watson , freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <354253C2-E42E-4B9C-9931-9135A5A7DFD9@sarenet.es> References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> 
<20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> To: linda@kateley.com X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 10:50:16 -0000 > On 17 Aug 2016, at 20:03, Linda Kateley wrote: >=20 > You do risk losing data if you batch zfs send. It is very hard to run = that real time. You have to take the snap then send the snap. Most = people run in cron, even if it's not in cron, you would want one to = finish before you started the next. If you lose the sending host before = the receive is complete you won't have a full copy. With zfs though you = will probably still have the data on the sending host, however long it = takes to bring it back up. RSF-1 runs in the zfs stack and send the = writes to the second system. It's kind of pricey, but actually much less = expensive than commercial alternatives. Doing somewhat critical stuff off cron is not usually a good idea. I do = ZFS replication with a custom program which makes sure of some important = stuff: - Using holds to avoid an accidental snapshot deletion to require a full = send/receive.=20 - Avoiding starting a new send/receive on a dataset in case the previous = one didn=E2=80=99t finish for whatever reason (the main problem with = cron) - Offering the possibility of some random variation on the replication = period so that, in case several happen to start simultaneously, you = don=E2=80=99t have a periodically overloaded system. - Avoiding mounting the replicas so that the receive won=E2=80=99t need = a rollback, which would be potentially risky. - Supports one-to-many replicas, with different periodicity for each = destination if required. I am sorry I can=E2=80=99t share it (company property) but the program = is rather silly anyway. The important work was the decision to have the = previous features, and a design decision to avoid destructive and portentially = error-prone operations such as rollbacks.=20 Most applications that require real time replication are databases, and = they usually include a clustering option which can be much simpler to = manage (and more robust in this case) than filesystem replication. For other cases, often you can design around the loss of a small amount = of data. I understand that in some cases you have no other option, but the benefits of asynchronous send/receive are so many, especially if = you are on a tight budget, it=E2=80=99s well worth to try to make the = most of it. Borja. 
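The program itself isn't available, but the practices listed above are easy to sketch in a few lines of sh. The script below is only an illustration of those points -- holds on the snapshots that anchor the incremental stream, refusing to start while a previous run is still active, a little start-time jitter, and receiving with -u so the replica is never mounted -- and every pool, dataset and host name in it is made up.

-- snip --
#!/bin/sh
# Illustrative only; names are hypothetical.
DS=tank/data                    # source dataset
DEST=replica-host               # receiving host
DESTDS=backup/data              # destination dataset
LOCK=/var/run/repl-data.lock

sleep "$(jot -r 1 0 59)"        # random jitter so several jobs don't line up

mkdir "$LOCK" 2>/dev/null || exit 0   # previous run still busy: do nothing
trap 'rmdir "$LOCK"' EXIT

PREV=$(zfs list -H -d 1 -t snapshot -o name -s creation "$DS" \
       | grep "@repl-" | tail -1)
NEW="$DS@repl-$(date +%Y%m%d%H%M%S)"

zfs snapshot "$NEW"
zfs hold repl "$NEW"            # protect the new anchor snapshot

if [ -n "$PREV" ]; then
    zfs send -i "$PREV" "$NEW" | ssh "$DEST" zfs receive -u "$DESTDS"
else
    zfs send "$NEW" | ssh "$DEST" zfs receive -u "$DESTDS"
fi

if [ $? -eq 0 ]; then
    [ -n "$PREV" ] && zfs release repl "$PREV"   # old anchor no longer needed
else
    echo "replication of $NEW failed, holds kept" >&2
    exit 1
fi
-- snip --

The holds are what make accidental snapshot deletion harmless: zfs destroy on a held snapshot fails with "dataset is busy", so the next incremental still has its anchor instead of silently degrading to a full send/receive.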
From owner-freebsd-fs@freebsd.org Thu Aug 18 11:17:40 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A7740BBE3D3 for ; Thu, 18 Aug 2016 11:17:40 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from mail.in-addr.com (mail.in-addr.com [IPv6:2a01:4f8:191:61e8::2525:2525]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 56C8A1827 for ; Thu, 18 Aug 2016 11:17:40 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from gjp by mail.in-addr.com with local (Exim 4.87 (FreeBSD)) (envelope-from ) id 1baLKL-0008Ap-Oq; Thu, 18 Aug 2016 12:17:37 +0100 Date: Thu, 18 Aug 2016 12:17:37 +0100 From: Gary Palmer To: Ben RUBSON Cc: FreeBSD FS Subject: Re: HAST + ZFS + NFS + CARP Message-ID: <20160818111737.GB47566@in-addr.com> References: <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> <409301a7-ce03-aaa3-c4dc-fa9f9ba66e01@internetx.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: gpalmer@freebsd.org X-SA-Exim-Scanned: No (on mail.in-addr.com); SAEximRunCond expanded to false X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 11:17:40 -0000 Isn't this exactly what the lockf command was designed to do for you? I'd also suggest rmdir rather than rm -rf On Thu, Aug 18, 2016 at 09:40:50AM +0200, Ben RUBSON wrote: > Yep this is better : > > if mkdir > then > do_your_job > rm -rf > fi > > > > > On 18 Aug 2016, at 09:38, InterNetX - Juergen Gotteswinter wrote: > > > > uhm, dont really investigated if it is or not. add a "sync" after that? > > or replace it? > > > > but anyway, thanks for the hint. will dig into this! > > > > Am 18.08.2016 um 09:36 schrieb krad: > >> I didnt think touch was atomic, mkdir is though > >> > >> On 18 August 2016 at 08:32, InterNetX - Juergen Gotteswinter > >> >> > wrote: > >> > >> > >> > >> Am 17.08.2016 um 20:03 schrieb Linda Kateley: > >>> I just do consulting so I don't always get to see the end of the > >>> project. Although we are starting to do more ongoing support so we can > >>> see the progress.. > >>> > >>> I have worked with some of the guys from high-availability.com for maybe > >>> 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work > >>> beautifully with omni/illumos. The one customer I have running it in > >>> prod is an isp in south america running openstack and zfs on freebsd as > >>> iscsi. Big boxes, 90+ drives per frame. If someone would like try it, i > >>> have some contacts there. Ping me offlist. > >> > >> no offense, but it sounds a bit like marketing. > >> > >> here: running nexenta ha setup since several years with one catastrophic > >> failure due to split brain > >> > >>> > >>> You do risk losing data if you batch zfs send. It is very hard to run > >>> that real time. 
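Ben's mkdir outline above is missing its directory argument and any cleanup; a fleshed-out version, using rmdir as Gary suggests rather than rm -rf, might look like the sketch below. The lock directory and zrep path are placeholders (the log file name follows Juergen's example).

-- snip --
#!/bin/sh
# mkdir(2) is atomic: exactly one process can create the lock directory,
# so this avoids the race that a plain "test then touch" lock file has.
LOCKDIR=/var/run/zfsrepl.lock

if mkdir "$LOCKDIR" 2>/dev/null; then
        trap 'rmdir "$LOCKDIR"' EXIT INT TERM   # drop the lock with rmdir, not rm -rf
        /usr/local/bin/zrep sync all >> /var/log/zfsrepli.log 2>&1
else
        echo "previous replication still running, skipping" >&2
fi
-- snip --

lockf(1) reaches the same goal with less code, e.g. "lockf -t 0 /var/run/zfsrepl.lock zrep sync all", and it also releases the lock if the job is killed.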
> >> > >> depends on how much data changes aka delta size > >> > >> > >> You have to take the snap then send the snap. Most > >>> people run in cron, even if it's not in cron, you would want one to > >>> finish before you started the next. > >> > >> thats the reason why lock files where invented, tools like zrep handle > >> that themself via additional zfs properties > >> > >> or, if one does not trust a single layer > >> > >> -- snip -- > >> #!/bin/sh > >> if [ ! -f /var/run/replic ] ; then > >> touch /var/run/replic > >> /blah/path/zrep sync all >> /var/log/zfsrepli.log > >> rm -f /var/run/replic > >> fi > >> -- snip -- > >> > >> something like this, simple > >> > >> If you lose the sending host before > >>> the receive is complete you won't have a full copy. > >> > >> if rsf fails, and you end up in split brain you loose way more. been > >> there, seen that. > >> > >> With zfs though you > >>> will probably still have the data on the sending host, however long it > >>> takes to bring it back up. RSF-1 runs in the zfs stack and send the > >>> writes to the second system. It's kind of pricey, but actually much less > >>> expensive than commercial alternatives. > >>> > >>> Anytime you run anything sync it adds latency but makes things safer.. > >> > >> not surprising, it all depends on the usecase > >> > >>> There is also a cool tool I like, called zerto for vmware that sits in > >>> the hypervisor and sends a sync copy of a write locally and then an > >>> async remotely. It's pretty cool. Although I haven't run it myself, have > >>> a bunch of customers running it. I believe it works with proxmox too. > >>> > >>> Most people I run into (these days) don't mind losing 5 or even 30 > >>> minutes of data. Small shops. > >> > >> you talk about minutes, what delta size are we talking here about? why > >> not using zrep in a loop for example > >> > >> They usually have a copy somewhere else. > >>> Or the cost of 5-30 minutes isn't that great. I used work as a > >>> datacenter architect for sun/oracle with only fortune 500. There losing > >>> 1 sec could put large companies out of business. I worked with banks and > >>> exchanges. > >> > >> again, usecase. i bet 99% on this list are not operating fortune 500 > >> bank filers > >> > >> They couldn't ever lose a single transaction. Most people > >>> nowadays do the replication/availability in the application though and > >>> don't care about underlying hardware, especially disk. > >>> > >>> > >>> On 8/17/16 11:55 AM, Chris Watson wrote: > >>>> Of course, if you are willing to accept some amount of data loss that > >>>> opens up a lot more options. :) > >>>> > >>>> Some may find that acceptable though. Like turning off fsync with > >>>> PostgreSQL to get much higher throughput. As little no as you are > >> made > >>>> *very* aware of the risks. > >>>> > >>>> It's good to have input in this thread from one with more experience > >>>> with RSF-1 than the rest of us. You confirm what others have that > >> said > >>>> about RSF-1, that it's stable and works well. What were you deploying > >>>> it on? > >>>> > >>>> Chris > >>>> > >>>> Sent from my iPhone 5 > >>>> > >>>> On Aug 17, 2016, at 11:18 AM, Linda Kateley >> > >>>> >> wrote: > >>>> > >>>>> The question I always ask, as an architect, is "can you lose 1 > >> minute > >>>>> worth of data?" If you can, then batched replication is perfect. If > >>>>> you can't.. then HA. Every place I have positioned it, rsf-1 has > >>>>> worked extremely well. If i remember right, it works at the dmu. 
I > >>>>> would suggest try it. They have been trying to have a full freebsd > >>>>> solution, I have several customers running it well. > >>>>> > >>>>> linda > >>>>> > >>>>> > >>>>> On 8/17/16 4:52 AM, Julien Cigar wrote: > >>>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen > >>>>>> Gotteswinter wrote: > >>>>>>> > >>>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: > >>>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen > >>>>>>>> Gotteswinter wrote: > >>>>>>>>> > >>>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: > >>>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>>>>> >> >> wrote: > >>>>>>>>>>> > >>>>>>>>>>> As I said in a previous post I tested the zfs send/receive > >>>>>>>>>>> approach (with > >>>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in > >>>>>>>>>>> all what you > >>>>>>>>>>> said, especially about off-site replicate and synchronous > >>>>>>>>>>> replication. > >>>>>>>>>>> > >>>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the > >>>>>>>>>>> moment, > >>>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but > >>>>>>>>>>> ATM it > >>>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. > >>>>>>>>>> I must be too old school, but I don???t quite like the idea of > >>>>>>>>>> using an essentially unreliable transport > >>>>>>>>>> (Ethernet) for low-level filesystem operations. > >>>>>>>>>> > >>>>>>>>>> In case something went wrong, that approach could risk > >>>>>>>>>> corrupting a pool. Although, frankly, > >>>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA > >>>>>>>>>> problem that caused some > >>>>>>>>>> silent corruption. > >>>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines > >>>>>>>>> hooked up > >>>>>>>>> to the same disk chassis. > >>>>>>>> Yes this is the first thing on the list to avoid .. :) > >>>>>>>> > >>>>>>>> I'm still busy to test the whole setup here, including the > >>>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can > >> prevent > >>>>>>>> that thanks to: > >>>>>>>> > >>>>>>>> - As long as ctld is running on the BACKUP the disks are locked > >>>>>>>> and you can't import the pool (even with -f) for ex (filer2 > >> is the > >>>>>>>> BACKUP): > >>>>>>>> > >> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f > >> > >>>>>>>> > >>>>>>>> - The shared pool should not be mounted at boot, and you should > >>>>>>>> ensure > >>>>>>>> that the failover script is not executed during boot time too: > >>>>>>>> this is > >>>>>>>> to handle the case wherein both machines turn off and/or > >> re-ignite at > >>>>>>>> the same time. 
Indeed, the CARP interface can "flip" it's status > >>>>>>>> if both > >>>>>>>> machines are powered on at the same time, for ex: > >>>>>>>> > >> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf > >> and > >>>>>>>> you will have a split-brain scenario > >>>>>>>> > >>>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons > >>>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not > >>>>>>>> happen, this can be handled with a trigger file or something like > >>>>>>>> that > >>>>>>>> > >>>>>>>> - I've still have to check if the order is OK, but I think > >> that as > >>>>>>>> long > >>>>>>>> as you shutdown the replication interface and that you adapt the > >>>>>>>> advskew (including the config file) of the CARP interface > >> before the > >>>>>>>> zpool import -f in the failover script you can be relatively > >>>>>>>> confident > >>>>>>>> that nothing will be written on the iSCSI targets > >>>>>>>> > >>>>>>>> - A zpool scrub should be run at regular intervals > >>>>>>>> > >>>>>>>> This is my MASTER -> BACKUP CARP script ATM > >>>>>>>> > >> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 > >> > >>>>>>>> > >>>>>>>> Julien > >>>>>>>> > >>>>>>> 100??? question without detailed looking at that script. yes from a > >>>>>>> first > >>>>>>> view its super simple, but: why are solutions like rsf-1 such more > >>>>>>> powerful / featurerich. Theres a reason for, which is that > >> they try to > >>>>>>> cover every possible situation (which makes more than sense > >> for this). > >>>>>> I've never used "rsf-1" so I can't say much more about it, but > >> I have > >>>>>> no doubts about it's ability to handle "complex situations", where > >>>>>> multiple nodes / networks are involved. > >>>>>> > >>>>>>> That script works for sure, within very limited cases imho > >>>>>>> > >>>>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen > >>>>>>>>> sooner > >>>>>>>>> or later especially when it comes to homegrown automatism > >> solutions. > >>>>>>>>> even the commercial parts where much more time/work goes > >> into such > >>>>>>>>> solutions fail in a regular manner > >>>>>>>>> > >>>>>>>>>> The advantage of ZFS send/receive of datasets is, however, that > >>>>>>>>>> you can consider it > >>>>>>>>>> essentially atomic. A transport corruption should not cause > >>>>>>>>>> trouble (apart from a failed > >>>>>>>>>> "zfs receive") and with snapshot retention you can even roll > >>>>>>>>>> back. You can???t roll back > >>>>>>>>>> zpool replications :) > >>>>>>>>>> > >>>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as > >> your > >>>>>>>>>> zfs receive doesn???t involve a rollback > >>>>>>>>>> to the latest snapshot, it won???t destroy anything by mistake. > >>>>>>>>>> Just make sure that your replica datasets > >>>>>>>>>> aren???t mounted and zfs receive won???t complain. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Cheers, > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Borja. 
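On the receive side, the "replica datasets aren't mounted" advice quoted above usually comes down to two properties plus the -u flag. A minimal sketch, with made-up pool, snapshot and host names:

-- snip --
# one-time settings on the backup host, so replicas never get mounted
zfs set readonly=on backup/data
zfs set canmount=noauto backup/data

# per transfer: -u leaves the received dataset unmounted, and -F (forced
# rollback) is deliberately omitted, so a diverged replica makes the
# receive fail loudly instead of rolling anything back
zfs send -i tank/data@prev tank/data@new | ssh backuphost zfs receive -u backup/data
-- snip --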
> >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> _______________________________________________ > >>>>>>>>>> freebsd-fs@freebsd.org > >> > > >> mailing list > >>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> > >>>>>>>>>> To unsubscribe, send any mail to > >>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org > >> > >>>>>>>>>> >> >" > >>>>>>>>>> > >>>>>>>>> _______________________________________________ > >>>>>>>>> freebsd-fs@freebsd.org > >> > > >> mailing list > >>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> > >>>>>>>>> To unsubscribe, send any mail to > >>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org > >> > >>>>>>>>> >> >" > >>>>> > >>>>> _______________________________________________ > >>>>> freebsd-fs@freebsd.org > >> > > >> mailing list > >>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> > >>>>> To unsubscribe, send any mail to > >> "freebsd-fs-unsubscribe@freebsd.org > >> > >>>>> >> >" > >>> > >>> _______________________________________________ > >>> freebsd-fs@freebsd.org mailing list > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> > >>> To unsubscribe, send any mail to > >> "freebsd-fs-unsubscribe@freebsd.org > >> " > >> _______________________________________________ > >> freebsd-fs@freebsd.org mailing list > >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> > >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org > >> " > >> > >> > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Thu Aug 18 11:32:19 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C37B8BBEBEF for ; Thu, 18 Aug 2016 11:32:19 +0000 (UTC) (envelope-from emz@norma.perm.ru) Received: from elf.hq.norma.perm.ru (mail.norma.perm.ru [IPv6:2a00:7540:1::5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.norma.perm.ru", Issuer "Vivat-Trade UNIX Root CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4755E1911 for ; Thu, 18 Aug 2016 11:32:19 +0000 (UTC) (envelope-from emz@norma.perm.ru) Received: from bsdrookie.norma.com. ([IPv6:fd00::7fe]) by elf.hq.norma.perm.ru (8.15.2/8.15.2) with ESMTPS id u7IBWCfo036601 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO) for ; Thu, 18 Aug 2016 16:32:13 +0500 (YEKT) (envelope-from emz@norma.perm.ru) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=norma.perm.ru; s=key; t=1471519935; bh=fOVv6ickd2L7fZ97nb2B+DOqhmHXCfgrsgHnSF/+PFY=; h=To:From:Subject:Date; b=izbVrb22gjaWMIlfxFkItM/8wIHaW60AhvhawaX+qAGFbAiCp4HyfnlhAtVcAqWwS XMMtEpTCTblz41nBw3nzbpYhnhu0pYYKgney5PT4Iiyznnz7z3kY9RyRHbed8gSc1u lg8Q/eqdUB7XfSTxpknyoEXU+5K7vvrhpn8Ygdog= To: FreeBSD FS From: "Eugene M. 
Zheganin" Subject: zpool list FREE vs zfs list AVAIL Message-ID: <57B59CBC.8000904@norma.perm.ru> Date: Thu, 18 Aug 2016 16:32:12 +0500 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.7.0 MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 11:32:19 -0000 Hi. What is the difference between zpool list FREE for a pool and zfs list AVAIL ? Because they differ a lot, I'm looking at a server at the moment where the difference is like dozens of times: zfs list reports that 97 gigabytes is available, and the zpool list for the same pool says that 4.18 terabytes is free. From my point of view this should be the same number. Thanks. Eugene. From owner-freebsd-fs@freebsd.org Thu Aug 18 14:55:05 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 56FF4BBD746 for ; Thu, 18 Aug 2016 14:55:05 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 18AD211C4 for ; Thu, 18 Aug 2016 14:55:05 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from slw by zxy.spb.ru with local (Exim 4.86 (FreeBSD)) (envelope-from ) id 1baOid-000CjB-P2; Thu, 18 Aug 2016 17:54:55 +0300 Date: Thu, 18 Aug 2016 17:54:55 +0300 From: Slawa Olhovchenkov To: Matthias Gamsjager , freebsd-fs@freebsd.org Subject: Re: ZFS ARC under memory pressure Message-ID: <20160818145455.GA48739@zxy.spb.ru> References: <20160816193416.GM8192@zxy.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 14:55:05 -0000 On Wed, Aug 17, 2016 at 09:18:20AM +0200, Matthias Gamsjager wrote: > On 16 August 2016 at 21:34, Slawa Olhovchenkov wrote: > > > I see issuses with ZFS ARC inder memory pressure. > > ZFS ARC size can be dramaticaly reduced, up to arc_min. > > > > As I see memory pressure event cause call arc_lowmem and set needfree: > > > > arc.c:arc_lowmem > > > > needfree = btoc(arc_c >> arc_shrink_shift); > > > > After this, arc_available_memory return negative vaules (PAGESIZE * > > (-needfree)) until needfree is zero. Independent how too much memory > > freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <= > > arc_c. Until arc_size don't drop below arc_c (arc_c deceased at every > > loop interation). > > > > arc_c droped to minimum value if arc_size fast enough droped. > > > > No control current to initial memory allocation. > > > > As result, I can see needless arc reclaim, from 10x to 100x times. > > > > Can some one check me and comment this? > > _______________________________________________ > > > > > What version are you on? 
stable/10, same code in stable/11/9 and current/12 -- Slawa Olhovchenkov From owner-freebsd-fs@freebsd.org Thu Aug 18 14:58:59 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ABBAFBBDA91; Thu, 18 Aug 2016 14:58:59 +0000 (UTC) (envelope-from emz@norma.perm.ru) Received: from elf.hq.norma.perm.ru (mail.norma.perm.ru [IPv6:2a00:7540:1::5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.norma.perm.ru", Issuer "Vivat-Trade UNIX Root CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 2EFB21350; Thu, 18 Aug 2016 14:58:58 +0000 (UTC) (envelope-from emz@norma.perm.ru) Received: from bsdrookie.norma.com. ([IPv6:fd00::7fe]) by elf.hq.norma.perm.ru (8.15.2/8.15.2) with ESMTPS id u7IEwtox052480 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO); Thu, 18 Aug 2016 19:58:55 +0500 (YEKT) (envelope-from emz@norma.perm.ru) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=norma.perm.ru; s=key; t=1471532335; bh=hnc4Ar2j/187ahmTWpLNqP1CJA3hyHJ3tgqfhuFf7Qc=; h=To:Cc:From:Subject:Date; b=joczOecG2LeOvEj3UA9SR1Y2erZ7Flsl19FQJD032D5Wi/3aQx/MvqbIR5fBq/jGJ 68JRKTa97EP1xrQrOEgJv0ngPoRVJNwaw3Ml6mcGsEguzNYzz5svAkKYotQ6w2Qoxb Uwq68fm0lWYNFTPElG4b70VCB45VzYKSoaUjYbkg= To: FreeBSD FS Cc: freebsd-stable From: "Eugene M. Zheganin" Subject: cannot destroy '': dataset is busy vs iSCSI Message-ID: <57B5CD2F.2070204@norma.perm.ru> Date: Thu, 18 Aug 2016 19:58:55 +0500 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.7.0 MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 14:58:59 -0000 Hi. I'm using zvol clones with iSCSI. Perdiodically I renew them and destroy the old ones, but sometimes the clone gets stuck and refuses to be destroyed: (I'm showing the full sequence so it's self explanatory who is who's parent) [root@san2:/etc]# zfs destroy esx/games-reference1@ver5_6 cannot destroy 'esx/games-reference1@ver5_6': snapshot has dependent clones use '-R' to destroy the following datasets: esx/games-reference1-ver5_6-worker111 [root@san2:/etc]# zfs destroy esx/games-reference1-ver5_6-worker111 cannot destroy 'esx/games-reference1-ver5_6-worker111': dataset is busy The only entity that can hold the dataset open is ctld, so: [root@san2:/etc]# service ctld reload [root@san2:/etc]# grep esx/games-reference1-ver5_6-worker111 /etc/ctl.conf [root@san2:/etc]# zfs destroy esx/games-reference1-ver5_6-worker111 cannot destroy 'esx/games-reference1-ver5_6-worker111': dataset is busy As you can see, the clone isn't mentioned in ctl.conf, but still refuses to be destroyed. Is there any way to destroy it without restarting ctld or rebooting the server ? iSCSI is vital for production, but clones sometimes holds lot of space. Thanks. Eugene. 
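One thing worth checking in a situation like this is whether ctl itself still has the zvol open as a LUN backing device even after the configuration reload. A rough sketch follows; the LUN number is a placeholder and the exact ctladm options should be verified against ctladm(8):

-- snip --
# list the LUNs ctl currently exports, with their backing devices
ctladm devlist -v

# if the old clone still shows up as a backing device, remove that LUN
# explicitly (N is the LUN id from the devlist output), then retry
ctladm remove -b block -l N
zfs destroy esx/games-reference1-ver5_6-worker111
-- snip --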
From owner-freebsd-fs@freebsd.org Thu Aug 18 17:04:42 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E9188BBECD1 for ; Thu, 18 Aug 2016 17:04:42 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: from mail-it0-x233.google.com (mail-it0-x233.google.com [IPv6:2607:f8b0:4001:c0b::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B26031BE9 for ; Thu, 18 Aug 2016 17:04:42 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: by mail-it0-x233.google.com with SMTP id e63so3685197ith.1 for ; Thu, 18 Aug 2016 10:04:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kateley-com.20150623.gappssmtp.com; s=20150623; h=reply-to:subject:references:to:from:organization:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=sjD3C1UduSe7+TJ7RrNmA+eJRZ5T1psJbeO5vNzMLcA=; b=nZOO229YoHeVR91n+eFtvLPeEco0f59iG2Sntk3Pc+XNjAO9ffb8F6EGh3q8pUNDkk voWTCUyOO/uFkPI28UVB6w3cSERQoPnJgpMVevgSKvA3X5ALdDS5jJqLYf1bTOz+yEgp 97bbNs37mcwD6+2yEfKvdwkoGtkHRxYE1Ic2ibv4nvldo0GV9hce6RF7cFprbUt+dXnI jLyakjGaPeORj/tFw3cHKgrrNVuKS7Nwfr3VRVg55IhKBf8RP4BdRuSJuEfR6u9VAs6G 1/jbIhVH7f4Cj4kbwLp9WH4TWhDNZdpn/AEvQVUCJmUv5uGsc1fRhesJbr1SOGT9F8jX zWNg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:reply-to:subject:references:to:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-transfer-encoding; bh=sjD3C1UduSe7+TJ7RrNmA+eJRZ5T1psJbeO5vNzMLcA=; b=V3mmPs2rp7FuWlUeeU8VMiphpW7kFn7rfhwAl1v0jU0F/mIYKJdwaXjlWzNHZTFKtP NyqylxrwgoM+HQyqGeSLiuJurOiifM5FgAJTBI361XxlFXfItsCh931p3grmOwFPlejo CDx/DElBlIwW3kymlsbiK/pIy4lZMDtqmgAl/gIUHfpXnABnIqPVlNd0VrzocIYL01vE kMpy26WspHTpiBA3l/htyDczxrKI0QPPsfewNL5FbLEDQ8D8u5kXI3bmeaby1n1PDeT0 0F0tSony3JH2csb/1sVX3/7iVAm4mWhr0/0rDzBTPR0l8Bv1c+Oev7qYSMpbcRMQ7Sjm Z6Ig== X-Gm-Message-State: AEkooutyhAUlTYyZ5oiVQdos1Fbm3baE/wOZLlyN1Dx9CaGQ/tFCuF6mKqMPTdYe/HqfKg== X-Received: by 10.36.149.5 with SMTP id m5mr794636itd.20.1471539881144; Thu, 18 Aug 2016 10:04:41 -0700 (PDT) Received: from [192.168.0.19] (67-4-156-204.mpls.qwest.net. 
[67.4.156.204]) by smtp.googlemail.com with ESMTPSA id e6sm2291968ith.0.2016.08.18.10.04.39 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 18 Aug 2016 10:04:40 -0700 (PDT) Reply-To: linda@kateley.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <02F2828E-AB88-4F25-AB73-5EF041BAD36E@gmail.com> To: freebsd-fs@freebsd.org From: Linda Kateley Organization: Kateley Company Message-ID: Date: Thu, 18 Aug 2016 12:04:35 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <02F2828E-AB88-4F25-AB73-5EF041BAD36E@gmail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 17:04:43 -0000 Lemme send this over them :) linda On 8/17/16 4:14 PM, Ben RUBSON wrote: >> On 17 Aug 2016, at 20:03, Linda Kateley wrote: >> >> RSF-1 runs in the zfs stack and send the writes to the second system. > Linda, do you have any link to a documentation about this RSF-1 operation mode ? > > According to what I red about RSF-1, storage is shared between nodes, and RSF-1 manages the failover, we do not have 2 different storages. > (so I don't really understand how writes are sent to the "second system") > > In addition, RSF-1 does not seem to help with long-distance replication to a different storage. > But I may be wrong ? > This is where ZFS send/receive helps. > Or even a nicer solution I proposed a few weeks ago : https://www.illumos.org/issues/7166 (but a lot of work to achieve). > > Ben > >> On 8/17/16 11:55 AM, Chris Watson wrote: >>> Of course, if you are willing to accept some amount of data loss that opens up a lot more options. :) >>> >>> Some may find that acceptable though. Like turning off fsync with PostgreSQL to get much higher throughput. As little no as you are made *very* aware of the risks. >>> >>> It's good to have input in this thread from one with more experience with RSF-1 than the rest of us. You confirm what others have that said about RSF-1, that it's stable and works well. What were you deploying it on? >>> >>> Chris >>> >>> Sent from my iPhone 5 >>> >>> On Aug 17, 2016, at 11:18 AM, Linda Kateley > wrote: >>> >>>> The question I always ask, as an architect, is "can you lose 1 minute worth of data?" If you can, then batched replication is perfect. If you can't.. then HA. Every place I have positioned it, rsf-1 has worked extremely well. If i remember right, it works at the dmu. I would suggest try it. They have been trying to have a full freebsd solution, I have several customers running it well. 
>>>> >>>> linda >>>> >>>> >>>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen Gotteswinter wrote: >>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen Gotteswinter wrote: >>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar > wrote: >>>>>>>>>> >>>>>>>>>> As I said in a previous post I tested the zfs send/receive approach (with >>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in all what you >>>>>>>>>> said, especially about off-site replicate and synchronous replication. >>>>>>>>>> >>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the moment, >>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but ATM it >>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>>> I must be too old school, but I don’t quite like the idea of using an essentially unreliable transport >>>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>>> >>>>>>>>> In case something went wrong, that approach could risk corrupting a pool. Although, frankly, >>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA problem that caused some >>>>>>>>> silent corruption. >>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines hooked up >>>>>>>> to the same disk chassis. >>>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>>> >>>>>>> I'm still busy to test the whole setup here, including the >>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can prevent >>>>>>> that thanks to: >>>>>>> >>>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>>> and you can't import the pool (even with -f) for ex (filer2 is the >>>>>>> BACKUP): >>>>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>>>>> >>>>>>> - The shared pool should not be mounted at boot, and you should ensure >>>>>>> that the failover script is not executed during boot time too: this is >>>>>>> to handle the case wherein both machines turn off and/or re-ignite at >>>>>>> the same time. Indeed, the CARP interface can "flip" it's status if both >>>>>>> machines are powered on at the same time, for ex: >>>>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and >>>>>>> you will have a split-brain scenario >>>>>>> >>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>>>> happen, this can be handled with a trigger file or something like that >>>>>>> >>>>>>> - I've still have to check if the order is OK, but I think that as long >>>>>>> as you shutdown the replication interface and that you adapt the >>>>>>> advskew (including the config file) of the CARP interface before the >>>>>>> zpool import -f in the failover script you can be relatively confident >>>>>>> that nothing will be written on the iSCSI targets >>>>>>> >>>>>>> - A zpool scrub should be run at regular intervals >>>>>>> >>>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>>>>> >>>>>>> Julien >>>>>>> >>>>>> 100€ question without detailed looking at that script. yes from a first >>>>>> view its super simple, but: why are solutions like rsf-1 such more >>>>>> powerful / featurerich. 
Theres a reason for, which is that they try to >>>>>> cover every possible situation (which makes more than sense for this). >>>>> I've never used "rsf-1" so I can't say much more about it, but I have >>>>> no doubts about it's ability to handle "complex situations", where >>>>> multiple nodes / networks are involved. >>>>> >>>>>> That script works for sure, within very limited cases imho >>>>>> >>>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen sooner >>>>>>>> or later especially when it comes to homegrown automatism solutions. >>>>>>>> even the commercial parts where much more time/work goes into such >>>>>>>> solutions fail in a regular manner >>>>>>>> >>>>>>>>> The advantage of ZFS send/receive of datasets is, however, that you can consider it >>>>>>>>> essentially atomic. A transport corruption should not cause trouble (apart from a failed >>>>>>>>> "zfs receive") and with snapshot retention you can even roll back. You can’t roll back >>>>>>>>> zpool replications :) >>>>>>>>> >>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as your zfs receive doesn’t involve a rollback >>>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. Just make sure that your replica datasets >>>>>>>>> aren’t mounted and zfs receive won’t complain. >>>>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Borja. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org " >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org " >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org " >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Thu Aug 18 17:13:54 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7F025BBE061 for ; Thu, 18 Aug 2016 17:13:54 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: from mail-it0-x234.google.com (mail-it0-x234.google.com [IPv6:2607:f8b0:4001:c0b::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 403BA12A1 for ; Thu, 18 Aug 2016 17:13:54 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: by mail-it0-x234.google.com with SMTP id f6so3619838ith.0 for ; Thu, 18 Aug 2016 10:13:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kateley-com.20150623.gappssmtp.com; s=20150623; 
h=reply-to:subject:references:to:from:organization:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=/cv7LFUo9H4S+yvxGHb7DVtSk+yH52epUXSSck8hF/Q=; b=r2pBk41tOYKMmFgk17EPLf2lH+Eu6KZNtLJakADi6iHfRI9EqZlAbnFVjKJ7jncEzU B+9WW0nlOSzTbZTthGUhx7xNC3LpCQr20KVd5pcLv9pGU+IyR3A/KGRnziCZ51Hm5Ny9 vMYbzbMfjnCvZbyf88NWdUlT/uVp1vEsNVkM7b1ZLByozP9gOYpyHoDNjpIkvDhfXMRn icuCXEmzn4b26hgdMKrUyvvhdr39TIzWQKz8PKGaejsAemPZabn/D2zSblDglILuvMhZ EaXQlt/VDUtt3hQEMmzD9s1hX/lkguaXX9MIw+sknQLJHLTxDvkTxlpnlCOQD43mhOJ7 KRRA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:reply-to:subject:references:to:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-transfer-encoding; bh=/cv7LFUo9H4S+yvxGHb7DVtSk+yH52epUXSSck8hF/Q=; b=NyylD+QJ5+rL2DHyB/5O2pHzO2x59cQq3i1DPXtpGwv2hRAOGYVYxCYYeU6Jm33Ou+ A/g/WE8wxiHQG0MY5p47p3OJjpwOD38WdJVXw/4DGi45Iuqc1IQdn4vzDrVSpduMS5aa AaJOD3jAdrNAXOHkQ6mI+xvb6ADm0x0XP63htXu6Cedyvn9Idsinra4iOiR5NXXVsFWc ejHaeRzL2n1+4SaPXpVXNXWcJ6LV/FfSV+TjlrixNBV3jAtv76cd413emlghM86Fjco9 Jvue5cUsaZIC81MRJetJataofHt3I3paf9Y3TGmC8LNWDS5kUC+qmXWaivdqI007Rpqu 1DVQ== X-Gm-Message-State: AEkoouvvXlt9cWRu8mz2IiFm7Q6L7E5/jX9rR41WSMY1eLOk4vZlSiV7v4WVYwqGghp7UA== X-Received: by 10.36.149.193 with SMTP id m184mr815974itd.94.1471540433123; Thu, 18 Aug 2016 10:13:53 -0700 (PDT) Received: from [192.168.0.19] (67-4-156-204.mpls.qwest.net. [67.4.156.204]) by smtp.googlemail.com with ESMTPSA id f126sm235063ith.7.2016.08.18.10.13.52 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 18 Aug 2016 10:13:52 -0700 (PDT) Reply-To: linda@kateley.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> <409301a7-ce03-aaa3-c4dc-fa9f9ba66e01@internetx.com> <69234c7d-cda9-2d56-b5e0-bb5e3961cc19@internetx.com> To: freebsd-fs@freebsd.org From: Linda Kateley Organization: Kateley Company Message-ID: <7828fdbc-3a5b-3998-ac54-a896cf02927f@kateley.com> Date: Thu, 18 Aug 2016 12:13:51 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <69234c7d-cda9-2d56-b5e0-bb5e3961cc19@internetx.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 17:13:54 -0000 Cool, thanks linda On 8/18/16 3:02 AM, InterNetX - Juergen Gotteswinter wrote: > new day, new things learned :) > > thanks! > > but like said, zrep does its on locking in zfs properties. 
so even this > is fine > > while true; do zrep sync all; done > > > see > > http://www.bolthole.com/solaris/zrep/ > > the properties look like this > > tank/vmail redundant_metadata all default > tank/vmail zrep:savecount 5 local > tank/vmail zrep:lock-time 20160620101703 local > tank/vmail zrep:master yes local > tank/vmail zrep:src-fs tank/vmail local > tank/vmail zrep:dest-host stor1 local > tank/vmail zrep:src-host stor2 local > tank/vmail zrep:dest-fs tank/vmail local > tank/vmail zrep:lock-pid 10887 local > > > it also takes care of the replication partner, the replicated datasets > are read only until you tell zrep "go go go, become master" > > Simple usage summary: > zrep (init|-i) ZFS/fs remotehost remoteZFSpool/fs > zrep (sync|-S) [-q seconds] ZFS/fs > zrep (sync|-S) [-q seconds] all > zrep (sync|-S) ZFS/fs@snapshot -- temporary retroactive sync > zrep (status|-s) [-v] [(-a|ZFS/fs)] > zrep refresh ZFS/fs -- pull version of sync > zrep (list|-l) [-Lv] > zrep (expire|-e) [-L] (ZFS/fs ...)|(all)|() > zrep (changeconfig|-C) [-f] ZFS/fs remotehost remoteZFSpool/fs > zrep (changeconfig|-C) [-f] [-d] ZFS/fs srchost srcZFSpool/fs > zrep failover [-L] ZFS/fs > zrep takeover [-L] ZFS/fs > > > zrep failover pool/ds -> master sets pool read only, connects to slave, > sets pool on slave rw > > should be easy to combine with carp/devd, but this is the land of vodoo > automagic again which i dont trust that much. > > > Am 18.08.2016 um 09:40 schrieb Ben RUBSON: >> Yep this is better : >> >> if mkdir >> then >> do_your_job >> rm -rf >> fi >> >> >> >>> On 18 Aug 2016, at 09:38, InterNetX - Juergen Gotteswinter wrote: >>> >>> uhm, dont really investigated if it is or not. add a "sync" after that? >>> or replace it? >>> >>> but anyway, thanks for the hint. will dig into this! >>> >>> Am 18.08.2016 um 09:36 schrieb krad: >>>> I didnt think touch was atomic, mkdir is though >>>> >>>> On 18 August 2016 at 08:32, InterNetX - Juergen Gotteswinter >>>> >>> > wrote: >>>> >>>> >>>> >>>> Am 17.08.2016 um 20:03 schrieb Linda Kateley: >>>>> I just do consulting so I don't always get to see the end of the >>>>> project. Although we are starting to do more ongoing support so we can >>>>> see the progress.. >>>>> >>>>> I have worked with some of the guys from high-availability.com for maybe >>>>> 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work >>>>> beautifully with omni/illumos. The one customer I have running it in >>>>> prod is an isp in south america running openstack and zfs on freebsd as >>>>> iscsi. Big boxes, 90+ drives per frame. If someone would like try it, i >>>>> have some contacts there. Ping me offlist. >>>> no offense, but it sounds a bit like marketing. >>>> >>>> here: running nexenta ha setup since several years with one catastrophic >>>> failure due to split brain >>>> >>>>> You do risk losing data if you batch zfs send. It is very hard to run >>>>> that real time. >>>> depends on how much data changes aka delta size >>>> >>>> >>>> You have to take the snap then send the snap. Most >>>>> people run in cron, even if it's not in cron, you would want one to >>>>> finish before you started the next. >>>> thats the reason why lock files where invented, tools like zrep handle >>>> that themself via additional zfs properties >>>> >>>> or, if one does not trust a single layer >>>> >>>> -- snip -- >>>> #!/bin/sh >>>> if [ ! 
-f /var/run/replic ] ; then >>>> touch /var/run/replic >>>> /blah/path/zrep sync all >> /var/log/zfsrepli.log >>>> rm -f /var/run/replic >>>> fi >>>> -- snip -- >>>> >>>> something like this, simple >>>> >>>> If you lose the sending host before >>>>> the receive is complete you won't have a full copy. >>>> if rsf fails, and you end up in split brain you loose way more. been >>>> there, seen that. >>>> >>>> With zfs though you >>>>> will probably still have the data on the sending host, however long it >>>>> takes to bring it back up. RSF-1 runs in the zfs stack and send the >>>>> writes to the second system. It's kind of pricey, but actually much less >>>>> expensive than commercial alternatives. >>>>> >>>>> Anytime you run anything sync it adds latency but makes things safer.. >>>> not surprising, it all depends on the usecase >>>> >>>>> There is also a cool tool I like, called zerto for vmware that sits in >>>>> the hypervisor and sends a sync copy of a write locally and then an >>>>> async remotely. It's pretty cool. Although I haven't run it myself, have >>>>> a bunch of customers running it. I believe it works with proxmox too. >>>>> >>>>> Most people I run into (these days) don't mind losing 5 or even 30 >>>>> minutes of data. Small shops. >>>> you talk about minutes, what delta size are we talking here about? why >>>> not using zrep in a loop for example >>>> >>>> They usually have a copy somewhere else. >>>>> Or the cost of 5-30 minutes isn't that great. I used work as a >>>>> datacenter architect for sun/oracle with only fortune 500. There losing >>>>> 1 sec could put large companies out of business. I worked with banks and >>>>> exchanges. >>>> again, usecase. i bet 99% on this list are not operating fortune 500 >>>> bank filers >>>> >>>> They couldn't ever lose a single transaction. Most people >>>>> nowadays do the replication/availability in the application though and >>>>> don't care about underlying hardware, especially disk. >>>>> >>>>> >>>>> On 8/17/16 11:55 AM, Chris Watson wrote: >>>>>> Of course, if you are willing to accept some amount of data loss that >>>>>> opens up a lot more options. :) >>>>>> >>>>>> Some may find that acceptable though. Like turning off fsync with >>>>>> PostgreSQL to get much higher throughput. As little no as you are >>>> made >>>>>> *very* aware of the risks. >>>>>> >>>>>> It's good to have input in this thread from one with more experience >>>>>> with RSF-1 than the rest of us. You confirm what others have that >>>> said >>>>>> about RSF-1, that it's stable and works well. What were you deploying >>>>>> it on? >>>>>> >>>>>> Chris >>>>>> >>>>>> Sent from my iPhone 5 >>>>>> >>>>>> On Aug 17, 2016, at 11:18 AM, Linda Kateley >>> >>>>>> >> wrote: >>>>>> >>>>>>> The question I always ask, as an architect, is "can you lose 1 >>>> minute >>>>>>> worth of data?" If you can, then batched replication is perfect. If >>>>>>> you can't.. then HA. Every place I have positioned it, rsf-1 has >>>>>>> worked extremely well. If i remember right, it works at the dmu. I >>>>>>> would suggest try it. They have been trying to have a full freebsd >>>>>>> solution, I have several customers running it well. 
>>>>>>> >>>>>>> linda >>>>>>> >>>>>>> >>>>>>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>>>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen >>>>>>>> Gotteswinter wrote: >>>>>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen >>>>>>>>>> Gotteswinter wrote: >>>>>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>>>>>> >>> >> wrote: >>>>>>>>>>>>> As I said in a previous post I tested the zfs send/receive >>>>>>>>>>>>> approach (with >>>>>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in >>>>>>>>>>>>> all what you >>>>>>>>>>>>> said, especially about off-site replicate and synchronous >>>>>>>>>>>>> replication. >>>>>>>>>>>>> >>>>>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the >>>>>>>>>>>>> moment, >>>>>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but >>>>>>>>>>>>> ATM it >>>>>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>>>>>> I must be too old school, but I don’t quite like the idea of >>>>>>>>>>>> using an essentially unreliable transport >>>>>>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>>>>>> >>>>>>>>>>>> In case something went wrong, that approach could risk >>>>>>>>>>>> corrupting a pool. Although, frankly, >>>>>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA >>>>>>>>>>>> problem that caused some >>>>>>>>>>>> silent corruption. >>>>>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines >>>>>>>>>>> hooked up >>>>>>>>>>> to the same disk chassis. >>>>>>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>>>>>> >>>>>>>>>> I'm still busy to test the whole setup here, including the >>>>>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can >>>> prevent >>>>>>>>>> that thanks to: >>>>>>>>>> >>>>>>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>>>>>> and you can't import the pool (even with -f) for ex (filer2 >>>> is the >>>>>>>>>> BACKUP): >>>>>>>>>> >>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>> >>>>>>>>>> - The shared pool should not be mounted at boot, and you should >>>>>>>>>> ensure >>>>>>>>>> that the failover script is not executed during boot time too: >>>>>>>>>> this is >>>>>>>>>> to handle the case wherein both machines turn off and/or >>>> re-ignite at >>>>>>>>>> the same time. 
Indeed, the CARP interface can "flip" it's status >>>>>>>>>> if both >>>>>>>>>> machines are powered on at the same time, for ex: >>>>>>>>>> >>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf >>>> and >>>>>>>>>> you will have a split-brain scenario >>>>>>>>>> >>>>>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>>>>>>> happen, this can be handled with a trigger file or something like >>>>>>>>>> that >>>>>>>>>> >>>>>>>>>> - I've still have to check if the order is OK, but I think >>>> that as >>>>>>>>>> long >>>>>>>>>> as you shutdown the replication interface and that you adapt the >>>>>>>>>> advskew (including the config file) of the CARP interface >>>> before the >>>>>>>>>> zpool import -f in the failover script you can be relatively >>>>>>>>>> confident >>>>>>>>>> that nothing will be written on the iSCSI targets >>>>>>>>>> >>>>>>>>>> - A zpool scrub should be run at regular intervals >>>>>>>>>> >>>>>>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>>>>>> >>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>> >>>>>>>>>> Julien >>>>>>>>>> >>>>>>>>> 100€ question without detailed looking at that script. yes from a >>>>>>>>> first >>>>>>>>> view its super simple, but: why are solutions like rsf-1 such more >>>>>>>>> powerful / featurerich. Theres a reason for, which is that >>>> they try to >>>>>>>>> cover every possible situation (which makes more than sense >>>> for this). >>>>>>>> I've never used "rsf-1" so I can't say much more about it, but >>>> I have >>>>>>>> no doubts about it's ability to handle "complex situations", where >>>>>>>> multiple nodes / networks are involved. >>>>>>>> >>>>>>>>> That script works for sure, within very limited cases imho >>>>>>>>> >>>>>>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen >>>>>>>>>>> sooner >>>>>>>>>>> or later especially when it comes to homegrown automatism >>>> solutions. >>>>>>>>>>> even the commercial parts where much more time/work goes >>>> into such >>>>>>>>>>> solutions fail in a regular manner >>>>>>>>>>> >>>>>>>>>>>> The advantage of ZFS send/receive of datasets is, however, that >>>>>>>>>>>> you can consider it >>>>>>>>>>>> essentially atomic. A transport corruption should not cause >>>>>>>>>>>> trouble (apart from a failed >>>>>>>>>>>> "zfs receive") and with snapshot retention you can even roll >>>>>>>>>>>> back. You can’t roll back >>>>>>>>>>>> zpool replications :) >>>>>>>>>>>> >>>>>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as >>>> your >>>>>>>>>>>> zfs receive doesn’t involve a rollback >>>>>>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. >>>>>>>>>>>> Just make sure that your replica datasets >>>>>>>>>>>> aren’t mounted and zfs receive won’t complain. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Borja. 
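For anyone who wants to try zrep from the usage summary quoted earlier in this message, a first-time setup and a planned role swap could look like the commands below. The hosts (stor1, stor2) and dataset (tank/vmail) are taken from the quoted property listing and stand in for your own names.

-- snip --
# one-time setup, run on the current master (stor2 in the listing above)
zrep init tank/vmail stor1 tank/vmail

# periodic replication, from cron or the loop shown above
zrep sync all

# planned role swap: run on the current master...
zrep failover tank/vmail
# ...or, if the master is gone, promote the replica from the other side
zrep takeover tank/vmail
-- snip --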
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> freebsd-fs@freebsd.org >>>> > >>>> mailing list >>>>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> >>>>>>>>>>>> To unsubscribe, send any mail to >>>>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>> >>>>>>>>>>>> >>> >" >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> freebsd-fs@freebsd.org >>>> > >>>> mailing list >>>>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> >>>>>>>>>>> To unsubscribe, send any mail to >>>>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>> >>>>>>>>>>> >>> >" >>>>>>> _______________________________________________ >>>>>>> freebsd-fs@freebsd.org >>>> > >>>> mailing list >>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> >>>>>>> To unsubscribe, send any mail to >>>> "freebsd-fs-unsubscribe@freebsd.org >>>> >>>>>>> >>> >" >>>>> _______________________________________________ >>>>> freebsd-fs@freebsd.org mailing list >>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> >>>>> To unsubscribe, send any mail to >>>> "freebsd-fs-unsubscribe@freebsd.org >>>> " >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org >>>> " >>>> >>>> >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Thu Aug 18 17:19:58 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 250A3BBE2E0 for ; Thu, 18 Aug 2016 17:19:58 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: from mail-it0-x230.google.com (mail-it0-x230.google.com [IPv6:2607:f8b0:4001:c0b::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D8972151E for ; Thu, 18 Aug 2016 17:19:57 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: by mail-it0-x230.google.com with SMTP id n128so3718656ith.1 for ; Thu, 18 Aug 2016 10:19:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kateley-com.20150623.gappssmtp.com; s=20150623; h=reply-to:subject:references:to:cc:from:organization:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=z9zBx+sDu9j56VaWeHA3k8btgArLPWm7s0u78FtsLTk=; b=wwOfOQ+Je+EBCGBiONjjN/n/vGJ9f23Mmby9Mj4xGq40zk8ZCpovHA64l8iuRPGtYF X1pzM/m2uPlQ9p0R/2Krbk7RzCFWUedW+PgSB5q7kCHITNAYuQn0mj9lhPWYF0/jXF/2 zVSyLGNU0Gk24ejgCNfXfuUJr4EGx8vIg1DuY8eR9Jiq6te4570f5Ct8N8sFinDQkIR1 DNuXAso0pLBqTQy/2L67nfj0RJqEDQccTbikuOiPZfAGaU2V1t0aEmeAPXz+X+PJSy+q 
kPfewxOgdqLOTVnsnh8RwDAVWB9kFFnYNW5n4SqWQYPp7fBrk8jWmhaXJOQ8+hcJ4IH3 LQGw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:reply-to:subject:references:to:cc:from :organization:message-id:date:user-agent:mime-version:in-reply-to :content-transfer-encoding; bh=z9zBx+sDu9j56VaWeHA3k8btgArLPWm7s0u78FtsLTk=; b=TV23DtAsB70BnEFcayaORt3AVzX2SlJJe/HYSVMB7v3IDO6ZbzGjOOLmzw3T1B25Wq G8b+lJUnZbD3jRlck9wxXcnAVtrzgsNJZJWaSCl8Ldn1dVGe3EYh1JBOEpf/h9kIdEWw OvmN7fEyGjcky5kXXEoU0SW/Hhb2EtiJ93f6GIcBAk4AwY8QrfGpTaGF/L6DkbIPHEv1 rwgFlE1voYwdgJuxc/v+jNxACbYj590aEUxax1AKRhVIILjmBO/DwzxD+KsE17XG1SkD x0ro9q58EvbRuVAm8yPWMx4YoO3y72bkqafdQc0/VXEoObw0b4+f2ld0lCtP9awJWCVn F60g== X-Gm-Message-State: AEkoouurVW2JMRY0LaE+rKnCICgc9cIoWSUnFvPtYQLHjboDFtIyWOXLot9gQCxkbT2lbQ== X-Received: by 10.36.33.197 with SMTP id e188mr869525ita.42.1471540797019; Thu, 18 Aug 2016 10:19:57 -0700 (PDT) Received: from [192.168.0.19] (67-4-156-204.mpls.qwest.net. [67.4.156.204]) by smtp.googlemail.com with ESMTPSA id z128sm1528555iof.4.2016.08.18.10.19.56 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 18 Aug 2016 10:19:56 -0700 (PDT) Reply-To: linda@kateley.com Subject: Re: HAST + ZFS + NFS + CARP References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> To: juergen.gotteswinter@internetx.com, linda@kateley.com, Chris Watson Cc: freebsd-fs@freebsd.org From: Linda Kateley Organization: Kateley Company Message-ID: <4c34cbf9-84b5-5d42-e0b4-bf18aa1ef9a7@kateley.com> Date: Thu, 18 Aug 2016 12:19:55 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 17:19:58 -0000 On 8/18/16 2:32 AM, InterNetX - Juergen Gotteswinter wrote: > > Am 17.08.2016 um 20:03 schrieb Linda Kateley: >> I just do consulting so I don't always get to see the end of the >> project. Although we are starting to do more ongoing support so we can >> see the progress.. >> >> I have worked with some of the guys from high-availability.com for maybe >> 20 years. RSF-1 is the cluster that is bundled with nexenta. Does work >> beautifully with omni/illumos. The one customer I have running it in >> prod is an isp in south america running openstack and zfs on freebsd as >> iscsi. Big boxes, 90+ drives per frame. If someone would like try it, i >> have some contacts there. Ping me offlist. > no offense, but it sounds a bit like marketing. > > here: running nexenta ha setup since several years with one catastrophic > failure due to split brain Just trying to say I don't see projects ongoing.. 
just at beginning > >> You do risk losing data if you batch zfs send. It is very hard to run >> that real time. > depends on how much data changes aka delta size > > > You have to take the snap then send the snap. Most >> people run in cron, even if it's not in cron, you would want one to >> finish before you started the next. > thats the reason why lock files where invented, tools like zrep handle > that themself via additional zfs properties > > or, if one does not trust a single layer > > -- snip -- > #!/bin/sh > if [ ! -f /var/run/replic ] ; then > touch /var/run/replic > /blah/path/zrep sync all >> /var/log/zfsrepli.log > rm -f /var/run/replic > fi > -- snip -- > > something like this, simple > > If you lose the sending host before >> the receive is complete you won't have a full copy. > if rsf fails, and you end up in split brain you loose way more. been > there, seen that. > > With zfs though you >> will probably still have the data on the sending host, however long it >> takes to bring it back up. RSF-1 runs in the zfs stack and send the >> writes to the second system. It's kind of pricey, but actually much less >> expensive than commercial alternatives. >> >> Anytime you run anything sync it adds latency but makes things safer.. > not surprising, it all depends on the usecase > >> There is also a cool tool I like, called zerto for vmware that sits in >> the hypervisor and sends a sync copy of a write locally and then an >> async remotely. It's pretty cool. Although I haven't run it myself, have >> a bunch of customers running it. I believe it works with proxmox too. >> >> Most people I run into (these days) don't mind losing 5 or even 30 >> minutes of data. Small shops. > you talk about minutes, what delta size are we talking here about? why > not using zrep in a loop for example > > They usually have a copy somewhere else. >> Or the cost of 5-30 minutes isn't that great. I used work as a >> datacenter architect for sun/oracle with only fortune 500. There losing >> 1 sec could put large companies out of business. I worked with banks and >> exchanges. > again, usecase. i bet 99% on this list are not operating fortune 500 > bank filers > > They couldn't ever lose a single transaction. Most people >> nowadays do the replication/availability in the application though and >> don't care about underlying hardware, especially disk. >> >> >> On 8/17/16 11:55 AM, Chris Watson wrote: >>> Of course, if you are willing to accept some amount of data loss that >>> opens up a lot more options. :) >>> >>> Some may find that acceptable though. Like turning off fsync with >>> PostgreSQL to get much higher throughput. As little no as you are made >>> *very* aware of the risks. >>> >>> It's good to have input in this thread from one with more experience >>> with RSF-1 than the rest of us. You confirm what others have that said >>> about RSF-1, that it's stable and works well. What were you deploying >>> it on? >>> >>> Chris >>> >>> Sent from my iPhone 5 >>> >>> On Aug 17, 2016, at 11:18 AM, Linda Kateley >> > wrote: >>> >>>> The question I always ask, as an architect, is "can you lose 1 minute >>>> worth of data?" If you can, then batched replication is perfect. If >>>> you can't.. then HA. Every place I have positioned it, rsf-1 has >>>> worked extremely well. If i remember right, it works at the dmu. I >>>> would suggest try it. They have been trying to have a full freebsd >>>> solution, I have several customers running it well. 
>>>> >>>> linda >>>> >>>> >>>> On 8/17/16 4:52 AM, Julien Cigar wrote: >>>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen >>>>> Gotteswinter wrote: >>>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar: >>>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen >>>>>>> Gotteswinter wrote: >>>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos: >>>>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar >>>>>>>>> > wrote: >>>>>>>>>> >>>>>>>>>> As I said in a previous post I tested the zfs send/receive >>>>>>>>>> approach (with >>>>>>>>>> zrep) and it works (more or less) perfectly.. so I concur in >>>>>>>>>> all what you >>>>>>>>>> said, especially about off-site replicate and synchronous >>>>>>>>>> replication. >>>>>>>>>> >>>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP at the >>>>>>>>>> moment, >>>>>>>>>> I'm in the early tests, haven't done any heavy writes yet, but >>>>>>>>>> ATM it >>>>>>>>>> works as expected, I havent' managed to corrupt the zpool. >>>>>>>>> I must be too old school, but I don’t quite like the idea of >>>>>>>>> using an essentially unreliable transport >>>>>>>>> (Ethernet) for low-level filesystem operations. >>>>>>>>> >>>>>>>>> In case something went wrong, that approach could risk >>>>>>>>> corrupting a pool. Although, frankly, >>>>>>>>> ZFS is extremely resilient. One of mine even survived a SAS HBA >>>>>>>>> problem that caused some >>>>>>>>> silent corruption. >>>>>>>> try dual split import :D i mean, zpool -f import on 2 machines >>>>>>>> hooked up >>>>>>>> to the same disk chassis. >>>>>>> Yes this is the first thing on the list to avoid .. :) >>>>>>> >>>>>>> I'm still busy to test the whole setup here, including the >>>>>>> MASTER -> BACKUP failover script (CARP), but I think you can prevent >>>>>>> that thanks to: >>>>>>> >>>>>>> - As long as ctld is running on the BACKUP the disks are locked >>>>>>> and you can't import the pool (even with -f) for ex (filer2 is the >>>>>>> BACKUP): >>>>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f >>>>>>> >>>>>>> - The shared pool should not be mounted at boot, and you should >>>>>>> ensure >>>>>>> that the failover script is not executed during boot time too: >>>>>>> this is >>>>>>> to handle the case wherein both machines turn off and/or re-ignite at >>>>>>> the same time. Indeed, the CARP interface can "flip" it's status >>>>>>> if both >>>>>>> machines are powered on at the same time, for ex: >>>>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf and >>>>>>> you will have a split-brain scenario >>>>>>> >>>>>>> - Sometimes you'll need to reboot the MASTER for some $reasons >>>>>>> (freebsd-update, etc) and the MASTER -> BACKUP switch should not >>>>>>> happen, this can be handled with a trigger file or something like >>>>>>> that >>>>>>> >>>>>>> - I've still have to check if the order is OK, but I think that as >>>>>>> long >>>>>>> as you shutdown the replication interface and that you adapt the >>>>>>> advskew (including the config file) of the CARP interface before the >>>>>>> zpool import -f in the failover script you can be relatively >>>>>>> confident >>>>>>> that nothing will be written on the iSCSI targets >>>>>>> >>>>>>> - A zpool scrub should be run at regular intervals >>>>>>> >>>>>>> This is my MASTER -> BACKUP CARP script ATM >>>>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7 >>>>>>> >>>>>>> Julien >>>>>>> >>>>>> 100€ question without detailed looking at that script. 
yes from a >>>>>> first >>>>>> view its super simple, but: why are solutions like rsf-1 such more >>>>>> powerful / featurerich. Theres a reason for, which is that they try to >>>>>> cover every possible situation (which makes more than sense for this). >>>>> I've never used "rsf-1" so I can't say much more about it, but I have >>>>> no doubts about it's ability to handle "complex situations", where >>>>> multiple nodes / networks are involved. >>>>> >>>>>> That script works for sure, within very limited cases imho >>>>>> >>>>>>>> kaboom, really ugly kaboom. thats what is very likely to happen >>>>>>>> sooner >>>>>>>> or later especially when it comes to homegrown automatism solutions. >>>>>>>> even the commercial parts where much more time/work goes into such >>>>>>>> solutions fail in a regular manner >>>>>>>> >>>>>>>>> The advantage of ZFS send/receive of datasets is, however, that >>>>>>>>> you can consider it >>>>>>>>> essentially atomic. A transport corruption should not cause >>>>>>>>> trouble (apart from a failed >>>>>>>>> "zfs receive") and with snapshot retention you can even roll >>>>>>>>> back. You can’t roll back >>>>>>>>> zpool replications :) >>>>>>>>> >>>>>>>>> ZFS receive does a lot of sanity checks as well. As long as your >>>>>>>>> zfs receive doesn’t involve a rollback >>>>>>>>> to the latest snapshot, it won’t destroy anything by mistake. >>>>>>>>> Just make sure that your replica datasets >>>>>>>>> aren’t mounted and zfs receive won’t complain. >>>>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Borja. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>>> To unsubscribe, send any mail to >>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>>>>>>> " >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>> To unsubscribe, send any mail to >>>>>>>> "freebsd-fs-unsubscribe@freebsd.org >>>>>>>> " >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org >>>> " >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Thu Aug 18 20:01:33 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B1E90BBE785 for ; Thu, 18 Aug 2016 20:01:33 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citapm.icyb.net.ua (citapm.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id DD1D11845; Thu, 18 Aug 2016 20:01:32 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citapm.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA16268; Thu, 18 Aug 2016 23:01:24 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1baTVE-000JoD-HQ; Thu, 18 Aug 2016 23:01:24 +0300 Subject: Re: ZFS ARC under 
memory pressure To: Slawa Olhovchenkov , freebsd-fs@FreeBSD.org, Alexander Motin References: <20160816193416.GM8192@zxy.spb.ru> From: Andriy Gapon Message-ID: <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> Date: Thu, 18 Aug 2016 23:00:28 +0300 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160816193416.GM8192@zxy.spb.ru> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 20:01:33 -0000 On 16/08/2016 22:34, Slawa Olhovchenkov wrote: > I see issuses with ZFS ARC inder memory pressure. > ZFS ARC size can be dramaticaly reduced, up to arc_min. > > As I see memory pressure event cause call arc_lowmem and set needfree: > > arc.c:arc_lowmem > > needfree = btoc(arc_c >> arc_shrink_shift); > > After this, arc_available_memory return negative vaules (PAGESIZE * > (-needfree)) until needfree is zero. Independent how too much memory > freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <= > arc_c. Until arc_size don't drop below arc_c (arc_c deceased at every > loop interation). > > arc_c droped to minimum value if arc_size fast enough droped. > > No control current to initial memory allocation. > > As result, I can see needless arc reclaim, from 10x to 100x times. > > Can some one check me and comment this? You might have found a real problem here, but I am short of time right now to properly analyze the issue. I think that on illumos 'needfree' is a variable that's managed by the virtual memory system and it is akin to our vm_pageout_deficit. But during the porting it became an artificial value and its handling might be sub-optimal. 
-- Andriy Gapon From owner-freebsd-fs@freebsd.org Thu Aug 18 20:27:00 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5F5ECBBEE34 for ; Thu, 18 Aug 2016 20:27:00 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 214F21730; Thu, 18 Aug 2016 20:27:00 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from slw by zxy.spb.ru with local (Exim 4.86 (FreeBSD)) (envelope-from ) id 1baTty-000JwN-0I; Thu, 18 Aug 2016 23:26:58 +0300 Date: Thu, 18 Aug 2016 23:26:57 +0300 From: Slawa Olhovchenkov To: Andriy Gapon Cc: freebsd-fs@FreeBSD.org, Alexander Motin Subject: Re: ZFS ARC under memory pressure Message-ID: <20160818202657.GS8192@zxy.spb.ru> References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> User-Agent: Mutt/1.5.24 (2015-08-30) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 20:27:00 -0000 On Thu, Aug 18, 2016 at 11:00:28PM +0300, Andriy Gapon wrote: > On 16/08/2016 22:34, Slawa Olhovchenkov wrote: > > I see issuses with ZFS ARC inder memory pressure. > > ZFS ARC size can be dramaticaly reduced, up to arc_min. > > > > As I see memory pressure event cause call arc_lowmem and set needfree: > > > > arc.c:arc_lowmem > > > > needfree = btoc(arc_c >> arc_shrink_shift); > > > > After this, arc_available_memory return negative vaules (PAGESIZE * > > (-needfree)) until needfree is zero. Independent how too much memory > > freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <= > > arc_c. Until arc_size don't drop below arc_c (arc_c deceased at every > > loop interation). > > > > arc_c droped to minimum value if arc_size fast enough droped. > > > > No control current to initial memory allocation. > > > > As result, I can see needless arc reclaim, from 10x to 100x times. > > > > Can some one check me and comment this? > > You might have found a real problem here, but I am short of time right now to > properly analyze the issue. I think that on illumos 'needfree' is a variable > that's managed by the virtual memory system and it is akin to our > vm_pageout_deficit. But during the porting it became an artificial value and > its handling might be sub-optimal. As I see, totaly not optimal. I am create some patch for sub-optimal handling and now test it. 
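[To make the feedback loop described in this exchange easier to follow, here is a toy userspace model of it. It is not the kernel's arc.c; it only encodes the steps stated above (the lowmem handler asks for 1/128 of arc_c, arc_available_memory() stays negative until needfree is cleared, needfree is cleared only once arc_size has fallen to arc_c, and arc_c is lowered on every reclaim pass), plus one explicit assumption: eviction trims arc_size more slowly than the target is lowered. All sizes and per-pass amounts are invented for illustration.]
-- snip --
/*
 * Toy model of the reported ARC collapse.  NOT the real arc.c code;
 * the per-pass shrink amount and the eviction rate are assumptions.
 */
#include <stdio.h>
#include <stdint.h>

#define PAGESIZE         4096LL
#define ARC_SHRINK_SHIFT 7                    /* 1/128 of arc_c, as reported */
#define MiB              (1024LL * 1024)
#define GiB              (1024LL * MiB)

static int64_t arc_c    = 16 * GiB;           /* ARC target size (example) */
static int64_t arc_size = 16 * GiB;           /* current ARC size (example) */
static int64_t arc_min  = 2 * GiB;            /* hard floor (example) */
static int64_t needfree = 0;                  /* pages asked for by lowmem */

static void
arc_lowmem(void)                              /* one memory-pressure event */
{
        needfree = (arc_c >> ARC_SHRINK_SHIFT) / PAGESIZE;
}

static int64_t
arc_available_memory(void)
{
        /* stays negative until needfree is cleared, however much was freed */
        return (needfree > 0 ? -needfree * PAGESIZE : 256 * MiB);
}

int
main(void)
{
        int64_t evict_per_pass = 64 * MiB;    /* assumed: eviction lags behind */
        int64_t asked;
        int pass = 0;

        arc_lowmem();
        asked = -arc_available_memory();      /* what the event actually wanted */
        while (arc_available_memory() < 0) {
                pass++;
                arc_c -= -arc_available_memory();     /* target drops every pass */
                if (arc_c < arc_min)
                        arc_c = arc_min;
                arc_size -= evict_per_pass;           /* eviction catches up slowly */
                if (arc_size <= arc_c) {
                        arc_size = arc_c;
                        needfree = 0;                 /* cleared only here */
                }
        }
        printf("asked to free %lld MiB; ARC target ended at %lld MiB after %d passes\n",
            (long long)(asked / MiB), (long long)(arc_c / MiB), pass);
        return (0);
}
-- snip --
[With these example numbers the event asks for 128 MiB but the target races down to arc_min, which is the same shape of over-reclaim ("10x to 100x") reported above.]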
From owner-freebsd-fs@freebsd.org Thu Aug 18 20:31:38 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BC001BBEEE8 for ; Thu, 18 Aug 2016 20:31:38 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (denninger.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8EEB019F9 for ; Thu, 18 Aug 2016 20:31:38 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 713AA20855C for ; Thu, 18 Aug 2016 15:31:30 -0500 (CDT) Subject: Re: ZFS ARC under memory pressure To: freebsd-fs@freebsd.org References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> From: Karl Denninger Message-ID: Date: Thu, 18 Aug 2016 15:31:26 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160818202657.GS8192@zxy.spb.ru> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms020806060003040608060704" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 20:31:38 -0000 This is a cryptographically signed message in MIME format. --------------ms020806060003040608060704 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 8/18/2016 15:26, Slawa Olhovchenkov wrote: > On Thu, Aug 18, 2016 at 11:00:28PM +0300, Andriy Gapon wrote: > >> On 16/08/2016 22:34, Slawa Olhovchenkov wrote: >>> I see issuses with ZFS ARC inder memory pressure. >>> ZFS ARC size can be dramaticaly reduced, up to arc_min. >>> >>> As I see memory pressure event cause call arc_lowmem and set needfree= : >>> >>> arc.c:arc_lowmem >>> >>> needfree =3D btoc(arc_c >> arc_shrink_shift); >>> >>> After this, arc_available_memory return negative vaules (PAGESIZE * >>> (-needfree)) until needfree is zero. Independent how too much memory >>> freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <=3D >>> arc_c. Until arc_size don't drop below arc_c (arc_c deceased at every= >>> loop interation). >>> >>> arc_c droped to minimum value if arc_size fast enough droped. >>> >>> No control current to initial memory allocation. >>> >>> As result, I can see needless arc reclaim, from 10x to 100x times. >>> >>> Can some one check me and comment this? >> You might have found a real problem here, but I am short of time right= now to >> properly analyze the issue. I think that on illumos 'needfree' is a v= ariable >> that's managed by the virtual memory system and it is akin to our >> vm_pageout_deficit. But during the porting it became an artificial va= lue and >> its handling might be sub-optimal. > As I see, totaly not optimal. > I am create some patch for sub-optimal handling and now test it. 
You might want to look at the code contained in here:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594

There are some ugly interactions with the VM system you can run into if you're not careful; I've chased this issue before and while I haven't yet done the work to integrate it into 11.x (and the underlying code *has* changed since the 10.x patches I developed) if you wind up driving the VM system to evict pages to swap rather than pare back ARC you're probably making the wrong choice.

In addition UMA can come into the picture too and (at least previously) was a severe contributor to pathological behavior.

-- Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/

From owner-freebsd-fs@freebsd.org Thu Aug 18 22:58:25 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9F854BBEF15; Thu, 18 Aug 2016 22:58:25 +0000 (UTC) (envelope-from jhs@berklix.com)
Received: from land.berklix.org (land.berklix.org [144.76.10.75]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 34C6D1182; Thu, 18 Aug 2016 22:58:24 +0000 (UTC) (envelope-from jhs@berklix.com)
Received: from mart.js.berklix.net (p5083CC3A.dip0.t-ipconnect.de [80.131.204.58]) (authenticated bits=128) by land.berklix.org (8.15.2/8.15.2) with ESMTPA id u7IMwFPs076625; Thu, 18 Aug 2016 22:58:15 GMT (envelope-from jhs@berklix.com)
Received: from fire.js.berklix.net (fire.js.berklix.net [192.168.91.41]) by mart.js.berklix.net (8.14.3/8.14.3) with ESMTP id u7IMwAbf003005; Fri, 19 Aug 2016 00:58:10 +0200 (CEST) (envelope-from jhs@berklix.com)
Received: from fire.js.berklix.net (localhost [127.0.0.1]) by fire.js.berklix.net (8.14.7/8.14.7) with ESMTP id u7IMvpT5090433; Fri, 19 Aug 2016 00:58:09 +0200 (CEST) (envelope-from jhs@berklix.com)
Message-Id: <201608182258.u7IMvpT5090433@fire.js.berklix.net> To: "Jukka A. Ukkonen" cc: freebsd-advocacy@freebsd.org, freebsd-fs@freebsd.org Subject: Re: A how-to guide which you might wish to use for freebsd advocacy From: "Julian H. Stacey" Organization: http://berklix.eu BSD Unix Linux Consultants, Munich Germany User-agent: EXMH on FreeBSD http://berklix.eu/free/ X-URL: http://www.berklix.eu/~jhs/ In-reply-to: Your message "Thu, 18 Aug 2016 12:40:18 +0300."
<71a9ed60-90c1-9df3-4da0-cafd23e48fc0@gmail.com> Date: Fri, 19 Aug 2016 00:57:51 +0200 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2016 22:58:25 -0000 "Jukka A. Ukkonen" wrote freebsd-advocacy@freebsd.org: > > https://www.facebook.com/notes/jukka-ukkonen/upgrading-the-storage-disk-to-finnsat-fh05-hdr-digital-tv-receiver-while-retaini/10208639116987804 > > Feel free to publish the link on the freebsd web site or otherwise > distribute it further. I added cc: freebsd-fs@freebsd.org as it's about file systems & Ext2 & offsets. (BTW I have no facebook login, so can assure readers Jukka's page is public, no fbook login needed to access URL) An extract re FS. ] FreeBSD will by default not accept the Finnsat generated partition ] for mounting. The trick is that Finnsat creates partitions with ] slack alignment. FreeBSD knows that a live ext2fs has to be a ] multiple of 4kB, 4096 bytes in size, i.e. 8 disk blocks, 8*512 ] bytes. If the partition size is not perfectly aligned, FreeBSD does ] not allow read-write mount to an ext2fs instance. With all likelihood ] it might be a broken file system. Why should FreeBSD help making ] things worse? ] ] So, you will have to adjust the partition size such that its length ] will be a multiple of 8 disk blocks. The tool for this is gpart ] (geom partition) which both modifies the partition tables and shows ] their contents. First use the command Thanks Jukka, I've bcc'd a friend who I discussed Humax TV recorders with a while back, on similar issues, some time when I'm visiting the town where my 3 Humax owner friends are, I hope to find time to experiment with my USB to SATA converter. This thread is archived here: http://lists.freebsd.org/pipermail/freebsd-advocacy/2016-August/004619.html & under here: http://lists.freebsd.org/pipermail/freebsd-fs/2016-August/date.html Cheers, Julian -- Julian Stacey, BSD Linux Unix Sys Eng Consultant Munich Reply below, Prefix '> '. Plain text, No .doc, base64, HTML, quoted-printable. 
http://berklix.eu/brexit/#stolen_votes From owner-freebsd-fs@freebsd.org Fri Aug 19 04:21:36 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8578ABBD414 for ; Fri, 19 Aug 2016 04:21:36 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-ua0-x230.google.com (mail-ua0-x230.google.com [IPv6:2607:f8b0:400c:c08::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3E7F61576 for ; Fri, 19 Aug 2016 04:21:36 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-ua0-x230.google.com with SMTP id n59so61632969uan.2 for ; Thu, 18 Aug 2016 21:21:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=Kc4hqKEb/ysBW+KD38nOPRAs96e8ZeSESdrxoANTxf4=; b=WbK3RpfMunCBQ7j/vQSfmkgJ0+0NalqpvUquTJWpMfcDNjcAlM15rSCBEsFks8q5j8 P5+AhkVbWpFUT52WVrzB+OOEL2IiqnZwi7Tv9w4iLJpPpUAYyao7P3sto/NML7N6tR/I 1Fv3dwYDlehKaGXZMoKuQe8ZQ49Q6uibqtVHDsb3chmyAiIkCKLVtkYgh97aK2pF1WDs po4NFcMdEuO2XFqp7/I2dthtxdOJc2X/NBytQHgx+inplew9v8pyY1J4XzyrBTo/cjZs x1iehLwSisWv8bMoxtAZlIrAC4n+Tuv0T3vtSRjLTtIhGmgJfitM43yJglN/mzc7W3zX xZlg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=Kc4hqKEb/ysBW+KD38nOPRAs96e8ZeSESdrxoANTxf4=; b=E6sFiXVK1ikg9ed1HJBJFcLfjrir3Hl+IPSeR9m9eyUhtHfcMa053Movsy+umPUIJA NG+ImoSpkYbDxMQImZQJxtGWv7Bmra88pizqGs7R1HY5iowmYz2JidaQ+4OXAuzJ7zWt ggFcEWb/M3FVRKCeEgK07E7v3iS2VOKZwq7RETu0CI0txjXd0YmfB4MKWH9G/fil6J19 ZfcNdPykdG4EWmTPlVVIzGqz8H6DsO1kEluBKeuragK+huQ9wecVn/R12EbbXPyp5RCj XA40fyzpYwhbpoqg8o2Sq4djhLBaCDuTWgRbF/rLuoU7SDNk1df1+NwMGlFdD5lVjBBz 9V2w== X-Gm-Message-State: AEkoout7+arknWKneuyC/S+T7gzHsMhZ+1fSxFQP6lb1g/KjFdVepLc4a5eKm/nleNjW6k6m3XnKmE0ofKWrMA== X-Received: by 10.31.183.193 with SMTP id h184mr3070211vkf.3.1471580495383; Thu, 18 Aug 2016 21:21:35 -0700 (PDT) MIME-Version: 1.0 Sender: wlosh@bsdimp.com Received: by 10.103.0.84 with HTTP; Thu, 18 Aug 2016 21:21:34 -0700 (PDT) X-Originating-IP: [69.53.245.200] In-Reply-To: <201608182258.u7IMvpT5090433@fire.js.berklix.net> References: <71a9ed60-90c1-9df3-4da0-cafd23e48fc0@gmail.com> <201608182258.u7IMvpT5090433@fire.js.berklix.net> From: Warner Losh Date: Thu, 18 Aug 2016 22:21:34 -0600 X-Google-Sender-Auth: Z17SRGj_0gN3_LHIkPgFFM1t0kI Message-ID: Subject: Re: A how-to guide which you might wish to use for freebsd advocacy To: "Julian H. Stacey" Cc: "Jukka A. Ukkonen" , freebsd-fs@freebsd.org, freebsd-advocacy@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 04:21:36 -0000 On Thu, Aug 18, 2016 at 4:57 PM, Julian H. Stacey wrote: > "Jukka A. Ukkonen" wrote freebsd-advocacy@freebsd.org: >> >> https://www.facebook.com/notes/jukka-ukkonen/upgrading-the-storage-disk-to-finnsat-fh05-hdr-digital-tv-receiver-while-retaini/10208639116987804 >> >> Feel free to publish the link on the freebsd web site or otherwise >> distribute it further. 
> > I added cc: freebsd-fs@freebsd.org as it's about file systems & Ext2 & offsets. > > (BTW I have no facebook login, so can assure readers Jukka's > page is public, no fbook login needed to access URL) > > An extract re FS. > > ] FreeBSD will by default not accept the Finnsat generated partition > ] for mounting. The trick is that Finnsat creates partitions with > ] slack alignment. FreeBSD knows that a live ext2fs has to be a > ] multiple of 4kB, 4096 bytes in size, i.e. 8 disk blocks, 8*512 > ] bytes. If the partition size is not perfectly aligned, FreeBSD does > ] not allow read-write mount to an ext2fs instance. With all likelihood > ] it might be a broken file system. Why should FreeBSD help making > ] things worse? > ] > ] So, you will have to adjust the partition size such that its length > ] will be a multiple of 8 disk blocks. The tool for this is gpart > ] (geom partition) which both modifies the partition tables and shows > ] their contents. First use the command > > Thanks Jukka, > > I've bcc'd a friend who I discussed Humax TV recorders with a while > back, on similar issues, some time when I'm visiting the town where > my 3 Humax owner friends are, I hope to find time to experiment > with my USB to SATA converter. > > This thread is archived here: > http://lists.freebsd.org/pipermail/freebsd-advocacy/2016-August/004619.html > & under here: > http://lists.freebsd.org/pipermail/freebsd-fs/2016-August/date.html This has been a problem for a while. gpart is too smart. It won't allow one to create unaligned partitions, even when you know they will work. Warner From owner-freebsd-fs@freebsd.org Fri Aug 19 07:56:24 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4EFFBBBF4AB for ; Fri, 19 Aug 2016 07:56:24 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from cu01176b.smtpx.saremail.com (cu01176b.smtpx.saremail.com [195.16.151.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 12DF4120C for ; Fri, 19 Aug 2016 07:56:23 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from [172.16.8.36] (izaro.sarenet.es [192.148.167.11]) by proxypop01.sare.net (Postfix) with ESMTPSA id 2A04B9DD34F; Fri, 19 Aug 2016 09:56:19 +0200 (CEST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: HAST + ZFS + NFS + CARP From: Borja Marcos In-Reply-To: <20160818132948.GB51561@neutralgood.org> Date: Fri, 19 Aug 2016 09:56:19 +0200 Cc: krad , FreeBSD FS , InterNetX - Juergen Gotteswinter Content-Transfer-Encoding: quoted-printable Message-Id: <0B420CCC-D04F-451A-960B-496F9F0031AE@sarenet.es> References: <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> <20160818132948.GB51561@neutralgood.org> To: "Kevin P. 
Neal" X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 07:56:24 -0000 > On 18 Aug 2016, at 15:29, Kevin P. Neal wrote: >=20 > On Thu, Aug 18, 2016 at 08:36:24AM +0100, krad wrote: >> I didnt think touch was atomic, mkdir is though >=20 > The shell script snippit that was posted is not safe since there is = time > in between the touch and the check for the existance of the lock file. >=20 > The better solution is to use FreeBSD's lockf command. Unfortunately it=E2=80=99s not portable. Hence mkdir is the suggested = way you will find on scripting tutorials, especially the classic ones :) Borja. From owner-freebsd-fs@freebsd.org Fri Aug 19 08:23:18 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B24BABBFEBC; Fri, 19 Aug 2016 08:23:18 +0000 (UTC) (envelope-from etnapierala@gmail.com) Received: from mail-wm0-x234.google.com (mail-wm0-x234.google.com [IPv6:2a00:1450:400c:c09::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 44D6D16BE; Fri, 19 Aug 2016 08:23:18 +0000 (UTC) (envelope-from etnapierala@gmail.com) Received: by mail-wm0-x234.google.com with SMTP id q128so24815388wma.1; Fri, 19 Aug 2016 01:23:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:mail-followup-to :references:mime-version:content-disposition:in-reply-to:user-agent; bh=XudDE4YK6XONsuVyvfMpLIKIhMyMEwGZInjqbQmepIY=; b=f7fMM+XuPCqB3f4BBMxyJY34O7l/KVvdTiKhgWXPg3jL6M1itTZgYA5GI4Elau+xqY cE5LnWQSSgF0HcKP+l2R0/xc9iNXctRvvd/IUebg4yBneH6MGrASb1mUOFF1agGPGXBi DOhzXr+nFWnkRtUeDbczRueoWOr7WDCK7WJMJ983QfuGOFow80xNkiFYE7fMtshrEPhd uDAF1yC1WYXVsMobEpju9R3rK/Zo+7eWpE86NheGaPQeUbAxZs2FbjPc/1w1pvH9SW5r 3xbEs5srf79+yzAEeKkH7fKOeidP6t8VMIzilYA8rPR2FG017TGSByzMCMyQLQIHT/0U 4qXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :mail-followup-to:references:mime-version:content-disposition :in-reply-to:user-agent; bh=XudDE4YK6XONsuVyvfMpLIKIhMyMEwGZInjqbQmepIY=; b=M8xMPlBVF4NwtnaWm1QFecMdEq4DKQpQEfl88+1JdRQmrLr1c08IOpx96T78KGtCI9 BvsXNzm6ta74aA8nz3mVvZNHBS0T56QInfrU6E2JUsF5mXiafWoUXks3a3AKcC7h8Eh6 KePdp65fx8NWp+iM6vLp+SilT9MyK9WJBH5bqN+3fSMANy69UhbWde2kWbckhIp/oNfp XkC++tHtJrlEFnBCrdjHjVos4Cx07raacmC5BYgcQ6qYt2G9lLwLuB+DJDcnhPT5TEHV A/Z3IbEOpVnmAhXFmCU7LGnnxj6bd4NdcVtGm4kZabkWOeMPmrhO41F8t2AD22bvZupQ gMAA== X-Gm-Message-State: AEkoouuK2v0C1kpw4+Cuy8pA/H6EfsQyzQr5qHjCqOhhV5YvffHW6TguxBU7l6psKeDO/g== X-Received: by 10.28.32.77 with SMTP id g74mr2752671wmg.45.1471594996724; Fri, 19 Aug 2016 01:23:16 -0700 (PDT) Received: from brick (euc212.neoplus.adsl.tpnet.pl. [83.20.174.212]) by smtp.gmail.com with ESMTPSA id w129sm3306406wmd.9.2016.08.19.01.23.13 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 19 Aug 2016 01:23:15 -0700 (PDT) Sender: =?UTF-8?Q?Edward_Tomasz_Napiera=C5=82a?= Date: Fri, 19 Aug 2016 10:23:10 +0200 From: Edward Tomasz =?utf-8?Q?Napiera=C5=82a?= To: "Eugene M. 
Zheganin" Cc: FreeBSD FS , freebsd-stable Subject: Re: cannot destroy '': dataset is busy vs iSCSI Message-ID: <20160819082310.GA14806@brick> Mail-Followup-To: "Eugene M. Zheganin" , FreeBSD FS , freebsd-stable References: <57B5CD2F.2070204@norma.perm.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <57B5CD2F.2070204@norma.perm.ru> User-Agent: Mutt/1.6.1 (2016-04-27) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 08:23:18 -0000 On 0818T1958, Eugene M. Zheganin wrote: > Hi. > > I'm using zvol clones with iSCSI. Perdiodically I renew them and destroy > the old ones, but sometimes the clone gets stuck and refuses to be > destroyed: > > (I'm showing the full sequence so it's self explanatory who is who's parent) > > [root@san2:/etc]# zfs destroy esx/games-reference1@ver5_6 > cannot destroy 'esx/games-reference1@ver5_6': snapshot has dependent clones > use '-R' to destroy the following datasets: > esx/games-reference1-ver5_6-worker111 > [root@san2:/etc]# zfs destroy esx/games-reference1-ver5_6-worker111 > cannot destroy 'esx/games-reference1-ver5_6-worker111': dataset is busy > > The only entity that can hold the dataset open is ctld, so: > > [root@san2:/etc]# service ctld reload > [root@san2:/etc]# grep esx/games-reference1-ver5_6-worker111 /etc/ctl.conf > [root@san2:/etc]# zfs destroy esx/games-reference1-ver5_6-worker111 > cannot destroy 'esx/games-reference1-ver5_6-worker111': dataset is busy > > As you can see, the clone isn't mentioned in ctl.conf, but still refuses > to be destroyed. > Is there any way to destroy it without restarting ctld or rebooting the > server ? iSCSI is vital for production, but clones sometimes holds lot > of space. Could you do "ctladm devlist -v" and see if the LUN for this file somehow didn't get removed? 
From owner-freebsd-fs@freebsd.org Fri Aug 19 08:49:30 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4EAD5BBE7E5 for ; Fri, 19 Aug 2016 08:49:30 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from cu1176c.smtpx.saremail.com (cu1176c.smtpx.saremail.com [195.16.148.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 112F41A19 for ; Fri, 19 Aug 2016 08:49:29 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from [172.16.8.36] (izaro.sarenet.es [192.148.167.11]) by proxypop02.sare.net (Postfix) with ESMTPSA id 8A20D9DC90E; Fri, 19 Aug 2016 10:49:20 +0200 (CEST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: HAST + ZFS + NFS + CARP From: Borja Marcos In-Reply-To: <4c34cbf9-84b5-5d42-e0b4-bf18aa1ef9a7@kateley.com> Date: Fri, 19 Aug 2016 10:49:20 +0200 Cc: juergen.gotteswinter@internetx.com, Chris Watson , freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <3F002B89-353E-41CE-8ACF-B34D7D774BCC@sarenet.es> References: <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <20160704193131.GJ41276@mordor.lan> <20160811091016.GI70364@mordor.lan> <1AA52221-9B04-4CF6-97A3-D2C2B330B7F9@sarenet.es> <472bc879-977f-8c4c-c91a-84cc61efcd86@internetx.com> <20160817085413.GE22506@mordor.lan> <465bdec5-45b7-8a1d-d580-329ab6d4881b@internetx.com> <20160817095222.GG22506@mordor.lan> <52d5b687-1351-9ec5-7b67-bfa0be1c8415@kateley.com> <92F4BE3D-E4C1-4E5C-B631-D8F124988A83@gmail.com> <6b866b6e-1ab3-bcc5-151b-653e401742bd@kateley.com> <7468cc18-85e8-3765-2b2b-a93ef73ca05a@internetx.com> <4c34cbf9-84b5-5d42-e0b4-bf18aa1ef9a7@kateley.com> To: linda@kateley.com X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 08:49:30 -0000 > On 18 Aug 2016, at 19:19, Linda Kateley wrote: >> here: running nexenta ha setup since several years with one = catastrophic >> failure due to split brain > Just trying to say I don't see projects ongoing.. just at beginning I saw consultants near the T=C3=A4nnhauser Gate=E2=80=A6 Some kind of feedback loop is terrific, really! ;) Borja. 
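[On the lock-file side discussion earlier in this thread (touch-then-test leaves a race window, lockf(1) is FreeBSD-specific, mkdir is atomic and portable), a minimal sketch of the zrep wrapper quoted earlier, reusing its paths, could look like this:]
-- snip --
#!/bin/sh
# Variant 1: FreeBSD's lockf(1).  The lock is taken and released by the
# utility itself, so there is no window between test and create (not
# portable to other systems).
lockf -t 0 /var/run/replic.lock /blah/path/zrep sync all >> /var/log/zfsrepli.log

# Variant 2: portable mkdir-based lock.  mkdir either creates the directory
# or fails, in one atomic step, unlike the test-then-touch pattern.
if mkdir /var/run/replic.d 2>/dev/null; then
        trap 'rmdir /var/run/replic.d' EXIT INT TERM
        /blah/path/zrep sync all >> /var/log/zfsrepli.log
fi
-- snip --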
From owner-freebsd-fs@freebsd.org Fri Aug 19 20:18:49 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C9F9EBBE694 for ; Fri, 19 Aug 2016 20:18:49 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8A4AB1592 for ; Fri, 19 Aug 2016 20:18:49 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from slw by zxy.spb.ru with local (Exim 4.86 (FreeBSD)) (envelope-from ) id 1baqFU-0003gK-Kd; Fri, 19 Aug 2016 23:18:40 +0300 Date: Fri, 19 Aug 2016 23:18:40 +0300 From: Slawa Olhovchenkov To: Karl Denninger , freebsd-fs@freebsd.org Subject: Re: ZFS ARC under memory pressure Message-ID: <20160819201840.GA12519@zxy.spb.ru> References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 20:18:49 -0000 On Thu, Aug 18, 2016 at 03:31:26PM -0500, Karl Denninger wrote: > > On 8/18/2016 15:26, Slawa Olhovchenkov wrote: > > On Thu, Aug 18, 2016 at 11:00:28PM +0300, Andriy Gapon wrote: > > > >> On 16/08/2016 22:34, Slawa Olhovchenkov wrote: > >>> I see issuses with ZFS ARC inder memory pressure. > >>> ZFS ARC size can be dramaticaly reduced, up to arc_min. > >>> > >>> As I see memory pressure event cause call arc_lowmem and set needfree: > >>> > >>> arc.c:arc_lowmem > >>> > >>> needfree = btoc(arc_c >> arc_shrink_shift); > >>> > >>> After this, arc_available_memory return negative vaules (PAGESIZE * > >>> (-needfree)) until needfree is zero. Independent how too much memory > >>> freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <= > >>> arc_c. Until arc_size don't drop below arc_c (arc_c deceased at every > >>> loop interation). > >>> > >>> arc_c droped to minimum value if arc_size fast enough droped. > >>> > >>> No control current to initial memory allocation. > >>> > >>> As result, I can see needless arc reclaim, from 10x to 100x times. > >>> > >>> Can some one check me and comment this? > >> You might have found a real problem here, but I am short of time right now to > >> properly analyze the issue. I think that on illumos 'needfree' is a variable > >> that's managed by the virtual memory system and it is akin to our > >> vm_pageout_deficit. But during the porting it became an artificial value and > >> its handling might be sub-optimal. > > As I see, totaly not optimal. > > I am create some patch for sub-optimal handling and now test it. 
> > _______________________________________________ > > freebsd-fs at freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org" > > You might want to look at the code contained in here: > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594 In may case arc.c issuse cused by revision r286625 in HEAD (and r288562 in STABLE) -- all in 2015, not touch in 2014. > There are some ugly interactions with the VM system you can run into if > you're not careful; I've chased this issue before and while I haven't > yet done the work to integrate it into 11.x (and the underlying code > *has* changed since the 10.x patches I developed) if you wind up driving > the VM system to evict pages to swap rather than pare back ARC you're > probably making the wrong choice. > > In addition UMA can come into the picture too and (at least previously) > was a severe contributor to pathological behavior. I am only do less aggresive (and more controlled) shrink of ARC size. Now ARC just collapsed. Pointed PR is realy BIG. I am can't read and understund all of this. r286625 change behaivor of interaction between ARC and VM. You problem still exist? Can you explain (in list)? -- Slawa Olhovchenkov From owner-freebsd-fs@freebsd.org Fri Aug 19 20:39:09 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 81CE5BBEF51 for ; Fri, 19 Aug 2016 20:39:09 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (denninger.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3D69A1318 for ; Fri, 19 Aug 2016 20:39:09 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 3C107208713; Fri, 19 Aug 2016 15:39:01 -0500 (CDT) Subject: Re: ZFS ARC under memory pressure To: Slawa Olhovchenkov , freebsd-fs@freebsd.org References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> <20160819201840.GA12519@zxy.spb.ru> From: Karl Denninger Message-ID: Date: Fri, 19 Aug 2016 15:38:55 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160819201840.GA12519@zxy.spb.ru> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms030104010004020408090706" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 20:39:09 -0000 This is a cryptographically signed message in MIME format. 
--------------ms030104010004020408090706 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 8/19/2016 15:18, Slawa Olhovchenkov wrote: > On Thu, Aug 18, 2016 at 03:31:26PM -0500, Karl Denninger wrote: > >> On 8/18/2016 15:26, Slawa Olhovchenkov wrote: >>> On Thu, Aug 18, 2016 at 11:00:28PM +0300, Andriy Gapon wrote: >>> >>>> On 16/08/2016 22:34, Slawa Olhovchenkov wrote: >>>>> I see issuses with ZFS ARC inder memory pressure. >>>>> ZFS ARC size can be dramaticaly reduced, up to arc_min. >>>>> >>>>> As I see memory pressure event cause call arc_lowmem and set needfr= ee: >>>>> >>>>> arc.c:arc_lowmem >>>>> >>>>> needfree =3D btoc(arc_c >> arc_shrink_shift); >>>>> >>>>> After this, arc_available_memory return negative vaules (PAGESIZE *= >>>>> (-needfree)) until needfree is zero. Independent how too much memor= y >>>>> freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <=3D= >>>>> arc_c. Until arc_size don't drop below arc_c (arc_c deceased at eve= ry >>>>> loop interation). >>>>> >>>>> arc_c droped to minimum value if arc_size fast enough droped. >>>>> >>>>> No control current to initial memory allocation. >>>>> >>>>> As result, I can see needless arc reclaim, from 10x to 100x times. >>>>> >>>>> Can some one check me and comment this? >>>> You might have found a real problem here, but I am short of time rig= ht now to >>>> properly analyze the issue. I think that on illumos 'needfree' is a= variable >>>> that's managed by the virtual memory system and it is akin to our >>>> vm_pageout_deficit. But during the porting it became an artificial = value and >>>> its handling might be sub-optimal. >>> As I see, totaly not optimal. >>> I am create some patch for sub-optimal handling and now test it. >>> _______________________________________________ >>> freebsd-fs at freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.o= rg" >> You might want to look at the code contained in here: >> >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D187594 > In may case arc.c issuse cused by revision r286625 in HEAD (and > r288562 in STABLE) -- all in 2015, not touch in 2014. > >> There are some ugly interactions with the VM system you can run into i= f >> you're not careful; I've chased this issue before and while I haven't >> yet done the work to integrate it into 11.x (and the underlying code >> *has* changed since the 10.x patches I developed) if you wind up drivi= ng >> the VM system to evict pages to swap rather than pare back ARC you're >> probably making the wrong choice. >> >> In addition UMA can come into the picture too and (at least previously= ) >> was a severe contributor to pathological behavior. > I am only do less aggresive (and more controlled) shrink of ARC size. > Now ARC just collapsed. > > Pointed PR is realy BIG. I am can't read and understund all of this. > r286625 change behaivor of interaction between ARC and VM. > You problem still exist? Can you explain (in list)? > Essentially ZFS is a "bolt-on" and unlike UFS which uses the unified buffer cache (which the VM system manages) ZFS does not. ARC is allocated out of kernel memory and (by default) also uses UMA; the VM system is not involved in its management. When the VM system gets constrained (low memory) it thus cannot tell the ARC to pare back. So when the VM system gets low on RAM it will start to page. 
The problem with this is that if the VM system is low on RAM because the ARC is consuming memory you do NOT want to page, you want to evict some of the ARC. Consider this: ARC data *at best* prevents one I/O. That is, if there is data in the cache when you go to read from disk, you avoid one I/O per unit of data in the ARC you didn't have to read. Paging *always* requires one I/O (to write the page(s) to the swap) and MAY involve two (to later page it back in.) It is never a "win" to spend a *guaranteed* I/O when you can instead act in a way that *might* cause you to (later) need to execute one. Unfortunately the VM system has another interaction that causes trouble too. The VM system will "demote" a page to inactive or cache status but not actually free it. It only starts to go through those pages and free them when the vm system wakes up, and that only happens when free space gets low enough to trigger it. Finally, there's another problem that comes into play; UMA. Kernel memory allocation is fairly expensive. UMA grabs memory from the kernel allocation system in big chunks and manages it, and by doing so gains a pretty-significant performance boost. But this means that you can have large amounts of RAM that are allocated, not in use, and yet the VM system cannot reclaim them on its own. The ZFS code has to reap those caches, but reaping them is a moderately expensive operation too, thus you don't want to do it unnecessarily. I've not yet gone through the 11.x code to see what changed from 10.x; what I do know is that it is materially better-behaved than it used to be, in that prior to 11.x I would have (by now) pretty much been forced into rolling that forward and testing it because the misbehavior in one of my production systems was severe enough to render it basically unusable without the patch in that PR inline, with the most-serious misbehavior being paging-induced stalls that could reach 10s of seconds or more in duration. 11.x hasn't exhibited the severe problems, unpatched, that 10.x was known to do on my production systems -- but it is far less than great in that it sure as heck does have UMA coherence issues..... ARC Size: 38.58% 8.61 GiB Target Size: (Adaptive) 70.33% 15.70 GiB Min Size (Hard Limit): 12.50% 2.79 GiB Max Size (High Water): 8:1 22.32 GiB I have 20GB out in kernel memory on this machine right now but only 8.6 of it in ARC; the rest is (mostly) sitting in UMA allocated-but-unused -- so despite the belief expressed by some that the 11.x code is "better" at reaping UMA I'm sure not seeing it here. I'll get around to rolling forward and modifying that PR since that particular bit of jackassery with UMA is a definite performance problem. I suspect a big part of what you're seeing lies there as well. When I do get that code done and tested I suspect it may solve your problems as well. 
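[A quick way to see the gap described here (ARC size versus memory parked in UMA caches) on a running system is sketched below. The arcstats sysctls are standard; the awk assumes the vmstat -z column layout used by 10.x/11.x (item: size, limit, used, free, ...) and may need adjusting on other releases.]
-- snip --
#!/bin/sh
# Current ARC size, target and limits.
sysctl kstat.zfs.misc.arcstats.size \
       kstat.zfs.misc.arcstats.c \
       kstat.zfs.misc.arcstats.c_min \
       kstat.zfs.misc.arcstats.c_max

# Sum SIZE * FREE over all UMA zones: memory UMA is holding on to
# but nothing is currently using.
vmstat -z | awk -F, '
        NR > 2 && NF >= 4 {
                n = split($1, a, " ");          # last token of field 1 is the item size
                cached += a[n] * $4;            # $4 is the FREE column
        }
        END { printf "%.1f MiB allocated to UMA but currently free\n",
              cached / 1048576 }'
-- snip --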
-- Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/

From owner-freebsd-fs@freebsd.org Fri Aug 19 21:34:50 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A060FBC09B7 for ; Fri, 19 Aug 2016 21:34:50 +0000 (UTC) (envelope-from slw@zxy.spb.ru)
Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6088311AA for ; Fri, 19 Aug 2016 21:34:50 +0000 (UTC) (envelope-from slw@zxy.spb.ru)
Received: from slw by zxy.spb.ru with local (Exim 4.86 (FreeBSD)) (envelope-from ) id 1barR8-0005Lk-9Y; Sat, 20 Aug 2016 00:34:46 +0300
Date: Sat, 20 Aug 2016 00:34:46 +0300 From: Slawa Olhovchenkov To: Karl Denninger Cc: freebsd-fs@freebsd.org Subject: Re: ZFS ARC under memory pressure Message-ID: <20160819213446.GT8192@zxy.spb.ru> References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> <20160819201840.GA12519@zxy.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 21:34:50 -0000
On Fri, Aug 19, 2016 at 03:38:55PM -0500, Karl Denninger wrote: > On 8/19/2016 15:18, Slawa Olhovchenkov wrote: > > On Thu, Aug 18, 2016 at 03:31:26PM -0500, Karl Denninger wrote: > > > >> On 8/18/2016 15:26, Slawa Olhovchenkov wrote: > >>> On Thu, Aug 18, 2016 at 11:00:28PM +0300, Andriy Gapon wrote: > >>> > >>>> On 16/08/2016 22:34, Slawa Olhovchenkov wrote: > >>>>> I see issuses with ZFS ARC inder memory pressure. > >>>>> ZFS ARC size can be dramaticaly reduced, up to arc_min. > >>>>> > >>>>> As I see memory pressure event cause call arc_lowmem and set needfree: > >>>>> > >>>>> arc.c:arc_lowmem > >>>>> > >>>>> needfree = btoc(arc_c >> arc_shrink_shift); > >>>>> > >>>>> After this, arc_available_memory return negative vaules (PAGESIZE * > >>>>> (-needfree)) until needfree is zero. Independent how too much memory > >>>>> freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <= > >>>>> arc_c. Until arc_size don't drop below arc_c (arc_c deceased at every > >>>>> loop interation).
> >>>>> > >>>>> arc_c droped to minimum value if arc_size fast enough droped. > >>>>> > >>>>> No control current to initial memory allocation. > >>>>> > >>>>> As result, I can see needless arc reclaim, from 10x to 100x times. > >>>>> > >>>>> Can some one check me and comment this? > >>>> You might have found a real problem here, but I am short of time right now to > >>>> properly analyze the issue. I think that on illumos 'needfree' is a variable > >>>> that's managed by the virtual memory system and it is akin to our > >>>> vm_pageout_deficit. But during the porting it became an artificial value and > >>>> its handling might be sub-optimal. > >>> As I see, totaly not optimal. > >>> I am create some patch for sub-optimal handling and now test it. > >>> _______________________________________________ > >>> freebsd-fs at freebsd.org mailing list > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org" > >> You might want to look at the code contained in here: > >> > >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594 > > In may case arc.c issuse cused by revision r286625 in HEAD (and > > r288562 in STABLE) -- all in 2015, not touch in 2014. > > > >> There are some ugly interactions with the VM system you can run into if > >> you're not careful; I've chased this issue before and while I haven't > >> yet done the work to integrate it into 11.x (and the underlying code > >> *has* changed since the 10.x patches I developed) if you wind up driving > >> the VM system to evict pages to swap rather than pare back ARC you're > >> probably making the wrong choice. > >> > >> In addition UMA can come into the picture too and (at least previously) > >> was a severe contributor to pathological behavior. > > I am only do less aggresive (and more controlled) shrink of ARC size. > > Now ARC just collapsed. > > > > Pointed PR is realy BIG. I am can't read and understund all of this. > > r286625 change behaivor of interaction between ARC and VM. > > You problem still exist? Can you explain (in list)? > > > > Essentially ZFS is a "bolt-on" and unlike UFS which uses the unified > buffer cache (which the VM system manages) ZFS does not. ARC is > allocated out of kernel memory and (by default) also uses UMA; the VM > system is not involved in its management. > > When the VM system gets constrained (low memory) it thus cannot tell the > ARC to pare back. So when the VM system gets low on RAM it will start Currently VM generate event and ARC listen for this event, handle it by arc.c:arc_lowmem(). > to page. The problem with this is that if the VM system is low on RAM > because the ARC is consuming memory you do NOT want to page, you want to > evict some of the ARC. Now by event `lowmem` ARC try to evict 1/128 of ARC. > Unfortunately the VM system has another interaction that causes trouble > too. The VM system will "demote" a page to inactive or cache status but > not actually free it. It only starts to go through those pages and free > them when the vm system wakes up, and that only happens when free space > gets low enough to trigger it. > Finally, there's another problem that comes into play; UMA. Kernel > memory allocation is fairly expensive. UMA grabs memory from the kernel > allocation system in big chunks and manages it, and by doing so gains a > pretty-significant performance boost. 
But this means that you can have > large amounts of RAM that are allocated, not in use, and yet the VM > system cannot reclaim them on its own. The ZFS code has to reap those > caches, but reaping them is a moderately expensive operation too, thus > you don't want to do it unnecessarily.

Not sure, but some code in ZFS may handle this: arc.c:arc_kmem_reap_now().

> I've not yet gone through the 11.x code to see what changed from 10.x; > what I do know is that it is materially better-behaved than it used to > be, in that prior to 11.x I would have (by now) pretty much been forced > into rolling that forward and testing it because the misbehavior in one > of my production systems was severe enough to render it basically > unusable without the patch in that PR inline, with the most-serious > misbehavior being paging-induced stalls that could reach 10s of seconds > or more in duration. > > 11.x hasn't exhibited the severe problems, unpatched, that 10.x was > known to do on my production systems -- but it is far less than great in > that it sure as heck does have UMA coherence issues..... > > ARC Size: 38.58% 8.61 GiB > Target Size: (Adaptive) 70.33% 15.70 GiB > Min Size (Hard Limit): 12.50% 2.79 GiB > Max Size (High Water): 8:1 22.32 GiB > > I have 20GB out in kernel memory on this machine right now but only 8.6 > of it in ARC; the rest is (mostly) sitting in UMA allocated-but-unused > -- so despite the belief expressed by some that the 11.x code is > "better" at reaping UMA I'm sure not seeing it here.

I see. In my case:

ARC Size:                79.65%  98.48  GiB
Target Size: (Adaptive)  79.60%  98.42  GiB
Min Size (Hard Limit):   12.50%  15.46  GiB
Max Size (High Water):   8:1     123.64 GiB

System Memory:
        2.27%   2.83 GiB Active,     9.58%  11.94 GiB Inact
        86.34%  107.62 GiB Wired,    0.00%  0 Cache
        1.80%   2.25 GiB Free,       0.00%  0 Gap

        Real Installed:  128.00 GiB
        Real Available:  99.96%  127.95 GiB
        Real Managed:    97.41%  124.64 GiB

        Logical Total:   128.00 GiB
        Logical Used:    88.92%  113.81 GiB
        Logical Free:    11.08%  14.19 GiB

Kernel Memory:           758.25 MiB
        Data:            97.81%  741.61 MiB
        Text:            2.19%   16.64 MiB

Kernel Memory Map:       124.64 GiB
        Size:            81.84%  102.01 GiB
        Free:            18.16%  22.63 GiB

Mem: 2895M Active, 12G Inact, 108G Wired, 528K Buf, 2303M Free
ARC: 98G Total, 89G MFU, 9535M MRU, 35M Anon, 126M Header, 404M Other
Swap: 32G Total, 394M Used, 32G Free, 1% Inuse

Is this 12G Inactive the 'UMA allocated-but-unused'? It may also be freed-but-not-yet-reclaimed network bufs.

> I'll get around to rolling forward and modifying that PR since that > particular bit of jackassery with UMA is a definite performance > problem. I suspect a big part of what you're seeing lies there as > well. When I do get that code done and tested I suspect it may solve > your problems as well.

No. My problem is completely different: under memory pressure, after arc_lowmem() sets needfree to non-zero, arc_reclaim_thread() starts to shrink the ARC. But arc_reclaim_thread() (in the FreeBSD case) doesn't correctly control this process, and the shrink stops at a random point (when, after the next iteration, arc_size <= arc_c), mostly after it has dropped to Min Size (Hard Limit).

I just restore control of the shrink process.
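A toy userspace simulation of the shrink behaviour described in this message may make it easier to see. Everything below is invented for illustration -- the sizes, the shrink step, and especially the rate at which eviction catches up with the falling target -- while the real logic lives in arc.c's arc_lowmem(), arc_available_memory() and arc_reclaim_thread().

#include <stdio.h>
#include <stdint.h>

#define GIB (1024ULL * 1024 * 1024)

int
main(void)
{
        uint64_t arc_c = 100 * GIB;          /* ARC target size (invented) */
        uint64_t arc_size = 100 * GIB;       /* bytes actually cached */
        const uint64_t arc_min = 16 * GIB;   /* hard lower limit */
        const int arc_shrink_shift = 7;      /* shrink requests are ~1/128 */

        /* arc_lowmem(): the VM asks for roughly arc_c >> arc_shrink_shift. */
        const uint64_t requested = arc_c >> arc_shrink_shift;
        uint64_t freed = 0;
        int needfree = 1;
        int iter = 0;

        while (needfree) {
                iter++;
                /* While needfree is set, "available memory" stays negative,
                 * so the target keeps being cut on every pass... */
                if (arc_c > arc_min) {
                        uint64_t step = arc_c >> arc_shrink_shift;
                        arc_c = (arc_c - step > arc_min) ? arc_c - step : arc_min;
                }
                /* ...while actual eviction lags behind the falling target
                 * (the lag factor of 4 is made up for illustration). */
                uint64_t gap = (arc_size > arc_c) ? arc_size - arc_c : 0;
                uint64_t evict = (gap > 4) ? gap / 4 : gap;
                arc_size -= evict;
                freed += evict;
                /* needfree is only cleared once arc_size <= arc_c. */
                if (arc_size <= arc_c)
                        needfree = 0;
        }
        printf("asked to free %.2f GiB, actually freed %.2f GiB in %d passes; "
            "arc_c ended at %.2f GiB (min %.2f GiB)\n",
            (double)requested / GIB, (double)freed / GIB, iter,
            (double)arc_c / GIB, (double)arc_min / GIB);
        return (0);
}

With these made-up numbers the loop frees roughly a hundred times what was requested and parks arc_c at its minimum, which is the 10x-100x over-reclaim described earlier in the thread.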
From owner-freebsd-fs@freebsd.org Fri Aug 19 21:52:09 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 416BABC0DE2 for ; Fri, 19 Aug 2016 21:52:09 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (denninger.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D48311BC7 for ; Fri, 19 Aug 2016 21:52:08 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 53614208A4C for ; Fri, 19 Aug 2016 16:52:06 -0500 (CDT) Subject: Re: ZFS ARC under memory pressure References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> <20160819201840.GA12519@zxy.spb.ru> <20160819213446.GT8192@zxy.spb.ru> To: freebsd-fs@freebsd.org From: Karl Denninger Message-ID: <05ba785a-c86f-1ec8-fcf3-71d22551f4f3@denninger.net> Date: Fri, 19 Aug 2016 16:52:00 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160819213446.GT8192@zxy.spb.ru> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms040808020408080903000805" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2016 21:52:09 -0000 This is a cryptographically signed message in MIME format. --------------ms040808020408080903000805 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 8/19/2016 16:34, Slawa Olhovchenkov wrote: > On Fri, Aug 19, 2016 at 03:38:55PM -0500, Karl Denninger wrote: > >> On 8/19/2016 15:18, Slawa Olhovchenkov wrote: >>> On Thu, Aug 18, 2016 at 03:31:26PM -0500, Karl Denninger wrote: >>> >>>> On 8/18/2016 15:26, Slawa Olhovchenkov wrote: >>>>> On Thu, Aug 18, 2016 at 11:00:28PM +0300, Andriy Gapon wrote: >>>>> >>>>>> On 16/08/2016 22:34, Slawa Olhovchenkov wrote: >>>>>>> I see issuses with ZFS ARC inder memory pressure. >>>>>>> ZFS ARC size can be dramaticaly reduced, up to arc_min. >>>>>>> >>>>>>> As I see memory pressure event cause call arc_lowmem and set need= free: >>>>>>> >>>>>>> arc.c:arc_lowmem >>>>>>> >>>>>>> needfree =3D btoc(arc_c >> arc_shrink_shift); >>>>>>> >>>>>>> After this, arc_available_memory return negative vaules (PAGESIZE= * >>>>>>> (-needfree)) until needfree is zero. Independent how too much mem= ory >>>>>>> freed. needfree set to 0 in arc_reclaim_thread(), when arc_size <= =3D >>>>>>> arc_c. Until arc_size don't drop below arc_c (arc_c deceased at e= very >>>>>>> loop interation). >>>>>>> >>>>>>> arc_c droped to minimum value if arc_size fast enough droped. >>>>>>> >>>>>>> No control current to initial memory allocation. >>>>>>> >>>>>>> As result, I can see needless arc reclaim, from 10x to 100x times= =2E >>>>>>> >>>>>>> Can some one check me and comment this? 
>>>>>> You might have found a real problem here, but I am short of time r= ight now to >>>>>> properly analyze the issue. I think that on illumos 'needfree' is= a variable >>>>>> that's managed by the virtual memory system and it is akin to our >>>>>> vm_pageout_deficit. But during the porting it became an artificia= l value and >>>>>> its handling might be sub-optimal. >>>>> As I see, totaly not optimal. >>>>> I am create some patch for sub-optimal handling and now test it. >>>>> _______________________________________________ >>>>> freebsd-fs at freebsd.org mailing list >>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd= =2Eorg" >>>> You might want to look at the code contained in here: >>>> >>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D187594 >>> In may case arc.c issuse cused by revision r286625 in HEAD (and >>> r288562 in STABLE) -- all in 2015, not touch in 2014. >>> >>>> There are some ugly interactions with the VM system you can run into= if >>>> you're not careful; I've chased this issue before and while I haven'= t >>>> yet done the work to integrate it into 11.x (and the underlying code= >>>> *has* changed since the 10.x patches I developed) if you wind up dri= ving >>>> the VM system to evict pages to swap rather than pare back ARC you'r= e >>>> probably making the wrong choice. >>>> >>>> In addition UMA can come into the picture too and (at least previous= ly) >>>> was a severe contributor to pathological behavior. >>> I am only do less aggresive (and more controlled) shrink of ARC size.= >>> Now ARC just collapsed. >>> >>> Pointed PR is realy BIG. I am can't read and understund all of this. >>> r286625 change behaivor of interaction between ARC and VM. >>> You problem still exist? Can you explain (in list)? >>> >> Essentially ZFS is a "bolt-on" and unlike UFS which uses the unified >> buffer cache (which the VM system manages) ZFS does not. ARC is >> allocated out of kernel memory and (by default) also uses UMA; the VM >> system is not involved in its management. >> >> When the VM system gets constrained (low memory) it thus cannot tell t= he >> ARC to pare back. So when the VM system gets low on RAM it will start= > Currently VM generate event and ARC listen for this event, handle it > by arc.c:arc_lowmem(). > >> to page. The problem with this is that if the VM system is low on RAM= >> because the ARC is consuming memory you do NOT want to page, you want = to >> evict some of the ARC. > Now by event `lowmem` ARC try to evict 1/128 of ARC. > >> Unfortunately the VM system has another interaction that causes troubl= e >> too. The VM system will "demote" a page to inactive or cache status b= ut >> not actually free it. It only starts to go through those pages and fr= ee >> them when the vm system wakes up, and that only happens when free spac= e >> gets low enough to trigger it. > >> Finally, there's another problem that comes into play; UMA. Kernel >> memory allocation is fairly expensive. UMA grabs memory from the kern= el >> allocation system in big chunks and manages it, and by doing so gains = a >> pretty-significant performance boost. But this means that you can hav= e >> large amounts of RAM that are allocated, not in use, and yet the VM >> system cannot reclaim them on its own. The ZFS code has to reap those= >> caches, but reaping them is a moderately expensive operation too, thus= >> you don't want to do it unnecessarily. > Not sure, but some code in ZFS may be handle this. 
> arc.c:arc_kmem_reap_now(). > Not sure. > >> I've not yet gone through the 11.x code to see what changed from 10.x;= >> what I do know is that it is materially better-behaved than it used to= >> be, in that prior to 11.x I would have (by now) pretty much been force= d >> into rolling that forward and testing it because the misbehavior in on= e >> of my production systems was severe enough to render it basically >> unusable without the patch in that PR inline, with the most-serious >> misbehavior being paging-induced stalls that could reach 10s of second= s >> or more in duration. >> >> 11.x hasn't exhibited the severe problems, unpatched, that 10.x was >> known to do on my production systems -- but it is far less than great = in >> that it sure as heck does have UMA coherence issues..... >> >> ARC Size: 38.58% 8.61 GiB >> Target Size: (Adaptive) 70.33% 15.70 GiB >> Min Size (Hard Limit): 12.50% 2.79 GiB >> Max Size (High Water): 8:1 22.32 GiB >> >> I have 20GB out in kernel memory on this machine right now but only 8.= 6 >> of it in ARC; the rest is (mostly) sitting in UMA allocated-but-unused= >> -- so despite the belief expressed by some that the 11.x code is >> "better" at reaping UMA I'm sure not seeing it here. > I see. > In my case: > > ARC Size: 79.65% 98.48 GiB > Target Size: (Adaptive) 79.60% 98.42 GiB > Min Size (Hard Limit): 12.50% 15.46 GiB > Max Size (High Water): 8:1 123.64 GiB > > System Memory: > > 2.27% 2.83 GiB Active, 9.58% 11.94 GiB Inact > 86.34% 107.62 GiB Wired, 0.00% 0 Cache > 1.80% 2.25 GiB Free, 0.00% 0 Gap > > Real Installed: 128.00 GiB > Real Available: 99.96% 127.95 GiB > Real Managed: 97.41% 124.64 GiB > > Logical Total: 128.00 GiB > Logical Used: 88.92% 113.81 GiB > Logical Free: 11.08% 14.19 GiB > > Kernel Memory: 758.25 MiB > Data: 97.81% 741.61 MiB > Text: 2.19% 16.64 MiB > > Kernel Memory Map: 124.64 GiB > Size: 81.84% 102.01 GiB > Free: 18.16% 22.63 GiB > > Mem: 2895M Active, 12G Inact, 108G Wired, 528K Buf, 2303M Free > ARC: 98G Total, 89G MFU, 9535M MRU, 35M Anon, 126M Header, 404M Other > Swap: 32G Total, 394M Used, 32G Free, 1% Inuse > > Is this 12G Inactive as 'UMA allocated-but-unused'? > This is also may be freed but not reclaimed network bufs. > >> I'll get around to rolling forward and modifying that PR since that >> particular bit of jackassery with UMA is a definite performance >> problem. I suspect a big part of what you're seeing lies there as >> well. When I do get that code done and tested I suspect it may solve >> your problems as well. > No. May problem is completly different: under memory pressure, after ar= c_lowmem() > set needfree to non-zero arc_reclaim_thread() start to shrink ARC. But > arc_reclaim_thread (in FreeBSD case) don't correctly control this proce= ss > and shrink stoped in random time (when after next iteration arc_size <=3D= arc_c), > mostly after drop to Min Size (Hard Limit). > > I am just resore control of shrink process. Not quite due to the UMA issue, among other things. There's also a potential "stall" issue that can arise also having to do with dirty_max sizing, especially if you are using rotating media. The PR patch scaled that back dynamically as well under memory pressure and eliminated that issue as well. I won't have time to look at this for at least another week on my test machine as I'm unfortunately buried with unrelated work at present, but I should be able to put some effort into this within the next couple weeks and see if I can quickly roll forward the important parts of the previous PR patch. 
I think you'll find that it stops the behavior you're seeing - I'm just pointing out that this was more-complex internally than it first appeared in the 10.x branch and I have no reason to believe the interactions that lead to bad behavior are not still in play given what you're describing for symptoms. --=20 Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms040808020408080903000805 Content-Type: application/pkcs7-signature; name="smime.p7s" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="smime.p7s" Content-Description: S/MIME Cryptographic Signature MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC Bl8wggZbMIIEQ6ADAgECAgEpMA0GCSqGSIb3DQEBCwUAMIGQMQswCQYDVQQGEwJVUzEQMA4G A1UECBMHRmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3Rl bXMgTExDMRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhND dWRhIFN5c3RlbXMgTExDIENBMB4XDTE1MDQyMTAyMjE1OVoXDTIwMDQxOTAyMjE1OVowWjEL MAkGA1UEBhMCVVMxEDAOBgNVBAgTB0Zsb3JpZGExGTAXBgNVBAoTEEN1ZGEgU3lzdGVtcyBM TEMxHjAcBgNVBAMTFUthcmwgRGVubmluZ2VyIChPQ1NQKTCCAiIwDQYJKoZIhvcNAQEBBQAD ggIPADCCAgoCggIBALmEWPhAdphrWd4K5VTvE5pxL3blRQPyGF3ApjUjgtavqU1Y8pbI3Byg XDj2/Uz9Si8XVj/kNbKEjkRh5SsNvx3Fc0oQ1uVjyCq7zC/kctF7yLzQbvWnU4grAPZ3IuAp 3/fFxIVaXpxEdKmyZAVDhk9az+IgHH43rdJRIMzxJ5vqQMb+n2EjadVqiGPbtG9aZEImlq7f IYDTnKyToi23PAnkPwwT+q1IkI2DTvf2jzWrhLR5DTX0fUYC0nxlHWbjgpiapyJWtR7K2YQO aevQb/3vN9gSojT2h+cBem7QIj6U69rEYcEDvPyCMXEV9VcXdcmW42LSRsPvZcBHFkWAJqMZ Myiz4kumaP+s+cIDaXitR/szoqDKGSHM4CPAZV9Yh8asvxQL5uDxz5wvLPgS5yS8K/o7zDR5 vNkMCyfYQuR6PAJxVOk5Arqvj9lfP3JSVapwbr01CoWDBkpuJlKfpQIEeC/pcCBKknllbMYq yHBO2TipLyO5Ocd1nhN/nOsO+C+j31lQHfOMRZaPQykXVPWG5BbhWT7ttX4vy5hOW6yJgeT/ o3apynlp1cEavkQRS8uJHoQszF6KIrQMID/JfySWvVQ4ksnfzwB2lRomrdrwnQ4eG/HBS+0l eozwOJNDIBlAP+hLe8A5oWZgooIIK/SulUAsfI6Sgd8dTZTTYmlhAgMBAAGjgfQwgfEwNwYI KwYBBQUHAQEEKzApMCcGCCsGAQUFBzABhhtodHRwOi8vY3VkYXN5c3RlbXMubmV0Ojg4ODgw CQYDVR0TBAIwADARBglghkgBhvhCAQEEBAMCBaAwCwYDVR0PBAQDAgXgMCwGCWCGSAGG+EIB DQQfFh1PcGVuU1NMIEdlbmVyYXRlZCBDZXJ0aWZpY2F0ZTAdBgNVHQ4EFgQUxRyULenJaFwX RtT79aNmIB/u5VkwHwYDVR0jBBgwFoAUJHGbnYV9/N3dvbDKkpQDofrTbTUwHQYDVR0RBBYw FIESa2FybEBkZW5uaW5nZXIubmV0MA0GCSqGSIb3DQEBCwUAA4ICAQBPf3cYtmKowmGIYsm6 eBinJu7QVWvxi1vqnBz3KE+HapqoIZS8/PolB/hwiY0UAE1RsjBJ7yEjihVRwummSBvkoOyf G30uPn4yg4vbJkR9lTz8d21fPshWETa6DBh2jx2Qf13LZpr3Pj2fTtlu6xMYKzg7cSDgd2bO sJGH/rcvva9Spkx5Vfq0RyOrYph9boshRN3D4tbWgBAcX9POdXCVfJONDxhfBuPHsJ6vEmPb An+XL5Yl26XYFPiODQ+Qbk44Ot1kt9s7oS3dVUrh92Qv0G3J3DF+Vt6C15nED+f+bk4gScu+ JHT7RjEmfa18GT8DcT//D1zEke1Ymhb41JH+GyZchDRWtjxsS5OBFMzrju7d264zJUFtX7iJ 3xvpKN7VcZKNtB6dLShj3v/XDsQVQWXmR/1YKWZ93C3LpRs2Y5nYdn6gEOpL/WfQFThtfnat HNc7fNs5vjotaYpBl5H8+VCautKbGOs219uQbhGZLYTv6okuKcY8W+4EJEtK0xB08vqr9Jd0 FS9MGjQE++GWo+5eQxFt6nUENHbVYnsr6bYPQsZH0CRNycgTG9MwY/UIXOf4W034UpR82TBG 1LiMsYfb8ahQJhs3wdf1nzipIjRwoZKT1vGXh/cj3gwSr64GfenURBxaFZA5O1acOZUjPrRT n3ci4McYW/0WVVA3lDGCBRMwggUPAgEBMIGWMIGQMQswCQYDVQQGEwJVUzEQMA4GA1UECBMH RmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3RlbXMgTExD MRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhNDdWRhIFN5 c3RlbXMgTExDIENBAgEpMA0GCWCGSAFlAwQCAwUAoIICTTAYBgkqhkiG9w0BCQMxCwYJKoZI hvcNAQcBMBwGCSqGSIb3DQEJBTEPFw0xNjA4MTkyMTUyMDBaME8GCSqGSIb3DQEJBDFCBEAK hJi5/8ptyPvenRhWie/BSME8lhs9BQnHdC6flidXNcBCWBhTvA0NrlqjIYn/ORlwXesJRByf t14fEPqQtrVaMGwGCSqGSIb3DQEJDzFfMF0wCwYJYIZIAWUDBAEqMAsGCWCGSAFlAwQBAjAK BggqhkiG9w0DBzAOBggqhkiG9w0DAgICAIAwDQYIKoZIhvcNAwICAUAwBwYFKw4DAgcwDQYI 
KoZIhvcNAwICASgwgacGCSsGAQQBgjcQBDGBmTCBljCBkDELMAkGA1UEBhMCVVMxEDAOBgNV BAgTB0Zsb3JpZGExEjAQBgNVBAcTCU5pY2V2aWxsZTEZMBcGA1UEChMQQ3VkYSBTeXN0ZW1z IExMQzEcMBoGA1UEAxMTQ3VkYSBTeXN0ZW1zIExMQyBDQTEiMCAGCSqGSIb3DQEJARYTQ3Vk YSBTeXN0ZW1zIExMQyBDQQIBKTCBqQYLKoZIhvcNAQkQAgsxgZmggZYwgZAxCzAJBgNVBAYT AlVTMRAwDgYDVQQIEwdGbG9yaWRhMRIwEAYDVQQHEwlOaWNldmlsbGUxGTAXBgNVBAoTEEN1 ZGEgU3lzdGVtcyBMTEMxHDAaBgNVBAMTE0N1ZGEgU3lzdGVtcyBMTEMgQ0ExIjAgBgkqhkiG 9w0BCQEWE0N1ZGEgU3lzdGVtcyBMTEMgQ0ECASkwDQYJKoZIhvcNAQEBBQAEggIAM2mjKv7n smv9SiI6bPPW708oruljYXQpJPRsM0HD8/hYLn5TPsVysnWZwuZCUrNikEBrQI5qqMmpYt9n o/DrVAhOiupZ2Jz8/oO7KJ+EEdMCABFdY9LRowdpJTHOhYUkaJ5D4YFg/EKP3a8RWGZ6av07 Iy4WZliVOVAV8147Pqxc/YJRxqEM225WV4riC2KkGgskNmYzB9M/nsNNTJiT0EhGxJIq/qfS k5WwkSAMOpUj8M3dI6pOCyIDIqjSUc4wxoVa4UXrdgx5VvXIZCsaatC8USfjCi9j1UE0aACe /CiPQFNIoesa+yMGszJ5jmHQAt1Wv/95nTQlfN6hEnZw015hGq6Wh3IPb4ajBVyy5TzEOiCV qiql3Z8ccHGaBjQDlSqK+CM/8ApZSeXE/CpThaGRPdUyZBQ51XRLvYzqVnAAM2bPOAgrd2kw ND8Ez7O2N3dpQJlc9pNKM7k7M0bfBSNp+bnjj3bLiiTFNA0fHnCKB2a1Eowucw3jDuVZ2jy3 OTUnBlWN48cE94fsMZ8hh2jYRZ7PHDLrveWUsCTkh8zPoN3rnWOrgw+SDBTREGp4rtJn7nQo popEmhSAR5ZZ6txJ65XAhISwOcHaTJMTn5CitAAG03koJjHK244t64e9P2BiB0LqMrxehM5T tOFRe/TzVhNIxUwq26xOfvP/pDYAAAAAAAA= --------------ms040808020408080903000805-- From owner-freebsd-fs@freebsd.org Sat Aug 20 01:38:53 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ED6EABBF8AF for ; Sat, 20 Aug 2016 01:38:53 +0000 (UTC) (envelope-from ler@lerctr.org) Received: from thebighonker.lerctr.org (thebighonker.lerctr.org [IPv6:2001:470:1f0f:3ad:223:7dff:fe9e:6e8a]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "thebighonker.lerctr.org", Issuer "COMODO RSA Domain Validation Secure Server CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B9EA818B1 for ; Sat, 20 Aug 2016 01:38:53 +0000 (UTC) (envelope-from ler@lerctr.org) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lerctr.org; s=lerami; h=Message-ID:Subject:To:From:Date:Content-Transfer-Encoding: Content-Type:MIME-Version:Sender:Reply-To:Cc:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: In-Reply-To:References:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=3VXayWpeBBXYBdeoPq+JUq19eaUd7ZXp01ZNjPVCDlA=; b=Y679OYAVNuaqi7uOTIoNmgNF6X IAhip/YHps/cI6J/AEElQs7fFeTcKcte4Uw7ehifMA7YpXe81/zU/SYQ98k7NYUM0W0/6DRvAbTDf aqlPURmRV3CMdwyqT8zx4q+jFMu2bam0IA/Gmw3+vNbHjN1/2Dv4aApATK43sUXUnq2M=; Received: from thebighonker.lerctr.org ([2001:470:1f0f:3ad:223:7dff:fe9e:6e8a]:40261 helo=webmail.lerctr.org) by thebighonker.lerctr.org with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.87 (FreeBSD)) (envelope-from ) id 1bavFL-0006U8-Ua for freebsd-fs@freebsd.org; Fri, 19 Aug 2016 20:38:52 -0500 Received: from 2001:470:1f0f:42c:cc:6a5b:b3ec:36fb by webmail.lerctr.org with HTTP (HTTP/1.1 POST); Fri, 19 Aug 2016 20:38:51 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Fri, 19 Aug 2016 20:38:51 -0500 From: Larry Rosenman To: Freebsd fs Subject: Duplicate ZAP Message-ID: <529897b39cc8c04069a4c2b10bec7a7a@thebighonker.lerctr.org> X-Sender: ler@lerctr.org User-Agent: Roundcube Webmail/1.2.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2016 01:38:54 -0000 I brought this up in May, and finally had a chance to try the zfs send|zfs recv thing, and did on all the filesystems containing these 4 files, but I still have these: ZFS WARNING: Duplicated ZAP entry detected (libssl.a). ZFS WARNING: Duplicated ZAP entry detected (libzpool.so). ZFS WARNING: Duplicated ZAP entry detected (libtinfo_p.a). ZFS WARNING: Duplicated ZAP entry detected (libumem.so). The message appears to come out of the dedup code, and I've (long time ago) turned off dedup pool-wide. Do any of the ZFS experts have any other ideas? I can give access if you want to look around with zdb. current world/kernel: thebighonker.lerctr.org ~ $ uname -aKU FreeBSD thebighonker.lerctr.org 10.3-STABLE FreeBSD 10.3-STABLE #43 r301479: Sun Jun 5 22:39:14 CDT 2016 root@thebighonker.lerctr.org:/usr/obj/usr/src/sys/GENERIC amd64 1003503 1003503 thebighonker.lerctr.org ~ $ -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: ler@lerctr.org US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281 From owner-freebsd-fs@freebsd.org Sat Aug 20 07:29:54 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D6788BBF7D8 for ; Sat, 20 Aug 2016 07:29:54 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C61711B84 for ; Sat, 20 Aug 2016 07:29:54 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u7K7TsQt041629 for ; Sat, 20 Aug 2016 07:29:54 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 211939] ZFS does not correctly import cache and spares by label Date: Sat, 20 Aug 2016 07:29:55 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.3-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: ben.rubson@gmail.com X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2016 07:29:54 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D211939 --- Comment #1 from Ben RUBSON --- Perhaps we are talking about these 2 commits ? https://svnweb.freebsd.org/base?view=3Drevision&revision=3D292066 https://svnweb.freebsd.org/base?view=3Drevision&revision=3D293708 I'm not really sure... Thank you ! 
Ben --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Sat Aug 20 15:22:35 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D9410BC04BF for ; Sat, 20 Aug 2016 15:22:35 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 701FB1B0E for ; Sat, 20 Aug 2016 15:22:35 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u7KFMPNS094804 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Sat, 20 Aug 2016 18:22:26 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u7KFMPNS094804 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u7KFMPYP094803; Sat, 20 Aug 2016 18:22:25 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 20 Aug 2016 18:22:25 +0300 From: Konstantin Belousov To: Karl Denninger Cc: Slawa Olhovchenkov , freebsd-fs@freebsd.org Subject: Re: ZFS ARC under memory pressure Message-ID: <20160820152225.GP83214@kib.kiev.ua> References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> <20160819201840.GA12519@zxy.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2016 15:22:35 -0000 On Fri, Aug 19, 2016 at 03:38:55PM -0500, Karl Denninger wrote: > Paging *always* requires one I/O (to write the page(s) to the swap) and > MAY involve two (to later page it back in.) It is never a "win" to > spend a *guaranteed* I/O when you can instead act in a way that *might* > cause you to (later) need to execute one. Why would pagedaemon need to write out clean page ? 
From owner-freebsd-fs@freebsd.org Sat Aug 20 16:08:54 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 04B8FBC0194 for ; Sat, 20 Aug 2016 16:08:54 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (denninger.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CC5B0167F for ; Sat, 20 Aug 2016 16:08:53 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 3FB3F219FD for ; Sat, 20 Aug 2016 11:08:51 -0500 (CDT) Subject: Re: ZFS ARC under memory pressure References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> <20160819201840.GA12519@zxy.spb.ru> <20160820152225.GP83214@kib.kiev.ua> Cc: freebsd-fs@freebsd.org From: Karl Denninger Message-ID: <97f166f0-4d47-d5a3-ecb3-d15f1ecf9c1f@denninger.net> Date: Sat, 20 Aug 2016 11:08:44 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160820152225.GP83214@kib.kiev.ua> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms050405090709090407070503" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2016 16:08:54 -0000 This is a cryptographically signed message in MIME format. --------------ms050405090709090407070503 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 8/20/2016 10:22, Konstantin Belousov wrote: > On Fri, Aug 19, 2016 at 03:38:55PM -0500, Karl Denninger wrote: >> Paging *always* requires one I/O (to write the page(s) to the swap) an= d >> MAY involve two (to later page it back in.) It is never a "win" to >> spend a *guaranteed* I/O when you can instead act in a way that *might= * >> cause you to (later) need to execute one. > Why would pagedaemon need to write out clean page ? If you are talking about the case of an executable in which part of the text is evicted you are correct, however, you are still choosing in that instance to evict a page for which there will likely be a future demand and thus require an I/O (should that executable come back up for execution) as opposed to one for which you have no idea how likely demand for same will be (a data page in the ARC.) Since the VM has no means of "coloring" the ARC (as it is opaque other than the consumption of system memory to the VM) as to how "useful" (e.g. how often used, etc) a particular data item in the ARC is, it has no information available on which to decide. However, the fact that an executing process is in some sort of waiting state still likely trumps an ARC data page in terms of likelihood of future access. 
root@NewFS:/usr/src/sys/amd64/conf # pstat -s
Device              1K-blocks     Used    Avail Capacity
/dev/mirror/sw.eli   67108860   291356 66817504     0%

While this is not a large amount of page space used, I can assure you that at no time since boot was all 32GB of memory in the machine consumed with other-than-ARC data. As such, for the VM system to have decided to evict pages to the swap file rather than pare back the ARC is demonstrably wrong, since the result was the execution of I/Os on the *speculative* bet that a page in the ARC would preferentially be required.

On 10.x, unpatched, there were fairly trivial "added" workload choices that one might make on a routine basis (e.g. "make -j8 buildworld") on this machine that, if you had a largish text file open in "vi", would lead to user-perceived stalls exceeding 10 seconds in length, during which that process's working set had been evicted so as to keep ARC cache data! While it might at first blush appear that the Postgres database consumers on the same machine would be happy with this, when *their* RSS got paged out and *they* took the resulting 10+ second stall as well, that certainly was not the case!

11.x does exhibit far less pathology in this regard than did 10.x (unpatched), and I've yet to see the "stall the system to the point that it appears it has crashed" behavior that I formerly could provoke with a trivial test. However, the fact remains that the same machine, with the same load, running 10.x and my patches ran for months at a time with zero page space consumed, a fully-utilized ARC and very little slack space (defined as RAM in "Cache" + allocated-but-unused UMA) -- in other words, with no displayed pathology at all.

The behavior of unpatched 11.x, while very materially better than unpatched 10.x, IMHO does not meet this standard. In particular there are quite large quantities of UMA space out-but-unused on a regular basis, and while *at present* the ARC looks pretty healthy, this is a weekend when system load is quite low. During the week not only does the UMA situation look far worse, so do the ARC size and efficiency, which frequently wind up running at "half-mast" compared to where they ought to be.

I believe FreeBSD 11.x can do better and intend to roll forward the 10.x work in an attempt to implement that.
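For anyone who wants to watch for the same condition (swap in use while the ARC still dominates RAM), a small C sketch along these lines can log the relevant counters. It assumes the usual FreeBSD sysctl names (kstat.zfs.misc.arcstats.* is only present with ZFS loaded), and swap usage itself is still simplest to read with pstat -s or swapinfo as shown above.

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Fetch a numeric sysctl that may be exported as 32 or 64 bits wide. */
static int
read_num(const char *name, uint64_t *out)
{
        unsigned char buf[8];
        size_t len = sizeof(buf);

        if (sysctlbyname(name, buf, &len, NULL, 0) != 0)
                return (-1);
        if (len == sizeof(uint32_t)) {
                uint32_t v32;
                memcpy(&v32, buf, sizeof(v32));
                *out = v32;
        } else if (len == sizeof(uint64_t)) {
                memcpy(out, buf, sizeof(*out));
        } else {
                return (-1);
        }
        return (0);
}

int
main(void)
{
        uint64_t arc_size, arc_c, free_pages;

        if (read_num("kstat.zfs.misc.arcstats.size", &arc_size) != 0 ||
            read_num("kstat.zfs.misc.arcstats.c", &arc_c) != 0 ||
            read_num("vm.stats.vm.v_free_count", &free_pages) != 0) {
                fprintf(stderr, "sysctl lookup failed (is ZFS loaded?)\n");
                return (1);
        }
        printf("ARC size    : %.2f GiB\n", (double)arc_size / (1ULL << 30));
        printf("ARC target  : %.2f GiB\n", (double)arc_c / (1ULL << 30));
        printf("Free memory : %.2f GiB\n",
            (double)free_pages * (double)getpagesize() / (1ULL << 30));
        return (0);
}

If that prints a large ARC while pstat -s shows pages nonetheless going out to swap, you are looking at the same choice being made that is described above.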
--=20 Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms050405090709090407070503 Content-Type: application/pkcs7-signature; name="smime.p7s" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="smime.p7s" Content-Description: S/MIME Cryptographic Signature MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC Bl8wggZbMIIEQ6ADAgECAgEpMA0GCSqGSIb3DQEBCwUAMIGQMQswCQYDVQQGEwJVUzEQMA4G A1UECBMHRmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3Rl bXMgTExDMRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhND dWRhIFN5c3RlbXMgTExDIENBMB4XDTE1MDQyMTAyMjE1OVoXDTIwMDQxOTAyMjE1OVowWjEL MAkGA1UEBhMCVVMxEDAOBgNVBAgTB0Zsb3JpZGExGTAXBgNVBAoTEEN1ZGEgU3lzdGVtcyBM TEMxHjAcBgNVBAMTFUthcmwgRGVubmluZ2VyIChPQ1NQKTCCAiIwDQYJKoZIhvcNAQEBBQAD ggIPADCCAgoCggIBALmEWPhAdphrWd4K5VTvE5pxL3blRQPyGF3ApjUjgtavqU1Y8pbI3Byg XDj2/Uz9Si8XVj/kNbKEjkRh5SsNvx3Fc0oQ1uVjyCq7zC/kctF7yLzQbvWnU4grAPZ3IuAp 3/fFxIVaXpxEdKmyZAVDhk9az+IgHH43rdJRIMzxJ5vqQMb+n2EjadVqiGPbtG9aZEImlq7f IYDTnKyToi23PAnkPwwT+q1IkI2DTvf2jzWrhLR5DTX0fUYC0nxlHWbjgpiapyJWtR7K2YQO aevQb/3vN9gSojT2h+cBem7QIj6U69rEYcEDvPyCMXEV9VcXdcmW42LSRsPvZcBHFkWAJqMZ Myiz4kumaP+s+cIDaXitR/szoqDKGSHM4CPAZV9Yh8asvxQL5uDxz5wvLPgS5yS8K/o7zDR5 vNkMCyfYQuR6PAJxVOk5Arqvj9lfP3JSVapwbr01CoWDBkpuJlKfpQIEeC/pcCBKknllbMYq yHBO2TipLyO5Ocd1nhN/nOsO+C+j31lQHfOMRZaPQykXVPWG5BbhWT7ttX4vy5hOW6yJgeT/ o3apynlp1cEavkQRS8uJHoQszF6KIrQMID/JfySWvVQ4ksnfzwB2lRomrdrwnQ4eG/HBS+0l eozwOJNDIBlAP+hLe8A5oWZgooIIK/SulUAsfI6Sgd8dTZTTYmlhAgMBAAGjgfQwgfEwNwYI KwYBBQUHAQEEKzApMCcGCCsGAQUFBzABhhtodHRwOi8vY3VkYXN5c3RlbXMubmV0Ojg4ODgw CQYDVR0TBAIwADARBglghkgBhvhCAQEEBAMCBaAwCwYDVR0PBAQDAgXgMCwGCWCGSAGG+EIB DQQfFh1PcGVuU1NMIEdlbmVyYXRlZCBDZXJ0aWZpY2F0ZTAdBgNVHQ4EFgQUxRyULenJaFwX RtT79aNmIB/u5VkwHwYDVR0jBBgwFoAUJHGbnYV9/N3dvbDKkpQDofrTbTUwHQYDVR0RBBYw FIESa2FybEBkZW5uaW5nZXIubmV0MA0GCSqGSIb3DQEBCwUAA4ICAQBPf3cYtmKowmGIYsm6 eBinJu7QVWvxi1vqnBz3KE+HapqoIZS8/PolB/hwiY0UAE1RsjBJ7yEjihVRwummSBvkoOyf G30uPn4yg4vbJkR9lTz8d21fPshWETa6DBh2jx2Qf13LZpr3Pj2fTtlu6xMYKzg7cSDgd2bO sJGH/rcvva9Spkx5Vfq0RyOrYph9boshRN3D4tbWgBAcX9POdXCVfJONDxhfBuPHsJ6vEmPb An+XL5Yl26XYFPiODQ+Qbk44Ot1kt9s7oS3dVUrh92Qv0G3J3DF+Vt6C15nED+f+bk4gScu+ JHT7RjEmfa18GT8DcT//D1zEke1Ymhb41JH+GyZchDRWtjxsS5OBFMzrju7d264zJUFtX7iJ 3xvpKN7VcZKNtB6dLShj3v/XDsQVQWXmR/1YKWZ93C3LpRs2Y5nYdn6gEOpL/WfQFThtfnat HNc7fNs5vjotaYpBl5H8+VCautKbGOs219uQbhGZLYTv6okuKcY8W+4EJEtK0xB08vqr9Jd0 FS9MGjQE++GWo+5eQxFt6nUENHbVYnsr6bYPQsZH0CRNycgTG9MwY/UIXOf4W034UpR82TBG 1LiMsYfb8ahQJhs3wdf1nzipIjRwoZKT1vGXh/cj3gwSr64GfenURBxaFZA5O1acOZUjPrRT n3ci4McYW/0WVVA3lDGCBRMwggUPAgEBMIGWMIGQMQswCQYDVQQGEwJVUzEQMA4GA1UECBMH RmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3RlbXMgTExD MRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhNDdWRhIFN5 c3RlbXMgTExDIENBAgEpMA0GCWCGSAFlAwQCAwUAoIICTTAYBgkqhkiG9w0BCQMxCwYJKoZI hvcNAQcBMBwGCSqGSIb3DQEJBTEPFw0xNjA4MjAxNjA4NDRaME8GCSqGSIb3DQEJBDFCBEAS Zl+4p0iIIr2XvXPcFFFHySop9cG9weehGjGjTN2fG8b+6nWgMCkvSGozOg9Ezvojy4PuNEuj 4aJJlOKAnsp8MGwGCSqGSIb3DQEJDzFfMF0wCwYJYIZIAWUDBAEqMAsGCWCGSAFlAwQBAjAK BggqhkiG9w0DBzAOBggqhkiG9w0DAgICAIAwDQYIKoZIhvcNAwICAUAwBwYFKw4DAgcwDQYI KoZIhvcNAwICASgwgacGCSsGAQQBgjcQBDGBmTCBljCBkDELMAkGA1UEBhMCVVMxEDAOBgNV BAgTB0Zsb3JpZGExEjAQBgNVBAcTCU5pY2V2aWxsZTEZMBcGA1UEChMQQ3VkYSBTeXN0ZW1z IExMQzEcMBoGA1UEAxMTQ3VkYSBTeXN0ZW1zIExMQyBDQTEiMCAGCSqGSIb3DQEJARYTQ3Vk YSBTeXN0ZW1zIExMQyBDQQIBKTCBqQYLKoZIhvcNAQkQAgsxgZmggZYwgZAxCzAJBgNVBAYT AlVTMRAwDgYDVQQIEwdGbG9yaWRhMRIwEAYDVQQHEwlOaWNldmlsbGUxGTAXBgNVBAoTEEN1 
ZGEgU3lzdGVtcyBMTEMxHDAaBgNVBAMTE0N1ZGEgU3lzdGVtcyBMTEMgQ0ExIjAgBgkqhkiG 9w0BCQEWE0N1ZGEgU3lzdGVtcyBMTEMgQ0ECASkwDQYJKoZIhvcNAQEBBQAEggIAHf4plh2t fRHIRSFT/S6u8gAkyud9Gq+LnTpO4e2MAvXeNUORco00hBXqa5WW8n0mtUmupmBYMAHsreST F3sCwmk0yLyK4RqB6rs84/flVvm0GJlwOaHRxeq4B8qGoxUe4KscjiHLfR+YRI1DAHTP5MER vze4Hk6ANMGUBPlea7Nj6IgAA/pAx8knw3pON0YOnKf6Zb5Rhlbe4pz9I/n7o8BEZ35xfm3o Of39r9QSQX5Y4IyegpIQjdH1kStAHLA8QmCFbhMpwOi0f6xi/tO0qU18Jhew6y3CqGmAYddN nBVEV9u0S5JNgClRcV6JZMYjHxT7PyGGRPVtXJ4hKsy0fZxYUNaZ0Ha5fZvabfGAClW1PDLv sj6DhUvPQ7yXvRFt/ocCQCkGj+UJtHrWcFr75RW6md8/MGnfL386zLLc+/3/h1bm1ig9KRdN PkJYYMxqmUux3ueNCj0kxlnWcctsXaQpChxrdhTns+yxj+32bHXzDiqR8Me4m1IPQkqdpAW2 KQ0fNlop1E4PguteLdQafmtz6DIdIid4N8hgJ75UevlUf705+nJlZCYTLFATfEAO0liiqZxf kcuvvU7dmjKFFdH1pfscQDCDbDD5EaHSp7rEShWJbxrOfxc6RoHEWmBzwo/uSlbVh6ZJ+7Gf 4NTodj3yG6NadnGUw1dtmTqjiokAAAAAAAA= --------------ms050405090709090407070503--