From owner-freebsd-fs@freebsd.org Fri Oct 30 13:50:10 2015
From: Steven Hartland
To: freebsd-fs@freebsd.org
Subject: Re: iSCSI/ZFS strangeness
Date: Fri, 30 Oct 2015 13:50:16 +0000
Message-ID: <56337598.2010704@multiplay.co.uk>
In-Reply-To: <9D4FE448-28EC-45F6-B525-E660E3AF57B0@tcbug.org>
References: <20151029015721.GA95057@mail.michaelwlucas.com> <563262C4.1040706@rlwinm.de> <9D4FE448-28EC-45F6-B525-E660E3AF57B0@tcbug.org>

On 30/10/2015 13:06, Josh Paetzel wrote:
>> On Oct 29, 2015, at 1:17 PM, Jan Bramkamp wrote:
>>
>>> On 29/10/15 02:57, Michael W. Lucas wrote:
>>> The initiators can both access the iSCSI-based pool--not
>>> simultaneously, of course. But CARP, devd, and some shell scripting
>>> should get me a highly available pool that can withstand the demise of
>>> any one iSCSI server and any one initiator.
>>>
>>> The hope is that the pool would continue to work even if an iSCSI host
>>> shuts down.
>>> When the downed iSCSI host returns, the initiators should
>>> log back in and the pool auto-resilver.
>> I would recommend against using CARP for this because CARP is prone to split-brain situations and in this case they could destroy your whole storage pool. If the current head node fails the replacement has to `zpool import -f` the pool and in the case of a split-brain situation both head nodes would continue writing to the iSCSI targets.
>>
>> I would move the leader election to an external service like consul, etcd or zookeeper. This is one case where the added complexity is worth it. If you can't run an external service for this e.g. it would exceed the scope of the chapter you're writing please simplify the setup with more reliable hardware, good monitoring and manual failover for maintenance. CARP isn't designed to implement reliable (enough) master election for your storage cluster.
>>
>> Adding iSCSI to your storage stack adds complexity and overhead. For setups which still fit inside a single rack SAS (with geom_multipath) is normally faster and cheaper. On the other hand you can't spread out SAS storage far enough to implement disaster tolerance should you really need it and it certainly is an setup.
>
> I'll impart some wisdom here.
>
> 1) HA with two nodes is impossible to do right. You need a third system to achieve quorum.
>
> 2) You can do SAS over optical these days. Perfect for having mirrored JBODs in different fire suppression zones of a datacenter.
>
> 3) I've seen a LOT of "cobbled together with shell script" HA rigs. They mostly get disabled eventually as it's realized that they go split brain in the edge cases and destroy the storage. What we did was go passive/passive and then address those cases as "how could we have avoided going passive/passive". It took two years.
>
> 4) Leverage mav@'s ALUA support. For block access this will make your life much easier.
>
> 5) Give me a call. I type slow and tend to leave things out, but would happily do one or more brain dump sessions.

Does the use of gmultipath not solve this problem?
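
To illustrate point 1) in the quoted text (a third system is needed for quorum), here is a minimal Python sketch of a strict-majority check. The names `has_quorum` and `may_import_pool` are hypothetical, and this is not how CARP, consul, etcd or zookeeper work internally; it only shows why a two-node cluster cannot tell a dead peer from a cut link.

```python
def has_quorum(votes_seen: int, cluster_size: int) -> bool:
    """Return True if votes_seen is a strict majority of cluster_size."""
    return votes_seen > cluster_size // 2


def may_import_pool(reachable_peers: int, cluster_size: int) -> bool:
    """Hypothetical guard: a head node counts itself plus the peers it can
    still reach, and only takes over the pool if that is a majority."""
    return has_quorum(reachable_peers + 1, cluster_size)


if __name__ == "__main__":
    # Two nodes, link between them cut: each side sees only itself.
    print(may_import_pool(reachable_peers=0, cluster_size=2))
    # Three nodes, one isolated: the connected pair keeps quorum...
    print(may_import_pool(reachable_peers=1, cluster_size=3))
    # ...and the isolated node does not.
    print(may_import_pool(reachable_peers=0, cluster_size=3))
```

Under a strict-majority rule, neither node of a two-node pair may take over after a partition, which is safe but not highly available. Relaxing the rule so that a lone node can proceed is exactly what lets both heads import the pool at once.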