Date: Fri, 30 Oct 2015 08:06:55 -0500
From: Josh Paetzel <josh@tcbug.org>
To: Jan Bramkamp <crest@rlwinm.de>
Cc: freebsd-fs@freebsd.org
Subject: Re: iSCSI/ZFS strangeness
Message-ID: <9D4FE448-28EC-45F6-B525-E660E3AF57B0@tcbug.org>
In-Reply-To: <563262C4.1040706@rlwinm.de>
References: <20151029015721.GA95057@mail.michaelwlucas.com> <563262C4.1040706@rlwinm.de>
> On Oct 29, 2015, at 1:17 PM, Jan Bramkamp <crest@rlwinm.de> wrote:
>
>> On 29/10/15 02:57, Michael W. Lucas wrote:
>> The initiators can both access the iSCSI-based pool--not
>> simultaneously, of course. But CARP, devd, and some shell scripting
>> should get me a highly available pool that can withstand the demise of
>> any one iSCSI server and any one initiator.
>>
>> The hope is that the pool would continue to work even if an iSCSI host
>> shuts down. When the downed iSCSI host returns, the initiators should
>> log back in and the pool auto-resilver.
>
> I would recommend against using CARP for this because CARP is prone to split-brain situations, and in this case they could destroy your whole storage pool. If the current head node fails, the replacement has to `zpool import -f` the pool, and in the case of a split-brain situation both head nodes would continue writing to the iSCSI targets.
>
> I would move the leader election to an external service like consul, etcd or zookeeper. This is one case where the added complexity is worth it. If you can't run an external service for this, e.g. because it would exceed the scope of the chapter you're writing, please simplify the setup with more reliable hardware, good monitoring, and manual failover for maintenance. CARP isn't designed to implement reliable (enough) master election for your storage cluster.
>
> Adding iSCSI to your storage stack adds complexity and overhead. For setups which still fit inside a single rack, SAS (with geom_multipath) is normally faster and cheaper. On the other hand, you can't spread out SAS storage far enough to implement disaster tolerance should you really need it, and it certainly is a setup.

I'll impart some wisdom here.

1) HA with two nodes is impossible to do right. You need a third system to achieve quorum.

2) You can do SAS over optical these days. Perfect for having mirrored JBODs in different fire suppression zones of a datacenter.

3) I've seen a LOT of "cobbled together with shell script" HA rigs. They mostly get disabled eventually as it's realized that they go split brain in the edge cases and destroy the storage. What we did was go passive/passive and then address those cases as "how could we have avoided going passive/passive". It took two years.

4) Leverage mav@'s ALUA support. For block access this will make your life much easier.

5) Give me a call. I type slow and tend to leave things out, but would happily do one or more brain dump sessions.
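
To make the external leader election idea a little more concrete, here is a rough sketch of what the active-head side could look like with consul holding the lock. The pool name "tank", the lock prefix "zfs-head", and the script path are made up for the example, and it assumes /etc/iscsi.conf already lists the targets; treat it as a starting point, not a tested failover system.

#!/bin/sh
#
# become-head.sh -- run as:
#   consul lock zfs-head /usr/local/sbin/become-head.sh
#
# "consul lock" blocks until it holds the named lock and releases it when
# this script exits, so only one head node can get past this point at a
# time.  That, rather than CARP state, is what keeps "zpool import -f"
# from turning into a split brain.

set -e

# Log in to every iSCSI target configured in /etc/iscsi.conf.
iscsictl -Aa

# The forced import is only tolerable because the lock guarantees the
# other head node is not importing the same pool at the same time.
zpool import -f tank

# Stay alive (and therefore keep holding the lock) for as long as we are
# the active head.  A production script would watch pool health and export
# cleanly on shutdown instead of just polling.
while zpool list tank >/dev/null 2>&1; do
        sleep 10
done

The property that matters is that the forced import can only happen while the lock is held, so a CARP flap or a partition between the two head nodes can't by itself produce two writers.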