Date: Thu, 11 Aug 2016 10:11:15 +0200
From: Borja Marcos <borjam@sarenet.es>
To: Julien Cigar <julien@perdition.city>
Cc: Jordan Hubbard <jkh@ixsystems.com>, freebsd-fs@freebsd.org
Subject: Re: HAST + ZFS + NFS + CARP
Message-ID: <E7D42341-D324-41C7-B03A-2420DA7A7952@sarenet.es>
In-Reply-To: <20160704193131.GJ41276@mordor.lan>
References: <678321AB-A9F7-4890-A8C7-E20DFDC69137@gmail.com> <20160630185701.GD5695@mordor.lan> <6035AB85-8E62-4F0A-9FA8-125B31A7A387@gmail.com> <20160703192945.GE41276@mordor.lan> <20160703214723.GF41276@mordor.lan> <65906F84-CFFC-40E9-8236-56AFB6BE2DE1@ixsystems.com> <B48FB28E-30FA-477F-810E-DF4F575F5063@gmail.com> <61283600-A41A-4A8A-92F9-7FAFF54DD175@ixsystems.com> <20160704183643.GI41276@mordor.lan> <AE372BF0-02BE-4BF3-9073-A05DB4E7FE34@ixsystems.com> <20160704193131.GJ41276@mordor.lan>
> On 04 Jul 2016, at 21:31, Julien Cigar <julien@perdition.city> wrote:
>
>> To get specific again, I am not sure I would do what you are contemplating given your circumstances since it's not the cheapest / simplest solution. The cheapest / simplest solution would be to create 2 small ZFS servers and simply do zfs snapshot replication between them at periodic intervals, so you have a backup copy of the data for maximum safety as well as a physically separate server in case one goes down hard. Disk storage is the cheap part now, particularly if you have data redundancy and can therefore use inexpensive disks, and ZFS replication is certainly "good enough" for disaster recovery. As others have said, adding additional layers will only increase the overall fragility of the solution, and "fragile" is kind of the last thing you need when you're frantically trying to deal with a server that has gone down for what could be any number of reasons.
>>
>> I, for example, use a pair of FreeNAS Minis at home to store all my media and they work fine at minimal cost. I use one as the primary server that talks to all of the VMWare / Plex / iTunes server applications (and serves as a backup device for all my iDevices) and it replicates the entire pool to another secondary server that can be pushed into service as the primary if the first one loses a power supply / catches fire / loses more than 1 drive at a time / etc. Since I have a backup, I can also just use RAIDZ1 for the 4x4TB drive configuration on the primary and get a good storage / redundancy ratio (I can lose a single drive without data loss but am also not wasting a lot of storage on parity).
>
> You're right, I'll definitely reconsider the zfs send / zfs receive approach.

Sorry to be so late to the party.

Unless you have a *hard* requirement for synchronous replication, I would avoid it like the plague. Synchronous replication sounds sexy, but it has several disadvantages: complexity, and if you wish to keep an off-site replica it will definitely hurt performance, because distance increases delay.

Asynchronous replication with ZFS, on the other hand, has several advantages.

First and foremost: the snapshot-replicate approach is a terrific short-term "backup" solution that lets you recover quickly from all-too-common incidents, such as your own software corrupting data. A ZFS snapshot is trivial to roll back and doesn't involve a costly "backup recovery" procedure. You can do replication *and* keep a snapshot retention policy à la Apple's Time Machine.

Second: I mentioned distance when keeping off-site replicas, as distance necessarily increases delay. Asynchronous replication doesn't have that problem.

Third: with some care you can do one-to-N replication, even with different replication frequencies.

Several years ago, in 2009 I think, I set up a system that worked quite well. It was based on NFS and ZFS. The requirements were a bit particular, which in this case greatly simplified things for me.

I had a farm of front-end web servers (running Apache) that took all of their content from an NFS server. The NFS server used ZFS as the file system. This might not be useful for everyone, but in this case the web servers were CPU bound due to plenty of PHP crap.
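To make the snapshot-replicate idea concrete: one replication cycle is little more than a couple of commands run from cron. A minimal sketch, assuming a made-up dataset tank/data and a replica host called nfs2 reachable over ssh, and assuming an initial full send/receive has already been done:

    #!/bin/sh
    # Minimal asynchronous replication step (illustrative sketch only).
    DATASET=tank/data
    REMOTE=nfs2

    NOW=$(date +%Y%m%d%H%M%S)
    # Most recent existing snapshot of the dataset, by creation time.
    PREV=$(zfs list -H -t snapshot -o name -s creation -d 1 ${DATASET} | tail -1)

    # Take a new snapshot and send only the delta since the previous one.
    zfs snapshot ${DATASET}@repl-${NOW}
    zfs send -i ${PREV} ${DATASET}@repl-${NOW} | ssh ${REMOTE} zfs receive -F ${DATASET}
    # (Old snapshots would be pruned separately, according to the retention policy.)

Run something like that every minute or two and you get both the off-site copy and the short-term rollback history in one shot.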
As the front-ends weren't supposed to write to the file server (and indeed it was undesirable for security reasons) I could afford to export the NFS file systems in read-only mode.

The server was replicated to a sibling at 1 or 2 minute intervals, I don't remember. And the interesting part was this: I used Heartbeat to decide which of the servers was the master. When Heartbeat decided which one was the master, a specific IP address was assigned to it and the NFS service was started, so the front-ends would happily mount it.

What happened in case of a server failure?

Heartbeat would detect it in a minute more or less. Assuming a master failure, the former slave would become master, assigning itself the NFS server IP address and starting up NFS. Meanwhile, the front-ends had a silly script running at 1 minute intervals that simply read a file from the NFS-mounted filesystem. In case of a read error it would force an unmount of the NFS share and enter a loop trying to mount it again until it succeeded.

It looks kludgy, but it meant that in case of a server loss (ZFS on FreeBSD wasn't that stable at the time and we suffered a couple of them) the website was titsup for maybe two minutes, recovering automatically. It worked.

Both NFS servers were in the same datacenter, but I could have added geographical dispersion by using BGP to announce the NFS IP address to our routers.

There are better solutions, but this one involved no fancy software licenses, no expensive hardware, and it was quite reliable. The only problem we had was that, maybe because I was just too daring, we were bitten by a ZFS deadlock bug several times. But it worked anyway.

Borja.
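P.S. For illustration only, a front-end watchdog of that kind can be as simple as the sketch below. The service address, export path, mount point and test file are placeholders, not the ones actually used, and it assumes a soft NFS mount so that reads fail instead of hanging when the server is unreachable:

    #!/bin/sh
    # Front-end NFS watchdog sketch, run from cron every minute.
    NFSSRV=192.0.2.10      # floating NFS service address (placeholder)
    EXPORT=/tank/www       # exported filesystem (placeholder)
    MNT=/mnt/www           # local mount point (placeholder)

    # If the test file can be read, the mount is healthy; nothing to do.
    if dd if=${MNT}/.alive of=/dev/null bs=512 count=1 >/dev/null 2>&1; then
        exit 0
    fi

    # Read failed: force-unmount the stale mount and retry until it comes back.
    umount -f ${MNT} >/dev/null 2>&1
    while ! mount -t nfs -o ro,soft ${NFSSRV}:${EXPORT} ${MNT}; do
        sleep 10
    done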