Reply-To: linda@kateley.com
Subject: Re: HAST + ZFS + NFS + CARP
To: Chris Watson, linda@kateley.com
Cc: freebsd-fs@freebsd.org
From: Linda Kateley
Organization: Kateley Company
Date: Wed, 17 Aug 2016 13:03:19 -0500

I just do consulting, so I don't always get to see the end of the project, although we are starting to do more ongoing support so we can see the progress. I have worked with some of the guys from high-availability.com for maybe 20 years. RSF-1 is the cluster that is bundled with Nexenta, and it works beautifully with OmniOS/illumos. The one customer I have running it in prod is an ISP in South America running OpenStack and ZFS on FreeBSD as iSCSI. Big boxes, 90+ drives per frame. If someone would like to try it, I have some contacts there. Ping me off-list.

You do risk losing data if you batch zfs send. It is very hard to run that in real time: you have to take the snap, then send the snap. Most people run it from cron, and even if it's not in cron, you would want one run to finish before you start the next. If you lose the sending host before the receive is complete you won't have a full copy. With ZFS, though, you will probably still have the data on the sending host, however long it takes to bring it back up.

RSF-1 runs in the ZFS stack and sends the writes to the second system. It's kind of pricey, but actually much less expensive than the commercial alternatives. Any time you run anything synchronously it adds latency, but it makes things safer.

There is also a cool tool I like called Zerto, for VMware, that sits in the hypervisor and sends a synchronous copy of a write locally and then an asynchronous copy remotely. It's pretty cool. I haven't run it myself, but I have a bunch of customers running it. I believe it works with Proxmox too.

Most people I run into these days don't mind losing 5 or even 30 minutes of data. Small shops. They usually have a copy somewhere else, or the cost of 5-30 minutes isn't that great. I used to work as a datacenter architect for Sun/Oracle, working only with Fortune 500 companies. There, losing one second of data could put a large company out of business. I worked with banks and exchanges; they couldn't ever lose a single transaction.
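To make the "take the snap, then send the snap, and let one run finish before the next starts" point concrete, a cron-driven batch job could look roughly like the sketch below. The dataset, destination host, script path and lock file are made-up example names, and a tool like zrep already handles all of this (plus retention and error handling) properly:

    #!/bin/sh
    # Rough sketch of a batched "snapshot, then send" replication job.
    # tank/data, backup/data-copy and backuphost are illustrative names only.

    DATASET="tank/data"
    DESTDS="backup/data-copy"
    DEST="root@backuphost"
    PREFIX="batch"

    NOW=$(date +%Y%m%d-%H%M%S)
    NEW="${DATASET}@${PREFIX}-${NOW}"

    # Most recent snapshot from a previous run, for an incremental send.
    PREV=$(zfs list -H -t snapshot -o name -s creation -d 1 "${DATASET}" \
        | grep "@${PREFIX}-" | tail -1)

    zfs snapshot "${NEW}" || exit 1

    if [ -n "${PREV}" ]; then
        zfs send -i "${PREV}" "${NEW}" | ssh "${DEST}" zfs receive -u "${DESTDS}"
    else
        zfs send "${NEW}" | ssh "${DEST}" zfs receive -u "${DESTDS}"
    fi

Serializing the runs is then a matter of wrapping the job in lockf(1) from cron, so a new run simply skips if the previous send hasn't finished:

    */5 * * * * root lockf -t 0 /var/run/zfs-batch-send.lock /usr/local/sbin/zfs-batch-send.sh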
Most people nowadays do the replication/availability in the application, though, and don't care about the underlying hardware, especially disk.

On 8/17/16 11:55 AM, Chris Watson wrote:
> Of course, if you are willing to accept some amount of data loss that
> opens up a lot more options. :)
>
> Some may find that acceptable though. Like turning off fsync with
> PostgreSQL to get much higher throughput. As long as you are made
> *very* aware of the risks.
>
> It's good to have input in this thread from someone with more
> experience with RSF-1 than the rest of us. You confirm what others
> have said about RSF-1, that it's stable and works well. What were you
> deploying it on?
>
> Chris
>
> Sent from my iPhone 5
>
> On Aug 17, 2016, at 11:18 AM, Linda Kateley wrote:
>
>> The question I always ask, as an architect, is "can you lose 1 minute
>> worth of data?" If you can, then batched replication is perfect. If
>> you can't, then HA. Every place I have positioned it, RSF-1 has
>> worked extremely well. If I remember right, it works at the DMU. I
>> would suggest trying it. They have been trying to have a full FreeBSD
>> solution; I have several customers running it well.
>>
>> linda
>>
>> On 8/17/16 4:52 AM, Julien Cigar wrote:
>>> On Wed, Aug 17, 2016 at 11:05:46AM +0200, InterNetX - Juergen
>>> Gotteswinter wrote:
>>>>
>>>> Am 17.08.2016 um 10:54 schrieb Julien Cigar:
>>>>> On Wed, Aug 17, 2016 at 09:25:30AM +0200, InterNetX - Juergen
>>>>> Gotteswinter wrote:
>>>>>>
>>>>>> Am 11.08.2016 um 11:24 schrieb Borja Marcos:
>>>>>>> On 11 Aug 2016, at 11:10, Julien Cigar wrote:
>>>>>>>>
>>>>>>>> As I said in a previous post, I tested the zfs send/receive
>>>>>>>> approach (with zrep) and it works (more or less) perfectly, so
>>>>>>>> I concur with all you said, especially about off-site
>>>>>>>> replication and synchronous replication.
>>>>>>>>
>>>>>>>> Out of curiosity I'm also testing a ZFS + iSCSI + CARP setup at
>>>>>>>> the moment. I'm in the early tests and haven't done any heavy
>>>>>>>> writes yet, but ATM it works as expected; I haven't managed to
>>>>>>>> corrupt the zpool.
>>>>>>> I must be too old school, but I don't quite like the idea of
>>>>>>> using an essentially unreliable transport (Ethernet) for
>>>>>>> low-level filesystem operations.
>>>>>>>
>>>>>>> In case something went wrong, that approach could risk
>>>>>>> corrupting a pool. Although, frankly, ZFS is extremely
>>>>>>> resilient. One of mine even survived a SAS HBA problem that
>>>>>>> caused some silent corruption.
>>>>>> try a dual split import :D i mean, zpool import -f on 2 machines
>>>>>> hooked up to the same disk chassis.
>>>>> Yes, this is the first thing on the list to avoid... :)
>>>>>
>>>>> I'm still busy testing the whole setup here, including the
>>>>> MASTER -> BACKUP failover script (CARP), but I think you can
>>>>> prevent that thanks to:
>>>>>
>>>>> - As long as ctld is running on the BACKUP the disks are locked
>>>>> and you can't import the pool (even with -f), for example (filer2
>>>>> is the BACKUP):
>>>>> https://gist.github.com/silenius/f9536e081d473ba4fddd50f59c56b58f
>>>>>
>>>>> - The shared pool should not be mounted at boot, and you should
>>>>> ensure that the failover script is not executed during boot time
>>>>> either: this is to handle the case wherein both machines turn off
>>>>> and/or re-ignite at the same time.
>>>>> Indeed, the CARP interface can "flip" its status if both machines
>>>>> are powered on at the same time, for example:
>>>>> https://gist.github.com/silenius/344c3e998a1889f988fdfc3ceba57aaf
>>>>> and you will have a split-brain scenario.
>>>>>
>>>>> - Sometimes you'll need to reboot the MASTER for some $reason
>>>>> (freebsd-update, etc.) and the MASTER -> BACKUP switch should not
>>>>> happen; this can be handled with a trigger file or something like
>>>>> that.
>>>>>
>>>>> - I still have to check whether the order is OK, but I think that
>>>>> as long as you shut down the replication interface and adapt the
>>>>> advskew (including the config file) of the CARP interface before
>>>>> the zpool import -f in the failover script, you can be relatively
>>>>> confident that nothing will be written on the iSCSI targets.
>>>>>
>>>>> - A zpool scrub should be run at regular intervals.
>>>>>
>>>>> This is my MASTER -> BACKUP CARP script ATM:
>>>>> https://gist.github.com/silenius/7f6ee8030eb6b923affb655a259bfef7
>>>>>
>>>>> Julien
>>>>>
>>>> 100€ question without looking at that script in detail: yes, from a
>>>> first view it's super simple, but why are solutions like RSF-1 so
>>>> much more powerful / feature-rich? There's a reason for that, which
>>>> is that they try to cover every possible situation (which makes
>>>> more than sense for this).
>>> I've never used RSF-1 so I can't say much more about it, but I have
>>> no doubts about its ability to handle "complex situations" where
>>> multiple nodes / networks are involved.
>>>
>>>> That script works for sure, but within very limited cases, imho.
>>>>
>>>>>> Kaboom, really ugly kaboom. That's what is very likely to happen
>>>>>> sooner or later, especially when it comes to homegrown automation
>>>>>> solutions. Even the commercial ones, where much more time/work
>>>>>> goes into such solutions, fail on a regular basis.
>>>>>>
>>>>>>> The advantage of ZFS send/receive of datasets is, however, that
>>>>>>> you can consider it essentially atomic. A transport corruption
>>>>>>> should not cause trouble (apart from a failed "zfs receive"),
>>>>>>> and with snapshot retention you can even roll back. You can't
>>>>>>> roll back zpool replications :)
>>>>>>>
>>>>>>> ZFS receive does a lot of sanity checks as well. As long as your
>>>>>>> zfs receive doesn't involve a rollback to the latest snapshot,
>>>>>>> it won't destroy anything by mistake. Just make sure that your
>>>>>>> replica datasets aren't mounted and zfs receive won't complain.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Borja.
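As a rough illustration of the "make sure your replica datasets aren't mounted" point above: the receiving side is usually kept unmountable and read-only, and the stream is received with -u. The pool and dataset names below are made up for the example:

    # On the receiving host: keep the replica from being mounted or modified
    # locally, so incremental receives never collide with a live mount.
    zfs set canmount=noauto backup/data-copy
    zfs set readonly=on backup/data-copy

    # Receiving with -u also leaves newly received filesystems unmounted.
    zfs send -i tank/data@snap1 tank/data@snap2 | \
        ssh backuphost zfs receive -u backup/data-copy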