From owner-freebsd-fs@FreeBSD.ORG Fri Oct 4 02:39:46 2013
Message-ID: <524E2A69.9070009@ish.com.au>
Date: Fri, 04 Oct 2013 12:39:37 +1000
From: Aristedes Maniatis
To: freebsd-fs@freebsd.org
Subject: Re: hast degraded
References: <524E14B6.9040808@ish.com.au>
In-Reply-To: <524E14B6.9040808@ish.com.au>

On 4/10/13 11:07am, Aristedes Maniatis wrote:
> ssd2 degraded primary /dev/da2 10.8.8.1

OK, I think I understand better what I'm looking at now. Degraded just means that the secondary is not running.
The primary is actually mounting just fine. However, all the metadata in the zpool is now corrupted, so I cannot recover with zpool import -F. This corruption is on both the master and the slave, so I think I'm completely screwed at this point.

My best guess is that the hast master became slave but didn't properly export the zpool. The slave then became the master and tried to write data into the pool in the other direction, until the whole thing became hopelessly messed up.

I can see the power of hast, but also that it requires way more attention to the details of failover than my simplistic approach gave it. There is at least one failure mode which increases the chance of corrupting the whole filesystem on both master and slave, compared to a simpler periodic rsync synchronisation.

Not so cheery,
Ari

-- 
-------------------------->
Aristedes Maniatis
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001 fax +61 2 9550 4001
GPG fingerprint CBFB 84B4 738D 4E87 5E5C 5EFA EF6A 7D2E 3E49 102A
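
If the guess above is right (a node gave up the primary role while it still had the pool imported), then the order of the failover steps is what matters. Below is a minimal sketch of that ordering, not a tested failover script. The pool name "tank" is made up, as are the function names; "ssd2" is the resource from the status line quoted above. It only builds the command lists, it does not run them.

```python
from typing import List


def demote_commands(pool: str, resources: List[str]) -> List[List[str]]:
    """Commands for the node that is giving up the primary role, in order."""
    # Export the pool first, so this node no longer has it open
    # when the HAST resources switch to secondary.
    cmds = [["zpool", "export", pool]]
    cmds += [["hastctl", "role", "secondary", r] for r in resources]
    return cmds


def promote_commands(pool: str, resources: List[str]) -> List[List[str]]:
    """Commands for the node that is taking over the primary role, in order."""
    # The /dev/hast/<resource> devices only exist on the primary,
    # so the role change has to come before the import.
    cmds = [["hastctl", "role", "primary", r] for r in resources]
    cmds.append(["zpool", "import", pool])
    return cmds


if __name__ == "__main__":
    # "tank" is a hypothetical pool name, used only for illustration.
    for argv in demote_commands("tank", ["ssd2"]):
        print("old primary:", " ".join(argv))
    for argv in promote_commands("tank", ["ssd2"]):
        print("new primary:", " ".join(argv))
```

The sketch assumes a clean, planned switchover. If the old primary dies without exporting, the new primary would need a forced import (zpool import -f), and something outside this sketch has to guarantee the old node is really down before that happens.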