From: Chad M Stewart <cms@balius.com>
Subject: HAST - detect failure and restore avoiding an outage?
Date: Wed, 20 Feb 2013 14:54:54 -0600
To: freebsd-questions@freebsd.org

I built a 2-node cluster to test HAST. Each node is an older HP server with 6 SCSI disks. Each disk is configured as a single-disk RAID 0 in the RAID controller, since I wanted a JBOD presented to FreeBSD 9.1 x86. I allocated one disk for the OS and the other 5 disks for HAST.

node2# zpool status
  pool: scsi-san
 state: ONLINE
  scan: scrub repaired 0 in 0h27m with 0 errors on Tue Feb 19 17:38:55 2013
config:

        NAME            STATE     READ WRITE CKSUM
        scsi-san        ONLINE       0     0     0
          raidz1-0      ONLINE       0     0     0
            hast/disk1  ONLINE       0     0     0
            hast/disk2  ONLINE       0     0     0
            hast/disk3  ONLINE       0     0     0
            hast/disk4  ONLINE       0     0     0
            hast/disk5  ONLINE       0     0     0

  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        zroot        ONLINE       0     0     0
          gpt/disk0  ONLINE       0     0     0

Yesterday I physically pulled disk2 out of node1 to simulate a failure. ZFS didn't see anything wrong, which is expected. hastd did see the problem, also expected. 'hastctl status' didn't show me anything unusual or indicate any problem that I could see on either node. I saw hastd reporting problems in the logs, but otherwise everything looked fine. Is there a way to detect a failed disk from hastd besides the log? camcontrol showed the disk had failed, and obviously I'll be monitoring with it as well.

For recovery I installed a new disk in the same slot. To protect data reliability, the safest way I can think of to recover is the following:

1 - node1 - stop the apps
2 - node1 - export the pool
3 - node1 - hastctl create disk2
4 - node1 - for D in 1 2 3 4 5; do hastctl role secondary disk$D; done
5 - node2 - for D in 1 2 3 4 5; do hastctl role primary disk$D; done
6 - node2 - import the pool
7 - node2 - start the apps

At step 5 hastd will start to resynchronize node2:disk2 -> node1:disk2. I've been trying to think of a way to re-establish the mirror without having to restart/move the pool _and_ without posing additional risk of data loss.

To avoid an application outage I suppose the following would work (sketched as concrete commands below):

1 - insert the new disk in node1
2 - hastctl role init disk2
3 - hastctl create disk2
4 - hastctl role primary disk2

At that point ZFS would have seen a disk failure and then started resilvering the pool.
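As a rough sketch, that no-outage path on node1 would be something like the following (disk2 and scsi-san are the names from my setup; whether ZFS picks the device back up by itself or needs a nudge is a guess on my part):

  node1# hastctl role init disk2      # take the resource out of service
  node1# hastctl create disk2         # write fresh HAST metadata on the replacement disk
  node1# hastctl role primary disk2   # bring it back; /dev/hast/disk2 reappears
  node1# zpool status scsi-san        # expect raidz1-0 to resilver hast/disk2
  # if ZFS doesn't start resilvering on its own, it may need something like:
  # node1# zpool online scsi-san hast/disk2   (or a zpool replace)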
No application outage, but now only 4 disks contain the data (assuming the bits on the pool are changing, not static content). Using the previous steps there is an application outage, but a healthy pool is maintained at all times.

Is there another approach I'm not thinking of where both data health and no application outage could be achieved?

Regards,

Chad