From: Mikolaj Golub
To: Timothy Smith
Cc: Pawel Jakub Dawidek, freebsd-stable@freebsd.org
Date: Sun, 03 Jul 2011 18:55:06 +0300
Message-ID: <861uy7xsth.fsf@kopusha.home.net>
Subject: Re: HAST + ZFS: no action on drive failure

On Sat, 2 Jul 2011 14:43:15 -0700 Timothy Smith wrote:

TS> Hello Mikolaj,
TS>
TS> So, just to be clear, if a local drive fails in my pool, but the
TS> corresponding remote drive remains available, then hastd will both
TS> write to and read from the remote drive? That's really very cool!

Yes.

TS> I looked more closely at the hastd(8) man page. There is some
TS> indication of what you say, but not so clear:
TS>
TS> "Read operations (BIO_READ) are handled locally unless I/O error
TS> occurs or local version of the data is not up-to-date yet
TS> (synchronization is in progress)."

This is about READ operations; for WRITE we have, just above:

  Every write, delete and flush operation (BIO_WRITE, BIO_DELETE,
  BIO_FLUSH) is send to local component and synchronously replicated
  to the remote (secondary) node if it is available.

There might be things that should be improved in the documentation, but
I don't feel capable of doing this :-)

TS> Perhaps this can be modified a bit? Adding, "or the local disk is
TS> unavailable. In such a case, the I/O operation will be handled by
TS> the remote resource."
TS>
TS> It does make sense however, since HAST is based on the idea of RAID.
TS> This feature increases the redundancy of the system greatly. My boss
TS> will be very impressed, as am I!
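Right, each HAST resource is essentially a RAID1 mirror whose second
component lives on the other node. For anyone following along in the
archives, the kind of setup being discussed is a hast.conf(5) along
these lines -- a minimal sketch, with host names, resource name and
provider made up for illustration, not taken from Timothy's
configuration (the names after "on" must match the nodes' hostnames):

  resource disk0 {
          on hosta {
                  local /dev/da0
                  remote hostb
          }
          on hostb {
                  local /dev/da0
                  remote hosta
          }
  }

The ZFS pool is then built on /dev/hast/disk0 on whichever node is
primary, so it is hastd, not ZFS, that decides whether a given request
is served by the local or the remote component.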
TS> I did notice however that when the pulled drive is reinserted, I
TS> need to change the associated hast resource to init, then back to
TS> primary to allow hastd to once again use it (perhaps the same if the
TS> secondary drive fails?). Unless it will do this on its own after
TS> some time? I did not wait more than a few minutes. But this is easy
TS> enough to script, or to monitor the log and present a notification
TS> to the admin at such a time.

When you are reinserting the drive the resource should be in the init
state. Remember, some data was updated on the secondary only, so the
right sequence of operations could be:

1) Failover (switch the primary to init and the secondary to primary).

2) Fix the disk issue.

3) If this is a new drive, recreate the HAST metadata on it with the
   hastctl utility.

4) Switch the repaired resource to secondary and wait until the new
   primary connects to it and updates the metadata. After this,
   synchronization starts.

5) You can switch back to the previous primary before the
   synchronization is complete -- it will continue in the right
   direction -- but then you should expect performance degradation
   until the synchronization is complete, because READ requests will go
   to the remote node. So it might be better to wait until the
   synchronization is complete before switching back.

A rough hastctl sketch of this sequence is below.

-- 
Mikolaj Golub
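For the archives, the sequence above might look roughly like this with
the hastctl utility, assuming a resource named "disk0", hosta as the
node with the failed disk (old primary) and hostb as its peer -- the
names are made up for illustration, and this is a sketch rather than a
definitive recipe:

  # On hosta (the node with the failed disk): stop using the resource.
  hastctl role init disk0

  # On hostb: take over as primary.
  hastctl role primary disk0

  # On hosta, after the disk has been replaced: recreate the local HAST
  # metadata (only needed if it is a new drive).
  hastctl create disk0

  # On hosta: become secondary so the new primary can resynchronize it.
  hastctl role secondary disk0

  # Check the resource state and synchronization progress.
  hastctl status disk0

Once hastd reports no more dirty data for the resource, you can switch
the roles back the same way (role secondary on hostb first, then role
primary on hosta).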