From owner-freebsd-net@FreeBSD.ORG  Mon Aug  8 15:12:08 2011
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 78C721065673
	for <freebsd-net@freebsd.org>; Mon,  8 Aug 2011 15:12:07 +0000 (UTC)
	(envelope-from ferdinand.goldmann@jku.at)
Received: from emailsecure.uni-linz.ac.at (emailsecure.uni-linz.ac.at
	[140.78.3.66]) by mx1.freebsd.org (Postfix) with ESMTP id 453928FC14
	for <freebsd-net@freebsd.org>; Mon,  8 Aug 2011 15:12:07 +0000 (UTC)
Received: from <hidden>
From: Ferdinand Goldmann <ferdinand.goldmann@jku.at>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Date: Mon, 8 Aug 2011 16:54:10 +0200
Message-Id: <A369574D-BB1B-4330-80F5-084B7D92CBD0@jku.at>
To: freebsd-net@freebsd.org
Mime-Version: 1.0 (Apple Message framework v1084)
X-Mailer: Apple Mail (2.1084)
Subject: Problem using CARP + HAST ...
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
X-List-Received-Date: Mon, 08 Aug 2011 15:12:09 -0000

Hi!

I am trying to create a common resource pool for a certain application using
CARP/HAST as described in [1]. However, while testing my setup I ran into a
problem which I don't know how to fix or work around:

If I shut down only the carp interface on the master (ifconfig carp0 down),
the slave notices this, makes its carp interface the master, and mounts the
HAST storage using a script called by devd. Everything is fine so far. BUT:

If, however, I completely shut down the master's network connection (using
"shut" on the switchport), the carp interface on the slave still switches to
master, but the script for making the HAST storage primary just hangs forever:

root  46841  0.0  0.6  3628  1524  ??  S     4:21PM   0:00.08 /bin/sh /opt/bin/carp-hast-switch master
root  47043  0.0  2.6 42228  6580  ??  S     4:22PM   0:00.03 hastd: hast0 (secondary) (hastd)

Seemingly, this is because the hastd daemons on master and slave are unable
to communicate, so the script waits forever for the secondary hastd process
to go away:

    # Wait for any "hastd secondary" processes to stop
    for disk in ${resources}; do
        while $( pgrep -lf "hastd: ${disk} \(secondary\)" > /dev/null 2>&1 ); do
            sleep 1
        done
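
For illustration, the unbounded wait above could be replaced by a bounded
one. This is only a minimal sketch: the helper names wait_until_gone and
secondary_running and the 30-second limit are assumptions made for this
example, not part of the script from [1]. It also does not decide what
should happen after the timeout; forcing the resource to primary at that
point may lead to a split-brain condition.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): run a check
# command once per second until it fails, meaning there is nothing left
# to wait for, or until the timeout expires.
# Usage: wait_until_gone <timeout-seconds> <check-command> [args...]
# Returns 0 if the check failed in time, 1 on timeout.
wait_until_gone() {
    timeout=$1
    shift
    elapsed=0
    while "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Hypothetical check: is a "hastd secondary" worker for the given
# resource still running? Uses the same pgrep pattern as the excerpt.
secondary_running() {
    pgrep -f "hastd: $1 \(secondary\)" > /dev/null 2>&1
}

# Example call (commented out so the snippet has no side effects):
# wait_until_gone 30 secondary_running hast0 || echo "timed out" >&2
```

Passing the check as a command keeps the timeout logic separate from the
pgrep pattern, so the same helper can be reused for each resource in
${resources}.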

I'm a bit puzzled. Is there a way for hastd to promote itself to primary after
a timeout or something similar? In normal operation, whenever the carp
interface fails, the underlying infrastructure will most likely be down as
well.

Even if I connected the two machines over an extra port for hastd, this
problem would still occur if I pulled the plug on the master. I suppose the
slave making itself the master will lead to a split-brain condition ... but is
there any way for hastd to handle this automagically? Otherwise, it won't be
much good for a scenario like the above. :-/

Can anybody shed light on this please?
TIA & best regards,
Ferdinand


[1] http://www.freebsd.org/doc/en/books/handbook/disks-hast.html