From owner-freebsd-cluster@FreeBSD.ORG Fri Feb 18 09:26:20 2011
From: Denny Schierz
To: freebsd-cluster@freebsd.org
Date: Fri, 18 Feb 2011 10:08:10 +0100
Message-ID: <1298020090.18890.1684.camel@pcdenny>
Subject: Build failover ZFS, like HA-Storage from Solaris

hi,

we're searching for an alternative failover solution with ZFS.
We have two nodes connected to _one_ SAS storage (so no DRBD or anything
similar is possible), and we want to export ZFS volumes via iSCSI to
other systems.

If the primary node fails, the ZFS pool (which is also 8 x raidz2) has
to be moved to the secondary node. Once that is done, the global IP
(CARP?) switches to the new node.

It works with HA-Storage from Solaris 10, but the licenses are too
expensive on non-Sun hardware :-/ for our university.

Any solutions?

From owner-freebsd-cluster@FreeBSD.ORG Fri Feb 18 21:01:40 2011
From: Freddie Cash
To: Denny Schierz
Cc: freebsd-cluster@freebsd.org
Date: Fri, 18 Feb 2011 12:29:59 -0800
In-Reply-To: <1298020090.18890.1684.camel@pcdenny>
References: <1298020090.18890.1684.camel@pcdenny>
Subject: Re: Build failover ZFS, like HA-Storage from Solaris

On Fri, Feb 18, 2011 at 1:08 AM, Denny Schierz wrote:
> we're searching for an alternative failover solution with ZFS. We have
> two nodes connected to _one_ SAS storage (so no DRBD or anything
> similar is possible), and we want to export ZFS volumes via iSCSI to
> other systems.
>
> If the primary node fails, the ZFS pool (which is also 8 x raidz2) has
> to be moved to the secondary node. Once that is done, the global IP
> (CARP?) switches to the new node.
>
> It works with HA-Storage from Solaris 10, but the licenses are too
> expensive on non-Sun hardware :-/ for our university.
>
> Any solutions?

FreeBSD + ZFS + HAST + CARP + devd will do what you want.

You create a separate HAST device for each physical hard drive in the
system. That "mirrors" the drives between the two servers.

Then you create the ZFS pool on top of the HAST devices (use
/dev/hast/* instead of /dev/da*).

Then you configure CARP to provide the shared virtual IP between the
two systems. Configure your iSCSI setup to use this IP.

Then you write some scripts to handle the orderly teardown of the ZFS
pool on one system and the orderly import of the pool on the other
system. And you hook those scripts into devd, so that when CARP
advertises that it is switching which system is master, ZFS and iSCSI
switch with it.

Michael Lucas took some scripts I wrote to do the above and made them
better. You can find a lot of information on doing the above here:
http://blather.michaelwlucas.com/?p=241

I've used the above in a VM test setup using a ZFS pool with one raidz1
vdev and iSCSI. Works nicely. You have to make sure your iSCSI clients
can handle a small window of inaccessibility while the ZFS pool imports
on the slave system.

We're planning on moving this to real hardware (24 hot-swap drive bays
in each server) as soon as it arrives (hopefully next week).

--
Freddie Cash
fjwcash@gmail.com

From owner-freebsd-cluster@FreeBSD.ORG Fri Feb 18 23:45:26 2011
From: Denny Schierz
To: freebsd-cluster@freebsd.org
Date: Sat, 19 Feb 2011 00:44:56 +0100
References: <1298020090.18890.1684.camel@pcdenny>
Subject: Re: Build failover ZFS, like HA-Storage from Solaris

hi,

On 18.02.2011 at 21:29, Freddie Cash wrote:

> You create a separate HAST device for each physical hard drive in the
> system. That "mirrors" the drives between the two servers.

why should I mirror the drives when both systems are connected to one
SAS storage? Both hosts can see all drives at the same time, via the
SAS HBA.

cu denny

From owner-freebsd-cluster@FreeBSD.ORG Sat Feb 19 01:39:11 2011
From: Freddie Cash
To: Denny Schierz
Cc: freebsd-cluster@freebsd.org
Date: Fri, 18 Feb 2011 17:39:10 -0800
References: <1298020090.18890.1684.camel@pcdenny>
Subject: Re: Build failover ZFS, like HA-Storage from Solaris

On Fri, Feb 18, 2011 at 3:44 PM, Denny Schierz wrote:
> On 18.02.2011 at 21:29, Freddie Cash wrote:
>
>> You create a separate HAST device for each physical hard drive in the
>> system. That "mirrors" the drives between the two servers.
>
> why should I mirror the drives when both systems are connected to one
> SAS storage? Both hosts can see all drives at the same time, via the
> SAS HBA.

Sorry, re-reading your original post I see that I mixed up the layers. :)

To make sure I understand completely, you have:

[ SAN box ] ---- iSCSI ---- [ node 1 using ZFS ]
                     \-------- [ node 2 using ZFS ]

Correct? And you want to fail over services from node 1 to node 2?

You don't need HAST in that situation, as the SAN handles making the
storage available to both. My bad.

But you can still use CARP and devd.
CARP provides the shared IP so that other systems won't notice the
switchover. And devd provides the hooks into your custom scripts, so
that when CARP switches from node 1 to node 2, you export the pool on
node 1 and import the pool on node 2.

--
Freddie Cash
fjwcash@gmail.com
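
The export/import step described above could be sketched roughly as
follows. This is only an illustration, not the scripts mentioned in the
thread: the pool name "tank", the role names "master" and "backup", and
the "--apply" flag are assumptions made for the example, and the way
devd would pass the role to the script is left out. Only the standard
"zpool export" and "zpool import -f" commands are used. By default the
script just prints what it would run.

```python
#!/usr/bin/env python3
"""Sketch of a failover hook for a ZFS pool on shared storage."""
import subprocess
import sys

POOL = "tank"  # hypothetical pool name, replace with your own


def failover_commands(role, pool=POOL):
    """Return the commands to run when this node takes the given CARP role."""
    if role == "master":
        # The failed node may not have exported the pool cleanly, so the
        # import is forced. This is only safe if the other node is really
        # down; this sketch does no fencing of its own.
        # Starting the iSCSI target would follow the import (not shown).
        return [["zpool", "import", "-f", pool]]
    if role == "backup":
        # Stopping the iSCSI target would come before the export (not shown).
        return [["zpool", "export", pool]]
    raise ValueError(f"unknown role: {role!r}")


def run(role, dry_run=True):
    """Print the commands for a role, or execute them if dry_run is False."""
    for cmd in failover_commands(role):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: failover.py master|backup [--apply]
    run(sys.argv[1], dry_run="--apply" not in sys.argv[2:])
```

Keeping the command list separate from the code that executes it makes
the hook easy to check in dry-run mode before it is wired to a real
CARP state change. Because both nodes see the same disks, importing the
pool on one node while the other still has it imported would damage the
pool, so a real setup needs some way to be sure the old master is gone.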