From owner-freebsd-fs@FreeBSD.ORG  Thu Jun  6 22:51:18 2013
Subject: Re: zpool export/import on failover - The pool metadata is corrupted
From: mxb <mxb@alumni.chalmers.se>
To: Jeremy Chadwick
Cc: "freebsd-fs@freebsd.org"
Date: Fri, 7 Jun 2013 00:51:14 +0200
In-Reply-To: <20130606223911.GA45807@icarus.home.lan>
References: <016B635E-4EDC-4CDF-AC58-82AC39CBFF56@alumni.chalmers.se> <20130606223911.GA45807@icarus.home.lan>

Sure, the script is not perfect yet and does not handle a number of cases, but shifting the spotlight from zpool import/export onto the script itself is not that clever either, as the script works most of the time.

The question is WHY ZFS corrupts its metadata when it should not. Sometimes. I have also seen zpool get stuck in a stale state when importing/exporting the pool manually.
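As a sketch of the kind of diagnostics that could help narrow this down (standard zpool/zdb commands; the device path below is just a placeholder, not this pool's actual layout):

    # list pools visible to this node and the state they report
    zpool import

    # after an import, ask for a detailed health report on the pool
    zpool status -v jbod

    # dump the on-disk vdev labels of one member disk (placeholder path)
    zdb -l /dev/da0

If the labels on the members disagree, or zpool import already reports the pool as FAULTED before the import is even attempted, that at least narrows down where the metadata went bad.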
On 7 Jun 2013, at 00:39, Jeremy Chadwick wrote:

> On Fri, Jun 07, 2013 at 12:12:39AM +0200, mxb wrote:
>>
>> Then MASTER goes down, CARP on the second node goes MASTER (devd.conf, and script for lifting):
>>
>> root@nfs2:/root # cat /etc/devd.conf
>>
>> notify 30 {
>>         match "system"          "IFNET";
>>         match "subsystem"       "carp0";
>>         match "type"            "LINK_UP";
>>         action "/etc/zfs_switch.sh active";
>> };
>>
>> notify 30 {
>>         match "system"          "IFNET";
>>         match "subsystem"       "carp0";
>>         match "type"            "LINK_DOWN";
>>         action "/etc/zfs_switch.sh backup";
>> };
>>
>> root@nfs2:/root # cat /etc/zfs_switch.sh
>> #!/bin/sh
>>
>> DATE=`date +%Y%m%d`
>> HOSTNAME=`hostname`
>>
>> ZFS_POOL="jbod"
>>
>> case $1 in
>>         active)
>>                 echo "Switching to ACTIVE and importing ZFS" | mail -s ''$DATE': '$HOSTNAME' switching to ACTIVE' root
>>                 sleep 10
>>                 /sbin/zpool import -f jbod
>>                 /etc/rc.d/mountd restart
>>                 /etc/rc.d/nfsd restart
>>                 ;;
>>         backup)
>>                 echo "Switching to BACKUP and exporting ZFS" | mail -s ''$DATE': '$HOSTNAME' switching to BACKUP' root
>>                 /sbin/zpool export jbod
>>                 /etc/rc.d/mountd restart
>>                 /etc/rc.d/nfsd restart
>>                 ;;
>>         *)
>>                 exit 0
>>                 ;;
>> esac
>>
>> This works, most of the time, but sometimes I'm forced to re-create the pool. These machines are supposed to go into production. Losing the pool (and the data inside it) stops me from deploying this setup.
>
> This script looks highly error-prone.  Hasty hasty...  :-)
>
> This script assumes that the "zpool" commands (import and export) always
> work/succeed; there is no exit code ($?) checking being used.
>
> Since this is run from within devd(8): where does stdout/stderr go to
> when running a program/script under devd(8)?  Does it effectively go
> to the bit bucket (/dev/null)?  If so, you'd never know if the import or
> export actually succeeded or not (the export sounds more likely to be
> the problem point).
>
> I imagine there would be some situations where the export would fail
> (some files on filesystems under pool "jbod" still in use), yet CARP is
> already blindly assuming everything will be fantastic.  Surprise.
>
> I also do not know if devd.conf(5) "action" commands spawn a sub-shell
> (/bin/sh) or not.  If they don't, you won't be able to use things like
> 'action "/etc/zfs_switch.sh active >> /var/log/failover.log";'.  You
> would then need to implement the equivalent of logging within your
> zfs_switch.sh script.
>
> You may want to consider the -f flag to zpool import/export
> (particularly export).  However there are risks involved -- userland
> applications which have an fd/fh open on a file which is stored on a
> filesystem that has now completely disappeared can sometimes crash
> (segfault) or behave very oddly (100% CPU usage, etc.) depending on how
> they're designed.
>
> Basically what I'm trying to say is that devd(8) being used as a form of
> HA (high availability) and load balancing is not always possible.
> Real/true HA (especially with SANs) is often done very differently (now
> you know why it's often proprietary. :-) )
>
> --
> | Jeremy Chadwick                                   jdc@koitsu.org |
> | UNIX Systems Administrator                http://jdc.koitsu.org/ |
> | Making life hard for others since 1977.             PGP 4BD6C0CB |
>
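A rough sketch of the points raised above, applied to the zfs_switch.sh quoted earlier: check the exit status of zpool import/export and log to a file, since devd(8) gives the script no useful stdout/stderr. The log path and mail wording are arbitrary examples, not a tested configuration:

#!/bin/sh
# Sketch only: exit-status checking and file logging added to the
# failover script quoted above. Paths and messages are placeholders.

LOG=/var/log/failover.log
DATE=`date +%Y%m%d`
HOSTNAME=`hostname`
ZFS_POOL="jbod"

log() {
        echo "`date '+%Y-%m-%d %H:%M:%S'` $*" >> $LOG
}

case $1 in
        active)
                log "CARP MASTER: importing $ZFS_POOL"
                sleep 10
                if /sbin/zpool import -f $ZFS_POOL >> $LOG 2>&1; then
                        log "import of $ZFS_POOL succeeded"
                        /etc/rc.d/mountd restart >> $LOG 2>&1
                        /etc/rc.d/nfsd restart >> $LOG 2>&1
                        echo "Switched to ACTIVE" | mail -s "$DATE: $HOSTNAME switched to ACTIVE" root
                else
                        log "import of $ZFS_POOL FAILED, exit $?"
                        echo "zpool import failed, see $LOG" | mail -s "$DATE: $HOSTNAME import FAILED" root
                fi
                ;;
        backup)
                log "CARP BACKUP: exporting $ZFS_POOL"
                if /sbin/zpool export $ZFS_POOL >> $LOG 2>&1; then
                        log "export of $ZFS_POOL succeeded"
                        echo "Switched to BACKUP" | mail -s "$DATE: $HOSTNAME switched to BACKUP" root
                else
                        log "export of $ZFS_POOL FAILED, exit $? (filesystems still busy?)"
                        echo "zpool export failed, see $LOG" | mail -s "$DATE: $HOSTNAME export FAILED" root
                fi
                /etc/rc.d/mountd restart >> $LOG 2>&1
                /etc/rc.d/nfsd restart >> $LOG 2>&1
                ;;
        *)
                exit 0
                ;;
esac

Whether importing with -f unconditionally is wise is a separate question; the point is only that a failed export or import becomes visible in the log and the mail, instead of disappearing into /dev/null.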