From: Michael DeMan <freebsd@deman.com>
Subject: Re: FreeBSD & no single point of failure file service
Date: Sat, 16 Mar 2013 18:00:03 -0700
To: J David, freebsd-fs@freebsd.org

Errata...

--- by 'out of band', for my case, simply another ethernet link that by
convention is physically separate from the 'primary' storage ethernet.
Good enough for my use case.

--- on (F) below - I meant 'if neither head unit can decide whether it
should be the master or not' - then they both deny services.  Better
that bugs cause outages than data loss?

- Mike

On Mar 16, 2013, at 5:48 PM, Michael DeMan <freebsd@deman.com> wrote:

> Hi David,
> 
> We are looking at the exact same thing - let me know what you find out.
> 
> I think it is pretty obvious that ixsystems.com has this figured out,
> along with all the tricky details - but for the particular company I
> am looking to implement this for, vendors that won't show prices for
> their products are vendors we have to stay away from, since not
> showing pricing usually means it starts at $100K minimum plus giant
> annual support fees.  In all honesty some kind of 3rd-party-designed
> solution with only minimal support would be fine for us, but I don't
> think that is their regular market.
> 
> I was thinking to maybe test something out like:
> 
> #1. A couple of old Dell 2970 head units with LSI cards.
> #2. One dual-port SAS chassis.
> #3. Figure out what needs to happen with devd+carp in order for the
> head end units to RELIABLY know when to export/import ZFS and when to
> advertise NFS/iSCSI, etc.  (Rough devd hook sketched below.)
> 
> One catch with this, of course, is that for #3 there could be some
> kind of unexpected heartbeat failure between the two head end units
> where each decides the other is gone and both become masters - which
> would probably result in catastrophic corruption of the file system.
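> 
> As a strawman for the devd side, something like this (untested, and
> the event names depend on the CARP flavor - the old carp(4) interfaces
> show up as link state changes on carpN, while the newer vhid-based
> CARP reports system "CARP" with type MASTER/BACKUP; the takeover
> script name is made up):
> 
>     # /etc/devd/failover.conf - untested sketch, old-style carp(4)
>     notify 10 {
>         match "system"    "IFNET";
>         match "subsystem" "carp0";
>         match "type"      "LINK_UP";
>         # we just became CARP master on the storage link; decide
>         # whether it is safe to import the pool and start serving
>         action "/usr/local/sbin/takeover.sh";
>     };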
> 
> SuperMicro does have that one chassis that accepts lots of drives and
> two custom motherboards that are linked internally via 10GbE - I think
> ixsystems uses that.  So in theory the edge case of the accidental
> 'master/master' configuration is helped by the hardware.  By the same
> token, I am skeptical of having both head end units in a single
> chassis.  Pardon me for being paranoid.
> 
> So the conclusion I came to on #3 for a home-brew design was that
> devd+carp is great overall, but there needs to be an additional
> out-of-band confirmation between the two head end units.
> 
> 
> Scenario is:
> 
> #1-#2 above.
> 
> The head units are wired up such that they are providing storage and
> also running (hsrp/carp/vrrp) on the main link that they vend their
> storage resources from to the network.
> 
> They are also connected via another channel - this could be an x-over
> ethernet link or a serial cable - or, in my case, simply re-using the
> dedicated ethernet port that is used for management-only access to the
> servers and is already out of band.
> 
> If a network engineer comes along and tweaks the NFS/iSCSI switches or
> something else, makes a mistake, and the link between the two head end
> units is broken - both machines are going to want to be masters, and
> write directly to whatever shared physical storage they have?
> 
> This is where the additional link between the head units comes in.
> The storage delivery side of things has 'split brain' - the head end
> units cannot talk to each other, but may be able to talk to some (or
> all) clients that use their services.  With the current design (ZFS
> v28) there can be only one master using the physically attached
> storage from the head ends - otherwise a small problem that would have
> been better handled by just having an outage turns into a potential
> loss of all the data everywhere?
> 
> So basically failover between the head units works as follows (rough
> script after this list):
> 
> A) I am the secondary on the big storage ethernet link and the primary
> has timed out on telling me it is still alive.
> B) Confirm on the out-of-band link whether the primary is still up or
> not, and what it thinks the state of affairs may be.  (Optimize by
> starting this check the first time a primary heartbeat is missed - not
> after the full timeout?)
> C) If the primary thinks it has lost connectivity to the clients, then
> confirm it is also no longer acting as primary for the physical
> storage, after which I should attach the storage and try to become the
> primary.
> D) ??? If the primary thinks it can still connect to the clients, then
> what?
> E) From (C) above - let's be sure to avoid a flapping situation.
> F) If neither head end unit can decide which one should be the
> 'master' (vending NFS/iSCSI and also handling the physical storage),
> then both units should deny services?
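> 
> A minimal sketch of what the takeover script might look like
> (untested; PEER_OOB and the pool name are placeholders, and the iSCSI
> target service would be istgt or ctld depending on what you run):
> 
>     #!/bin/sh
>     # /usr/local/sbin/takeover.sh - untested sketch, run by devd when
>     # the storage-side heartbeat from the primary is lost.
>     PEER_OOB="192.0.2.1"  # peer head unit on the management-only link
>     POOL="tank"
>     
>     # (B) Confirm over the out-of-band link that the primary is gone.
>     if ping -c 3 -t 5 "$PEER_OOB" > /dev/null 2>&1; then
>         # (D/F) Peer still answers out of band, so the heartbeat loss
>         # is ambiguous - deny service rather than risk two masters.
>         logger "failover: peer alive on OOB link, not taking over"
>         exit 1
>     fi
>     
>     # (C) Primary unreachable on both links: take the shared storage.
>     # -f because the pool was last imported under the peer's hostid.
>     if zpool import -f "$POOL"; then
>         service nfsd onestart
>         logger "failover: imported $POOL, now acting as master"
>     else
>         logger "failover: zpool import failed, staying down"
>         exit 1
>     fi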
> 
> Longer e-mail than I expected.  Thanks for the post - it made me think
> about things.  Probably there are huge problems in my synopsis above.
> The hard work is always in the details, not the design?
> 
> - Mike
> 
> 
> On Mar 9, 2013, at 3:40 PM, J David wrote:
> 
>> Hello,
>> 
>> I would like to build a file server with no single point of failure,
>> and I would like to use FreeBSD and ZFS to do it.
>> 
>> The hardware configuration we're looking at would be two servers with
>> 4x SAS connectors and two SAS JBOD shelves.  Both servers would have
>> dual connections to each shelf.
>> 
>> The disks would be configured in mirrored pairs, with one disk from
>> each pair in each shelf.  One pair for ZIL, one or two pairs for
>> L2ARC, and the rest for ZFS data.
>> 
>> We would be shooting for an active/standby configuration where the
>> standby system is booted up but doesn't touch the bus unless/until it
>> detects CARP failover from the master via devd, then it does a zpool
>> import.  (Even so, all TCP sessions for NFS and iSCSI will get reset,
>> which seems unavoidable but recoverable.)
>> 
>> This will be really expensive to test, so I would be very interested
>> if anyone has feedback on how FreeBSD will handle this type of
>> shared-SAS hardware configuration.
>> 
>> Thanks for any advice!
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"