FreeBSD Mail Archives

Date:      Sun, 19 Jun 2016 21:54:22 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Jordan Hubbard <jkh@ixsystems.com>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>, Alexander Motin <mav@freebsd.org>
Subject:   Re: pNFS server Plan B
Message-ID:  <1996808572.159331289.1466387661988.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <D20C793E-A2FD-49F3-AD88-7C2FED5E7715@ixsystems.com>
References:  <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> <D20C793E-A2FD-49F3-AD88-7C2FED5E7715@ixsystems.com>


Jordan Hubbard wrote:
> 
> > On Jun 13, 2016, at 3:28 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > 
> > You may have already heard of Plan A, which sort of worked
> > and you could test by following the instructions here:
> > 
> > http://people.freebsd.org/~rmacklem/pnfs-setup.txt
> > 
> > However, it is very slow for metadata operations (everything other than
> > read/write) and I don't think it is very useful.
> 
> Hi guys,
> 
> I finally got a chance to catch up and bring up Rick’s pNFS setup on a couple
> of test machines.  He’s right, obviously - The “plan A” approach is a bit
> convoluted and not at all surprisingly slow.  With all of those transits
> twixt kernel and userland, not to mention glusterfs itself which has not
> really been tuned for our platform (there are a number of papers on this we
> probably haven’t even all read yet), we’re obviously still in the “first
> make it work” stage.
> 
> That said, I think there are probably more possible plans than just A and B
> here, and we should give the broader topic of “what does FreeBSD want to do
> in the Enterprise / Cloud computing space?” at least some consideration at
> the same time, since there are more than a few goals running in parallel
> here.
> 
> First, let’s talk about our story around clustered filesystems + associated
> command-and-control APIs in FreeBSD.  There is something of an embarrassment
> of riches in the industry at the moment - glusterfs, ceph, Hadoop HDFS,
> RiakCS, moose, etc.  All or most of them offer different pros and cons, and
> all offer more than just the ability to store files and scale “elastically”.
> They also have ReST APIs for configuring and monitoring the health of the
> cluster, some offer object as well as file storage, and Riak offers a
> distributed KVS for storing information *about* file objects in addition to
> the object themselves (and when your application involves storing and
> managing several million photos, for example, the idea of distributing the
> index as well as the files in a fault-tolerant fashion is also compelling).
> Some, if not most, of them are also far better supported under Linux than
> FreeBSD (I don’t think we even have a working ceph port yet).   I’m not
> saying we need to blindly follow the herds and do all the same things others
> are doing here, either, I’m just saying that it’s a much bigger problem
> space than simply “parallelizing NFS” and if we can kill multiple birds with
> one stone on the way to doing that, we should certainly consider doing so.
> 
> Why?  Because pNFS was first introduced as a draft RFC (RFC5661
> <https://datatracker.ietf.org/doc/rfc5661/>) in 2005.  The linux folks have
> been working on it
> <http://events.linuxfoundation.org/sites/events/files/slides/pnfs.pdf> since
> 2006.  Ten years is a long time in this business, and when I raised the
> topic of pNFS at the recent SNIA DSI conference (where storage developers
> gather to talk about trends and things), the most prevalent reaction I got
> was “people are still using pNFS?!”   This is clearly one of those
> technologies that may still have some runway left, but it’s been rapidly
> overtaken by other approaches to solving more or less the same problems in
> coherent, distributed filesystem access and if we want to get mindshare for
> this, we should at least have an answer ready for the “why did you guys do
> pNFS that way rather than just shimming it on top of ${someNewerHotness}??”
> argument.   I’m not suggesting pNFS is dead - hell, even AFS
> <https://www.openafs.org/> still appears to be somewhat alive, but there’s a
> difference between appealing to an increasingly narrow niche and trying to
> solve the sorts of problems most DevOps folks working At Scale these days
> are running into.
> 
Here are a few pNFS papers from the NetApp and Panasas sites. They are
dated 2012 through 2015 (these papers give a nice overview of what pNFS is):
http://www.netapp.com/us/media/tr-4063.pdf
http://www.netapp.com/us/media/tr-4239.pdf
http://www.netapp.com/us/media/wp-7153.pdf
http://www.panasas.com/products/pnfs-overview

One of these notes that the first Linux distribution that shipped with pNFS
support was RHEL6.4 in 2013.

So, I have no idea if it will catch on, but I don't think it can be considered
end of life. (Many use NFSv3 and its RFC is dated June 1995.)

rick

> That is also why I am not sure I would totally embrace the idea of a central
> MDS being a Real Option.  Sure, the risks can be mitigated (as you say, by
> mirroring it), but even saying the words “central MDS” (or central anything)
> may be such a turn-off to those very same DevOps folks, folks who have been
> burned so many times by SPOFs and scaling bottlenecks in large environments,
> that we'll lose the audience the minute they hear the trigger phrase.  Even
> if it means signing up for Other Problems later, it’s a lot easier to “sell”
> the concept of completely distributed mechanisms where, if there is any
> notion of centralization at all, it’s at least the result of a quorum
> election and the DevOps folks don’t have to do anything manually to cause it
> to happen - the cluster is “resilient” and “self-healing” and they are happy
> with being able to say those buzzwords to the CIO, who nods knowingly and
> tells them they’re doing a fine job!
> 
> Let’s get back, however, to the notion of downing multiple avians with the
> same semi-spherical kinetic projectile:  What seems to be The Rage at the
> moment, and I don’t know how well it actually scales since I’ve yet to be at
> the pointy end of such a real-world deployment, is the idea of clustering
> the storage (“somehow”) underneath and then providing NFS and SMB protocol
> access entirely in userland, usually with both of those services cooperating
> with the same lock manager and even the same ACL translation layer.  Our
> buddies at Red Hat do this with glusterfs at the bottom and NFS Ganesha +
> Samba on top - I talked to one of the Samba core team guys at SNIA and he
> indicated that this was increasingly common, with the team having helped
> here and there when approached by different vendors with the same idea.   We
> (iXsystems) also get a lot of requests to be able to make the same file(s)
> available via both NFS and SMB at the same time and they don’t much at all
> like being told “but that’s dangerous - don’t do that!  Your file contents
> and permissions models are not guaranteed to survive such an experience!”
> They really want to do it, because the rest of the world lives in
> Heterogeneous environments and that’s just the way it is.
> 
> Even the object storage folks, like Openstack’s Swift project, are spending
> significant amounts of mental energy on the topic of how to re-export their
> object stores as shared filesystems over NFS and SMB, the single consistent
> and distributed object store being, of course, Their Thing.  They wish, of
> course, that the rest of the world would just fall into line and use their
> object system for everything, but they also get that the “legacy stuff” just
> won’t go away and needs some sort of attention if they’re to remain players
> at the standards table.
> 
> So anyway, that’s the view I have from the perspective of someone who
> actually sells storage solutions for a living, and while I could certainly
> “sell some pNFS” to various customers who just want to add a dash of
> steroids to their current NFS infrastructure, or need to use NFS but also
> need to store far more data into a single namespace than any one box will
> accommodate, I also know that offering even more elastic solutions will be a
> necessary part of offering solutions to the growing contingent of folks who
> are not tied to any existing storage infrastructure and have various
> non-greybearded folks shouting in their ears about object this and cloud
> that.  Might there not be some compromise solution which allows us to put
> more of this in userland with fewer context switches in and out of the
> kernel, also giving us the option of presenting a more united front to
> multiple protocols that require more ACL and lock impedance-matching than
> we’d ever want to put in the kernel anyway?
> 
> - Jordan
> 
> 
> 
>

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1996808572.159331289.1466387661988.JavaMail.zimbra>
