From owner-freebsd-fs@freebsd.org Sun Jun 19 01:14:34 2016
From: Chris Watson <bsdunix44@gmail.com>
Subject: Re: pNFS server Plan B
Date: Sat, 18 Jun 2016 20:14:29 -0500
To: Jordan Hubbard
Cc: Rick Macklem, freebsd-fs, Alexander Motin
Message-Id: <7E27FA25-E18F-41D3-8974-EAE1EACABF38@gmail.com>

Since Jordan brought up clustering, I would be interested to hear Justin Gibbs's thoughts here. I know about a year ago he was asked on an "after hours" video chat hosted by Matt Ahrens about a feature he would really like to see, and he mentioned that he would really like, in a universe filled with time and money I'm sure, to work on a native clustering solution for FreeBSD.
I don't know if he is subscribed to the list, and I'm certainly not throwing him under the bus by bringing his name up, but I know he has at least been thinking about this for some time and probably has some value to add here.

Chris

Sent from my iPhone 5

> On Jun 18, 2016, at 3:50 PM, Jordan Hubbard wrote:
>
>> On Jun 13, 2016, at 3:28 PM, Rick Macklem wrote:
>>
>> You may have already heard of Plan A, which sort of worked
>> and you could test by following the instructions here:
>>
>> http://people.freebsd.org/~rmacklem/pnfs-setup.txt
>>
>> However, it is very slow for metadata operations (everything other than
>> read/write) and I don't think it is very useful.
>
> Hi guys,
>
> I finally got a chance to catch up and bring up Rick's pNFS setup on a couple of test machines. He's right, obviously - the "Plan A" approach is a bit convoluted and, not at all surprisingly, slow. With all of those transits twixt kernel and userland, not to mention glusterfs itself, which has not really been tuned for our platform (there are a number of papers on this we probably haven't even all read yet), we're obviously still in the "first make it work" stage.
>
> That said, I think there are probably more possible plans than just A and B here, and we should give the broader topic of "what does FreeBSD want to do in the Enterprise / Cloud computing space?" at least some consideration at the same time, since there are more than a few goals running in parallel here.
>
> First, let's talk about our story around clustered filesystems + associated command-and-control APIs in FreeBSD. There is something of an embarrassment of riches in the industry at the moment - glusterfs, ceph, Hadoop HDFS, RiakCS, moose, etc. All or most of them offer different pros and cons, and all offer more than just the ability to store files and scale "elastically". They also have ReST APIs for configuring and monitoring the health of the cluster, some offer object as well as file storage, and Riak offers a distributed KVS for storing information *about* file objects in addition to the objects themselves (and when your application involves storing and managing several million photos, for example, the idea of distributing the index as well as the files in a fault-tolerant fashion is also compelling). Some, if not most, of them are also far better supported under Linux than FreeBSD (I don't think we even have a working ceph port yet). I'm not saying we need to blindly follow the herd and do all the same things others are doing here, either; I'm just saying that it's a much bigger problem space than simply "parallelizing NFS", and if we can kill multiple birds with one stone on the way to doing that, we should certainly consider doing so.
>
> Why? Because pNFS was first introduced as a draft RFC (RFC 5661) in 2005. The Linux folks have been working on it since 2006.
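>
> (For reference, the client side of all of this - Rick's Plan A included - is just an NFSv4.1 mount, 4.1 being the first minor version with pNFS layouts. The names below are made up, and option spellings can differ between FreeBSD versions, so check mount_nfs(8) before copy-pasting:
>
>    # metadata operations go to the MDS; once a layout is granted,
>    # reads and writes go directly to the data servers
>    mount -t nfs -o nfsv4,minorversion=1 mds.example.com:/export /mnt
>
>    # newer nfsstat can show what was actually negotiated
>    nfsstat -m
>
> Timing something metadata-heavy, like extracting a ports tree, against a plain NFSv4 mount of the same server is a quick way to see the overhead Rick describes.)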
> Ten years is a long time in this business, and when I raised the topic of pNFS at the recent SNIA DSI conference (where storage developers gather to talk about trends and things), the most prevalent reaction I got was "people are still using pNFS?!" This is clearly one of those technologies that may still have some runway left, but it's been rapidly overtaken by other approaches to solving more or less the same problems in coherent, distributed filesystem access, and if we want to get mindshare for this, we should at least have an answer ready for the "why did you guys do pNFS that way rather than just shimming it on top of ${someNewerHotness}??" argument. I'm not suggesting pNFS is dead - hell, even AFS still appears to be somewhat alive - but there's a difference between appealing to an increasingly narrow niche and trying to solve the sorts of problems most DevOps folks working At Scale these days are running into.
>
> That is also why I am not sure I would totally embrace the idea of a central MDS being a Real Option. Sure, the risks can be mitigated (as you say, by mirroring it), but even saying the words "central MDS" (or central anything) may be such a turn-off to those very same DevOps folks, folks who have been burned so many times by SPOFs and scaling bottlenecks in large environments, that we'll lose the audience the minute they hear the trigger phrase. Even if it means signing up for Other Problems later, it's a lot easier to "sell" the concept of completely distributed mechanisms where, if there is any notion of centralization at all, it's at least the result of a quorum election and the DevOps folks don't have to do anything manually to cause it to happen - the cluster is "resilient" and "self-healing" and they are happy with being able to say those buzzwords to the CIO, who nods knowingly and tells them they're doing a fine job!
>
> Let's get back, however, to the notion of downing multiple avians with the same semi-spherical kinetic projectile: What seems to be The Rage at the moment, and I don't know how well it actually scales since I've yet to be at the pointy end of such a real-world deployment, is the idea of clustering the storage ("somehow") underneath and then providing NFS and SMB protocol access entirely in userland, usually with both of those services cooperating with the same lock manager and even the same ACL translation layer. Our buddies at Red Hat do this with glusterfs at the bottom and NFS Ganesha + Samba on top - I talked to one of the Samba core team guys at SNIA and he indicated that this was increasingly common, with the team having helped here and there when approached by different vendors with the same idea. We (iXsystems) also get a lot of requests to be able to make the same file(s) available via both NFS and SMB at the same time, and they don't much at all like being told "but that's dangerous - don't do that! Your file contents and permissions models are not guaranteed to survive such an experience!" They really want to do it, because the rest of the world lives in heterogeneous environments and that's just the way it is.
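>
> (To make that concrete: the amount of configuration in the Red Hat style stack is surprisingly small. The sketch below uses made-up volume and host names, and parameter spellings vary between nfs-ganesha and Samba releases, so treat it as illustrative rather than a recipe:
>
>    # ganesha.conf - export a gluster volume over NFS, entirely in userland
>    EXPORT {
>        Export_Id = 1;
>        Path = "/gvol";
>        Pseudo = "/gvol";
>        Access_Type = RW;
>        FSAL {
>            Name = GLUSTER;       # libgfapi-based backend
>            Hostname = "node1";   # any node that can hand out the volfile
>            Volume = "gvol";
>        }
>    }
>
>    # smb.conf - share the same volume over SMB, also via libgfapi
>    [gvol]
>        vfs objects = glusterfs
>        glusterfs:volume = gvol
>        glusterfs:volfile_server = node1
>        path = /
>        read only = no
>
> The point being that both daemons reach the volume through libgfapi, so gluster's locks translator ends up as the one lock manager under both protocols - which is exactly the property you want before telling a customer that simultaneous NFS + SMB access is OK.)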
>
> Even the object storage folks, like OpenStack's Swift project, are spending significant amounts of mental energy on the topic of how to re-export their object stores as shared filesystems over NFS and SMB, the single consistent and distributed object store being, of course, Their Thing. They wish, of course, that the rest of the world would just fall into line and use their object system for everything, but they also get that the "legacy stuff" just won't go away and needs some sort of attention if they're to remain players at the standards table.
>
> So anyway, that's the view I have from the perspective of someone who actually sells storage solutions for a living, and while I could certainly "sell some pNFS" to various customers who just want to add a dash of steroids to their current NFS infrastructure, or need to use NFS but also need to store far more data into a single namespace than any one box will accommodate, I also know that offering even more elastic solutions will be a necessary part of offering solutions to the growing contingent of folks who are not tied to any existing storage infrastructure and have various non-greybearded folks shouting in their ears about object this and cloud that. Might there not be some compromise solution which allows us to put more of this in userland with fewer context switches in and out of the kernel, also giving us the option of presenting a more united front to multiple protocols that require more ACL and lock impedance-matching than we'd ever want to put in the kernel anyway?
>
> - Jordan
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"