From owner-freebsd-fs@freebsd.org Fri Jul 15 20:10:22 2016
Subject: Re: pNFS server Plan B
From: John Nielsen <lists@jnielsen.net>
Date: Fri, 15 Jul 2016 13:45:57 -0600
To: freebsd-fs
Cc: Jordan Hubbard, Willem Jan Withagen

Sorry for the very delayed reply, but I've been behind on list emails
for a while. I've enjoyed reading the whole thread and have a few
comments below. I'm interested in a lot of this stuff as both a
consumer and an enterprise sysadmin (but I'm not much of a developer).
Hopefully I can provide some of the perspective Rick was looking for.

> On Jun 24, 2016, at 2:21 AM, Willem Jan Withagen wrote:
>
> On 24-6-2016 09:35, Jordan Hubbard wrote:
>>
>>> On Jun 22, 2016, at 1:56 AM, Willem Jan Withagen wrote:
>>>
>>> In the spare time I have left, I'm trying to get a lot of small
>>> fixes into the ceph tree to get it actually compiling, testing, and
>>> running on FreeBSD. But Ceph is a lot of code, and since a lot of
>>> people are working on it, the volume of code changes is big.
>>
>> Hi Willem,
>>
>> Yes, I read your paper on the porting effort!

Indeed, thank you again. I've been wanting to test your patches but
haven't had time; hopefully that will change soon.

>> I also took a look at porting ceph myself, about 2 years ago, and
>> rapidly concluded that it wasn't a small / trivial effort by any
>> means and would require a strong justification in terms of ceph's
>> feature set over glusterfs / moose / OpenAFS / RiakCS / etc. Since
>> that time, there's been customer interest but nothing truly "strong"
>> per se.
>
> I've been going at it since last November...
> And all I got in are about 3 batches of FreeBSD-specific commits. A
> lot of that has to do with release windows and code slush, like we
> know on FreeBSD. But even then reviews tend to be slow and I need to
> push people to look at them. Meanwhile, all kinds of things get
> pulled and inserted into the tree that are seriously not
> FreeBSD-friendly. Sometimes I see them during commit and "negotiate"
> better compatibility with the author. At other times I miss the whole
> thing and need to rebase to get rid of merge conflicts, only to find
> out the hard way that somebody has made the whole peer communication
> async and has thrown kqueue for the BSDs at it, but it doesn't work
> (yet). So to get my other patches in, I first need to fix this. It
> takes a lot of time...
>
> That all said, I was in Geneva and a lot of the Ceph people were
> there, including Sage Weil. And I got the feeling they appreciated a
> larger community. I think they see what ZFS has done with OpenZFS and
> see that communities get somewhere.

I think too that you're probably wearing them down. :)

> Now one of the things to do to continue, now that I can sort of
> compile and run the first test set, is to set up my own Jenkins
> stuff, so that I can at least test-drive some of the tree
> automagically and get some test coverage of the code on FreeBSD. In
> my mind (and Sage warned me that it will be more or less required) it
> is the only way to actually get a serious foot in the door with the
> Ceph guys.
>
>> My attraction to ceph remains centered around at least these
>> 4 things:
>>
>> 1. Distributed object store with S3-compatible ReST API
>> 2. Interoperates with OpenStack via Swift compatibility
>> 3. Block storage (RADOS) - possibly useful for iSCSI and other
>>    block storage requirements
>> 4. Filesystem interface

I will admit I don't have a lot of experience with other things like
GlusterFS, but for me Ceph is very compelling for similar reasons
(see the sketches after this list for what the first three look like
from the client side):

1. Block storage (RADOS Block Device). This is the top of my list
since it makes it easy to run a resilient farm of hypervisors that
supports live migration _without_ NFS, iSCSI or anything else. For
small deployments (like I have at home), you can run Ceph and the
hypervisors on the same hardware and still reboot them one at a time
without any storage interruption or having to stop any VMs (just
shuffle them around). No NAS/SAN required at all. Another similar use
case (which just got easier, on Linux at least, with the release of
rbd-nbd support) is (Docker) containers with persistent data volumes
not tied to any specific host. I would _love_ to see librbd support
in bhyve, but obviously a working librbd on FreeBSD is a prerequisite
for that.

2. Distributed object store with S3 and Swift compatibility. A lot of
different enterprises need this for a lot of different reasons. I know
for a fact that some of the pricey commercial offerings use Ceph under
the covers. For shops where budget is more important than commercial
support, this is a great option.

3. Everything else, including but not limited to: the native object
store (RADOS), the POSIX filesystem (which as mentioned is now
advertised as production-quality, with experimental support for
multiple metadata servers), support for arbitrary topologies, custom
CRUSH maps, erasure coding for efficient replication, ...

I do think Ceph on ZFS would be fantastic (and actually have a Fedora
box with a ZFSonLinux-backed OSD).
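Since I went on about RBD above, here is a rough sketch of what item 1
looks like from the client side, using the rados and rbd Python
bindings that ship with Ceph. The pool name, image name, and size are
made up for illustration, and it obviously assumes a running cluster,
which is exactly the part the FreeBSD port still has to deliver:

    # Sketch: create an RBD image a hypervisor could hand to a VM as
    # its disk. Assumes /etc/ceph/ceph.conf and an existing pool
    # named "rbd"; all names here are illustrative only.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')   # pool that holds the image
        try:
            rbd.RBD().create(ioctx, 'vm0-disk0', 10 * 1024**3)  # 10 GiB
            image = rbd.Image(ioctx, 'vm0-disk0')
            try:
                image.write(b'\x00' * 512, 0)  # any node can write...
                head = image.read(0, 512)      # ...and any node can
                                               # read, so a VM's disk is
                                               # not tied to one host
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

This is essentially what librbd support in bhyve would wrap: open the
image, read and write at offsets, and let Ceph worry about replication.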
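Item 2 is compelling partly because the RADOS Gateway speaks plain S3,
so stock clients work unchanged. Another sketch, this time with boto3;
the endpoint and credentials are placeholders (in real life you'd get
the keys from radosgw-admin):

    # Sketch: use an ordinary S3 client against a Ceph RADOS Gateway.
    # Endpoint URL and keys below are placeholders, not real values.
    import boto3

    s3 = boto3.client(
        's3',
        endpoint_url='http://rgw.example.net:7480',  # radosgw, not AWS
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
    )
    s3.create_bucket(Bucket='backups')
    s3.put_object(Bucket='backups', Key='dump.tgz', Body=b'...')
    listing = s3.list_objects(Bucket='backups')
    print(len(listing.get('Contents', [])))          # -> 1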
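And the native object store from item 3 is about as simple as
distributed storage APIs get: librados is just named objects in named
pools. One more sketch, with a made-up pool name:

    # Sketch: store and fetch an object directly in RADOS, with no
    # gateway or filesystem in between. Pool "data" is assumed to
    # exist already.
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('data')
    ioctx.write_full('greeting', b'hello from rados')  # whole object
    print(ioctx.read('greeting'))                      # b'hello from rados'
    ioctx.close()
    cluster.shutdown()

Whether the pool behind this is replicated or erasure-coded (via a
CRUSH rule) is invisible to the client, which is part of the appeal.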
Not sure if BlueStore will be a good thing or not (even ignoring the
porting hurdles, which are unfortunate). It would be interesting to
compare features and performance of a ZFS OSD and a ZVOL-backed
BlueStore OSD.

>> Is there anything we can do to help?
>
> I'll get back on that in a separate email.

With my $work hat on, I'd be interested in a TrueNAS S3 appliance that
came with support.

Anyway, glad to see that both pNFS and Ceph on FreeBSD are potentially
in the works.

JN