Date:      Fri, 15 Jul 2016 13:45:57 -0600
From:      John Nielsen <lists@jnielsen.net>
To:        freebsd-fs <freebsd-fs@freebsd.org>
Cc:        Jordan Hubbard <jkh@ixsystems.com>, Willem Jan Withagen <wjw@digiware.nl>
Subject:   Re: pNFS server Plan B
Message-ID:  <E0FA996E-AAE9-421F-8D89-28E67B42B70A@jnielsen.net>
In-Reply-To: <20e89f76-867f-67b7-bb80-17acf8de6ed3@digiware.nl>
References:  <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> <D20C793E-A2FD-49F3-AD88-7C2FED5E7715@ixsystems.com> <16d38847-f515-f532-1300-d2843005999e@digiware.nl> <0CB465F9-B819-4DA7-969C-690A02BEB66E@ixsystems.com> <20e89f76-867f-67b7-bb80-17acf8de6ed3@digiware.nl>


Sorry for the very delayed reply, but I've been behind on list emails for a while. I've enjoyed reading the whole thread and have a few comments below. I'm interested in a lot of this stuff as both a consumer and an enterprise sysadmin (though I'm not much of a developer). Hopefully I can provide some of the perspective Rick was looking for.

> On Jun 24, 2016, at 2:21 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> 
> On 24-6-2016 09:35, Jordan Hubbard wrote:
>> 
>>> On Jun 22, 2016, at 1:56 AM, Willem Jan Withagen <wjw@digiware.nl>
>>> wrote:
>>> 
>>> In the spare time I have left, I'm trying to get a lot of small
>>> fixes into the ceph tree to get it actually compiling, testing, and
>>> running on FreeBSD. But Ceph is a lot of code, and since a lot of
>>> people are working on it, the number of code changes are big.
>> 
>> Hi Willem,
>> 
>> Yes, I read your paper on the porting effort!

Indeed, thank you again. I've been wanting to test your patches but haven't had time; hopefully that will change soon.

>> I also took a look at porting ceph myself, about 2 years ago, and
>> rapidly concluded that it wasn’t a small / trivial effort by any
>> means and would require a strong justification in terms of ceph’s
>> feature set over glusterfs / moose / OpenAFS / RiakCS / etc.   Since
>> that time, there’s been customer interest but nothing truly “strong”
>> per se.
> 
> I've been going at it since last November... and all I've got in are
> about three batches of FreeBSD-specific commits. A lot of that has to do
> with release windows and code slush, much as we know them on FreeBSD.
> But even then reviews tend to be slow, and I need to push people to look
> at them. Meanwhile all kinds of things get pulled and inserted into the
> tree that are seriously not FreeBSD-compatible. Sometimes I see them
> during commit and "negotiate" better compatibility with the author. At
> other times I miss the whole thing and have to rebase to get rid of
> merge conflicts, only to find out the hard way that somebody has made
> the whole peer communication async and has thrown kqueue for the BSDs
> at it, but it doesn't work (yet). So to get my other patches in, I
> first need to fix that. Takes a lot of time...
> 
> That all said, I was in Geneva and a lot of the Ceph people were there,
> including Sage Weil. I got the feeling they appreciated a larger
> community. I think they see what ZFS has done with OpenZFS and see that
> communities get somewhere.

I think too that you're probably wearing them down. :)

> Now one of the things to do to continue, now that I can sort of compile
> and run the first test set, is to set up my own Jenkins infrastructure,
> so that I can at least exercise some of the tree automatically and get
> some test coverage of the code on FreeBSD. In my mind (and Sage warned
> me that it will be more or less required) it is the only way to
> actually get a serious foot in the door with the Ceph guys.
> 
>> My attraction to ceph remains centered around at least these
>> 4 things:
>> 
>> 1. Distributed object store with S3-compatible REST API
>> 2. Interoperates with OpenStack via Swift compatibility
>> 3. Block storage (RADOS) - possibly useful for iSCSI and other
>> block storage requirements
>> 4. Filesystem interface

I will admit I don't have a lot of experience with other things like GlusterFS, but for me Ceph is very compelling for similar reasons:

1. Block storage (RADOS Block Device). This is at the top of my list, since it makes it easy to run a resilient farm of hypervisors that supports live migration _without_ NFS, iSCSI, or anything else. For small deployments (like I have at home), you can run Ceph and the hypervisors on the same hardware and still reboot them one at a time without any storage interruption or having to stop any VMs (just shuffle them around). No NAS/SAN required at all. Another, similar use case (which just got easier, on Linux at least, with the release of rbd-nbd support) is (Docker) containers with persistent data volumes that aren't tied to any specific host. I would _love_ to see librbd support in bhyve, but obviously a working librbd on FreeBSD is a prerequisite for that. (A rough librbd sketch follows this list.)

2. Distributed object store with S3 and Swift compatibility. A lot of different enterprises need this, for a lot of different reasons. I know for a fact that some of the pricey commercial offerings use Ceph under the covers; for shops where budget is more important than commercial support, this is a great option. (See the S3 sketch after this list.)

3. Everything else, including but not limited to: the native object store (RADOS), the POSIX filesystem (which, as mentioned, is now advertised as production-quality, with experimental support for multiple metadata servers), support for arbitrary topologies, custom CRUSH maps, erasure coding for efficient replication, ...
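
To make item 1 a bit more concrete, here's a rough sketch of what block access looks like through the Ceph Python bindings (python-rados / python-rbd). The pool and image names are made up, and it assumes a reachable cluster with the usual /etc/ceph/ceph.conf; something like this is what a bhyve librbd backend would ultimately be wrapping:

    # Sketch only: create an RBD image and do block-level I/O via the
    # Ceph Python bindings. Pool "rbd" and image "vm-disk0" are
    # hypothetical; assumes a running cluster and /etc/ceph/ceph.conf.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')
        try:
            rbd.RBD().create(ioctx, 'vm-disk0', 10 * 1024**3)  # 10 GiB image
            with rbd.Image(ioctx, 'vm-disk0') as image:
                image.write(b'bootblock', 0)   # write at byte offset 0
                print(image.read(0, 9))        # -> b'bootblock'
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()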

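Similarly for item 2, the RADOS Gateway speaks enough of the S3 protocol that stock client tooling just works against it. Here's a hedged sketch with boto3; the endpoint, bucket name, and credentials are placeholders (RGW listens on port 7480 by default):

    # Sketch only: talking S3 to a Ceph RADOS Gateway with boto3.
    # Endpoint, bucket name, and credentials are hypothetical.
    import boto3

    s3 = boto3.client(
        's3',
        endpoint_url='http://rgw.example.net:7480',  # default RGW port
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
    )
    s3.create_bucket(Bucket='demo-bucket')
    s3.put_object(Bucket='demo-bucket', Key='hello.txt', Body=b'hello from rgw')
    print(s3.get_object(Bucket='demo-bucket', Key='hello.txt')['Body'].read())
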
I do think Ceph on ZFS would be fantastic (I actually have a Fedora box with a ZFS-on-Linux-backed OSD). I'm not sure whether BlueStore will be a good thing or not (even ignoring the porting hurdles, which are unfortunate). It would be interesting to compare the features and performance of a ZFS-backed OSD against a ZVOL-backed BlueStore OSD.
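
One nice property, whichever backend the OSDs end up on (ZFS, FileStore, or BlueStore): the native RADOS layer from item 3 looks exactly the same from the client side. A minimal sketch with python-rados, again with a made-up pool name:

    # Sketch only: raw object I/O straight against RADOS, below the
    # block/S3/filesystem layers. Pool "data" is hypothetical.
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('data')
        try:
            ioctx.write_full('greeting', b'hello rados')  # create/replace object
            ioctx.set_xattr('greeting', 'lang', b'en')    # per-object metadata
            print(ioctx.read('greeting'))                 # -> b'hello rados'
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()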

>> Is there anything we can do to help?  
> 
> I'll get back on that in a separate Email.

With my $work hat on, I'd be interested in a TrueNAS S3 appliance that came with support.

Anyway, glad to see that both pNFS and Ceph on FreeBSD are potentially in the works.

JN



