Date:      Tue, 7 Jan 2020 10:53:20 +0100
From:      Willem Jan Withagen <wjw@digiware.nl>
To:        virtualization@FreeBSD.org
Subject:   Re: Adding a different type of blockstore to Bhyve
Message-ID:  <13c3951d-dd0e-7666-a56d-41f6368465c7@digiware.nl>
In-Reply-To: <6e5508d0-4a41-8442-3807-8b9e22bba933@digiware.nl>
References:  <6e5508d0-4a41-8442-3807-8b9e22bba933@digiware.nl>

On 30-12-2019 19:06, Willem Jan Withagen wrote:
> Hi,
> 
> One of the ways to provide a backing blockstore for KVM/Qemu is through
> the Ceph RADOS Block Device (RBD):
> https://github.com/qemu/qemu/blame/master/block/rbd.c
> 
> This makes it possible to use an RBD image as a boot image or any other
> block device, and a virtual machine using such an image can migrate to
> another hypervisor host.
> 
> I've been working on Ceph for quite some time, and one of the ways to
> offer a block device on FreeBSD is with rbd-ggate.
> This works through geom-gate and gives a /dev/ggate# device that is
> mapped to an image in a RADOS pool.
> 
> I'm not after migration for bhyve right now, but I would like to
> integrate RBD into bhyve as an alternative backing store.
> 
> Something like:
>    bhyve -s 1,virtio-blk,rbd:poolname/imagename[@snapshotname] \
>                           [:option1=value1[:option2=value2...]]
> 
> So I started browsing the bhyve code and ended up in block_if.{hc}.
> But the code there is rather strongly targeted towards local
> filesystem storage.
> 
> I also ran into net_backends.{ch}, and I think a nicer solution would
> be to create a block_backends.{ch} as well, for interfacing with more
> than just one blockstore provider, and then hook the RBD provider into
> that chain of blockstore providers.
> That way it would even be possible to make the RBD code dl-loadable, in
> case the LGPL Ceph code cannot be imported directly into the usr.sbin
> tree. (Which I suspect is the case.)
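> 
> To sketch the idea (the names and signatures below are purely
> illustrative, not existing bhyve code), each blockstore provider could
> describe itself with an ops table, matched on the scheme prefix of the
> -s device string:
> 
>     #include <sys/types.h>  /* off_t, ssize_t */
>     #include <sys/uio.h>    /* struct iovec */
> 
>     /*
>      * One entry per blockstore provider; blockif_open() would walk a
>      * NULL-terminated list of these and hand the request path to the
>      * first provider whose scheme matches ("rbd:pool/image" -> "rbd",
>      * plain paths -> "file").
>      */
>     struct blockbe_ops {
>         const char  *bbe_scheme;        /* "file", "rbd", ... */
>         void        *(*bbe_open)(const char *resource, int flags);
>         ssize_t     (*bbe_readv)(void *ctx, const struct iovec *iov,
>                         int iovcnt, off_t offset);
>         ssize_t     (*bbe_writev)(void *ctx, const struct iovec *iov,
>                         int iovcnt, off_t offset);
>         int         (*bbe_flush)(void *ctx);
>         void        (*bbe_close)(void *ctx);
>     };
> 
> An RBD provider would then translate readv/writev into librbd calls,
> and the existing local-file code would become just another entry in the
> provider list.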
> 
> The alternative is to start using the /dev/ggate# devices, but then
> we probably lose the option of live migration.
> And performance takes a serious hit:
>      A block write/read would go from the VM kernel
>      to the bhyve process in userspace.
>      Then it would go to /dev/ggate# and end up in the kernel again,
>      only to have geom-gate send it back to userspace,
>      where rbd-ggate sends it to the cluster.
> 
> Just typing out this data flow takes a lot of steps, which already
> suggests that this might not be the best architecture.
> 
> So the questions are:
> 1)   Is the abstraction of block_backends.{ch} the way to go?
> 1.1) And would the extra indirection there be acceptable?
>       (For network devices it seems no problem)
> 
> 2)   Does anybody already have such a framework for blockdevs?
>       (Otherwise I'll try to morph net_backends.{ch} into one.)
> 
> 3)   Other suggestions I need to consider?

Looking for reviewers of:
https://reviews.freebsd.org/D23010

In the days after New Year I made a first attempt to refactor the
block_if stuff into a generic backend (blockbe_) and an implementation
for the local storage (lockblk_).
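
To make the intent of the split a bit more concrete, here is a rough
sketch of how a local-file provider could fill in a generic ops table
of the kind sketched in the quoted mail above. The names below
(localblk_*, the ctx struct) are purely illustrative; the actual
prefixes and signatures are the ones in the review.

    /* Illustrative only -- not the code from D23010. */
    #include <sys/types.h>
    #include <sys/uio.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    struct localblk_ctx {
        int     fd;             /* backing file or device */
    };

    static void *
    localblk_open(const char *resource, int flags)
    {
        struct localblk_ctx *ctx;

        if ((ctx = malloc(sizeof(*ctx))) == NULL)
            return (NULL);
        if ((ctx->fd = open(resource, flags)) < 0) {
            free(ctx);
            return (NULL);
        }
        return (ctx);
    }

    static ssize_t
    localblk_readv(void *arg, const struct iovec *iov, int iovcnt,
        off_t offset)
    {
        struct localblk_ctx *ctx = arg;

        return (preadv(ctx->fd, iov, iovcnt, offset));
    }

    static ssize_t
    localblk_writev(void *arg, const struct iovec *iov, int iovcnt,
        off_t offset)
    {
        struct localblk_ctx *ctx = arg;

        return (pwritev(ctx->fd, iov, iovcnt, offset));
    }

    static int
    localblk_flush(void *arg)
    {
        struct localblk_ctx *ctx = arg;

        return (fsync(ctx->fd));
    }

    static void
    localblk_close(void *arg)
    {
        struct localblk_ctx *ctx = arg;

        close(ctx->fd);
        free(ctx);
    }

    /*
     * Registered in the provider table as something like:
     * { "file", localblk_open, localblk_readv, localblk_writev,
     *   localblk_flush, localblk_close }
     */

An RBD provider would look the same from the outside, but hold a
rados_ioctx_t/rbd_image_t in its context and map readv/writev onto
librbd instead of preadv/pwritev.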

I've submitted it to Phabricator, and I'm looking for reviewers and,
ultimately, somebody who will commit this once all issues are worked out.

--WjW