Date:      Fri, 27 Mar 2015 14:37:30 -0600
From:      John Nielsen <lists@jnielsen.net>
To:        Alexander Motin <mav@freebsd.org>
Cc:        freebsd-virtualization@freebsd.org
Subject:   Re: Bhyve storage improvements
Message-ID:  <1F36054F-7F07-4972-870C-65018F3AE5AC@jnielsen.net>
In-Reply-To: <551596AD.8070202@FreeBSD.org>
References:  <5515270A.7050408@FreeBSD.org> <98136D5B-297B-4538-8EF4-EA2872C6640B@jnielsen.net> <551596AD.8070202@FreeBSD.org>

On Mar 27, 2015, at 11:43 AM, Alexander Motin <mav@freebsd.org> wrote:

> On 27.03.2015 18:47, John Nielsen wrote:
>> Does anyone have plans (or know about any) to implement virtio-scsi
>> support in bhyve? That API does support TRIM and should retain most
>> or all of the low-overhead virtio goodness.
>
> I was thinking about that (not really plans yet, just some thoughts),
> but haven't found a good motivation and understanding of the whole
> possible infrastructure.
>
> I am not sure it is worth emulating the SCSI protocol in addition to
> the already done ATA in ahci-hd and simple block in virtio-blk just
> to get another, possibly faster than AHCI, block storage with
> TRIM/UNMAP.  Really good SCSI disk emulation in CTL in kernel takes
> about 20K lines of code. It is pointless to duplicate it, and it may
> be complicated for administration to just interface to it.  Indeed
> I've seen virtio-blk being faster than ahci-hd in some tests, but
> those tests were highly synthetic.  I haven't tested it on real
> workloads, but I have a feeling that the real difference may not be
> that large.  If somebody wants to check -- more benchmarks are highly
> welcome!  From the theoretical side I'd like to note that both ATA
> and SCSI protocols on guests go through additional ATA/SCSI
> infrastructure (CAM in FreeBSD), absent in the case of the pure block
> virtio-blk, so they have some more overhead by definition.

Agreed, more testing is needed to see how much of a performance impact
there would be from keeping TRIM dependent on the AHCI emulation.

> The main potential benefit I see from using virtio-scsi is the
> possibility to pass through to the client not a block device, but
> some real SCSI device. It can be some local DVD writer, or remote
> iSCSI storage. The latter would be especially interesting for large
> production installations. But the main problem I see here is booting.
> To make the user-level loader boot the kernel from DVD or iSCSI,
> bhyve has to implement its own SCSI initiator, like a small second
> copy of CAM in user-level. Booting the kernel from some other local
> block storage and then attaching to remote iSCSI storage for data can
> be much easier, but it is not convenient. It is possible not to
> connect to iSCSI directly from user-level, but to make kernel CAM do
> it, and then make CAM provide both a block layer for booting and a
> SCSI layer for virtio-scsi, but I am not sure it is very good from a
> security point of view to make the host system see the virtual disks.
> Though maybe it could work if CAM could block kernel/GEOM access to
> them, like it is done for ZVOLs now, supporting "geom" and "dev"
> modes. Though that complicates CAM and the whole infrastructure.

Yes, pass-through of disk devices opens up a number of possibilities.
Would it be feasible to just have bhyve broker between a pass(4) device
on the host and virtio_scsi(4) in the guest? That would require the
guest devices (be they local disks, iSCSI LUNs, etc.) to be connected
to the host, but I'm not sure that's a huge concern. The host will
always have a high level of access to the guest's data. (Plus, there's
nothing preventing a guest from doing its own iSCSI, etc. after it
boots.) Using the existing kernel infrastructure (CAM, iSCSI initiator,
etc.) would also remove the need to duplicate any of that in userland,
wouldn't it?
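
To make the pass(4) idea concrete, here is a very rough userland sketch
(not bhyve code; the forward_cdb() helper, the hard-coded /dev/pass0
path, and the INQUIRY example are placeholders I made up) of how a
broker could push a guest-supplied CDB to a host pass(4) device through
libcam. Real virtio-scsi handling would also have to map LUNs, translate
CAM status and sense data back into virtio-scsi responses, and pick the
data direction per command.

/* Hypothetical sketch: forward one SCSI CDB to a pass(4) device via
 * libcam.  Compile with -lcam on FreeBSD. */
#include <sys/types.h>
#include <camlib.h>
#include <cam/cam.h>
#include <cam/cam_ccb.h>
#include <cam/scsi/scsi_all.h>
#include <cam/scsi/scsi_message.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int
forward_cdb(struct cam_device *dev, const uint8_t *cdb, uint8_t cdb_len,
    uint8_t *data, uint32_t data_len, uint32_t dir)
{
	union ccb *ccb = cam_getccb(dev);

	if (ccb == NULL)
		return (-1);

	/* Wrap the guest CDB in a SCSI I/O CCB. */
	cam_fill_csio(&ccb->csio,
	    /*retries*/ 1,
	    /*cbfcnp*/ NULL,
	    /*flags*/ dir,			/* CAM_DIR_IN or CAM_DIR_OUT */
	    /*tag_action*/ MSG_SIMPLE_Q_TAG,
	    /*data_ptr*/ data,
	    /*dxfer_len*/ data_len,
	    /*sense_len*/ SSD_FULL_SIZE,
	    /*cdb_len*/ cdb_len,
	    /*timeout*/ 30 * 1000);
	memcpy(ccb->csio.cdb_io.cdb_bytes, cdb, cdb_len);

	int error = cam_send_ccb(dev, ccb);
	if (error == 0 &&
	    (ccb->ccb_h.status & CAM_STATUS_MASK) != CAM_REQ_CMP)
		error = -1;	/* here we'd translate status/sense for the guest */

	cam_freeccb(ccb);
	return (error);
}

int
main(void)
{
	/* /dev/pass0 is a placeholder for whatever the guest LUN maps to. */
	struct cam_device *dev = cam_open_device("/dev/pass0", O_RDWR);
	uint8_t inq[36];
	uint8_t cdb[6] = { 0x12, 0x00, 0x00, 0x00, sizeof(inq), 0x00 }; /* INQUIRY */

	if (dev == NULL) {
		fprintf(stderr, "%s\n", cam_errbuf);
		return (1);
	}
	if (forward_cdb(dev, cdb, sizeof(cdb), inq, sizeof(inq), CAM_DIR_IN) == 0)
		printf("INQUIRY ok, vendor: %.8s\n", inq + 8);
	cam_close_device(dev);
	return (0);
}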

The user-level loader is necessary for now, but once UEFI support exists
in bhyve the external loader can go away. Any workarounds like you've
described above would similarly be temporary.

Using Qemu+KVM on Linux as a comparison point, there are examples of
both kernel-level and user-level access by the host to guest disks.
Local disk images (be they raw or qcow2) are obviously manipulated by
the Qemu process from userland. RBD (the Ceph/RADOS network block
device) is in userland. SRP (SCSI RDMA Protocol) is in kernel. There are
a few ways to do host- and/or kernel-based iSCSI. There is also a
userland option if you link Qemu against libiscsi when you build it. If
we ever do want userland iSCSI support, libiscsi does claim to be "pure
POSIX" and to have been tested on FreeBSD, among others.
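
For whatever it's worth, the synchronous side of libiscsi looks roughly
like the sketch below. This is only meant to show the flavor of the
library from userland; the initiator and target IQNs and the portal
address are made-up examples, and I'm going from memory of the public
headers, so the details should be double-checked. Actual I/O would then
go through the library's task-based SCSI command calls.

/* Rough sketch of an iSCSI login from userland with libiscsi.
 * Names and addresses below are made-up examples. */
#include <stdio.h>
#include <iscsi/iscsi.h>

int
main(void)
{
	struct iscsi_context *iscsi;

	iscsi = iscsi_create_context("iqn.2015-03.net.example:bhyve-test");
	if (iscsi == NULL) {
		fprintf(stderr, "failed to create iSCSI context\n");
		return (1);
	}

	iscsi_set_session_type(iscsi, ISCSI_SESSION_NORMAL);
	iscsi_set_targetname(iscsi, "iqn.2015-03.net.example:target0");

	/* Log in to the portal and connect to LUN 0. */
	if (iscsi_full_connect_sync(iscsi, "192.0.2.10:3260", 0) != 0) {
		fprintf(stderr, "connect failed: %s\n", iscsi_get_error(iscsi));
		iscsi_destroy_context(iscsi);
		return (1);
	}

	printf("logged in; SCSI commands could now be issued via the task API\n");

	iscsi_destroy_context(iscsi);
	return (0);
}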

JN



