Date: Fri, 27 Mar 2015 14:37:30 -0600
From: John Nielsen <lists@jnielsen.net>
To: Alexander Motin <mav@freebsd.org>
Cc: freebsd-virtualization@freebsd.org
Subject: Re: Bhyve storage improvements
Message-ID: <1F36054F-7F07-4972-870C-65018F3AE5AC@jnielsen.net>
In-Reply-To: <551596AD.8070202@FreeBSD.org>
References: <5515270A.7050408@FreeBSD.org> <98136D5B-297B-4538-8EF4-EA2872C6640B@jnielsen.net> <551596AD.8070202@FreeBSD.org>
On Mar 27, 2015, at 11:43 AM, Alexander Motin <mav@freebsd.org> wrote:

> On 27.03.2015 18:47, John Nielsen wrote:
>> Does anyone have plans (or know about any) to implement virtio-scsi support in bhyve? That API does support TRIM and should retain most or all of the low-overhead virtio goodness.
>
> I was thinking about that (not really a plans yet, just some thoughts), but haven't found a good motivation and understanding of whole possible infrastructure.
>
> I am not sure it worth to emulate SCSI protocol in addition to already done ATA in ahci-hd and simple block in virtio-blk just to get another, possibly faster then AHCI, block storage with TRIM/UNMAP. Really good SCSI disk emulation in CTL in kernel takes about 20K lines of code. It is pointless to duplicate it, and may be complicated for administration to just interface to it. Indeed I've seen virtio-blk being faster then ahci-hd in some tests, but those tests were highly synthetic. I haven't tested it on real workloads, but I have feeling that real difference may be not that large. If somebody wants to check -- more benchmarks are highly welcome! From the theoretical side I'd like to notice that both ATA and SCSI protocols on guests go through additional ATA/SCSI infrastructure (CAM in FreeBSD), absent in case pure block virtio-blk, so they have some more overhead by definition.

Agreed, more testing is needed to see how big an effect having TRIM remain dependent on AHCI emulation would have on performance.

> Main potential benefit I see from using virtio-scsi is a possibility to pass through to client not a block device, but some real SCSI device. It can be some local DVD writer, or remote iSCSI storage. The last would be especially interesting for large production installations. But the main problem I see here is booting. To make user-level loader boot the kernel from DVD or iSCSI, bhyve has to implement its own SCSI initiator, like small second copy of CAM in user-level. Booting kernel from some other local block storage and then attaching to remote iSCSI storage for data can be much easier, but it is not convenient. It is possible to not connect to iSCSI directly from user-level, but to make kernel CAM do it, and then make CAM provide both block layer for booting and SCSI layer for virtio-scsi, but I am not sure that it is very good from security point to make host system to see virtual disks. Though may be it could work if CAM could block kernel/GEOM access to them, alike it is done for ZVOLs now, supporting "geom" and "dev" modes. Though that complicates CAM and the whole infrastructure.

Yes, pass-through of disk devices opens up a number of possibilities. Would it be feasible to just have bhyve broker between a pass(4) device on the host and virtio_scsi(4) in the guest? That would require the guest devices (be they local disks, iSCSI LUNs, etc.) to be connected to the host, but I'm not sure that's a huge concern. The host will always have a high level of access to the guest's data. (Plus, there's nothing preventing a guest from doing its own iSCSI, etc. after it boots.) Using the existing kernel infrastructure (CAM, iSCSI initiator, etc.) would also remove the need to duplicate any of that in userland, wouldn't it? A rough sketch of the kind of brokering I have in mind is below.

The user-level loader is necessary for now, but once UEFI support exists in bhyve the external loader can go away. Any workarounds like the ones you've described above would similarly be temporary.
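To make that brokering idea a bit more concrete, here is a minimal, untested sketch of how a userland process such as bhyve could push a SCSI command to a host pass(4) device via libcam. The device path, the hard-coded INQUIRY, and the buffer size are all just illustrative assumptions; a real virtio-scsi backend would forward whatever CDBs the guest queues instead of building its own:

/*
 * Hedged sketch: send a SCSI INQUIRY through /dev/pass0 using libcam.
 * A virtio-scsi backend in bhyve could forward guest CDBs the same way.
 * Build with: cc -o passinq passinq.c -lcam
 */
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#include <camlib.h>
#include <cam/scsi/scsi_all.h>
#include <cam/scsi/scsi_message.h>

int
main(void)
{
	struct cam_device *dev;
	union ccb *ccb;
	uint8_t inq_buf[255];

	/* "/dev/pass0" is an assumption; use the pass(4) node backing the LUN. */
	if ((dev = cam_open_device("/dev/pass0", O_RDWR)) == NULL)
		errx(1, "cam_open_device: %s", cam_errbuf);
	if ((ccb = cam_getccb(dev)) == NULL)
		errx(1, "cam_getccb failed");

	/* Clear everything in the CCB after the header, as camcontrol(8) does. */
	memset(&(&ccb->ccb_h)[1], 0,
	    sizeof(struct ccb_scsiio) - sizeof(struct ccb_hdr));
	memset(inq_buf, 0, sizeof(inq_buf));

	cam_fill_csio(&ccb->csio,
	    /*retries*/ 1,
	    /*cbfcnp*/ NULL,
	    /*flags*/ CAM_DIR_IN,
	    /*tag_action*/ MSG_SIMPLE_Q_TAG,
	    /*data_ptr*/ inq_buf,
	    /*dxfer_len*/ sizeof(inq_buf),
	    /*sense_len*/ SSD_FULL_SIZE,
	    /*cdb_len*/ 6,
	    /*timeout*/ 5000);

	/* Build the 6-byte INQUIRY CDB by hand; a broker would copy the
	 * guest-supplied CDB here instead. */
	ccb->csio.cdb_io.cdb_bytes[0] = INQUIRY;	/* 0x12 */
	ccb->csio.cdb_io.cdb_bytes[4] = sizeof(inq_buf);

	if (cam_send_ccb(dev, ccb) < 0)
		err(1, "cam_send_ccb");
	if ((ccb->ccb_h.status & CAM_STATUS_MASK) == CAM_REQ_CMP)
		printf("INQUIRY ok, vendor: %.8s\n", &inq_buf[8]);
	else
		warnx("INQUIRY failed, status 0x%x", ccb->ccb_h.status);

	cam_freeccb(ccb);
	cam_close_device(dev);
	return (0);
}

This is the same CCB path camcontrol(8) already uses, so the kernel-side plumbing exists today; the open question is really just whether exposing the LUNs to the host that way is acceptable.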
Using Qemu+KVM on Linux as a comparison point, there are examples of both kernel-level and user-level access by the host to guest disks. Local disk images (be they raw or qcow2) are obviously manipulated by the Qemu process from userland. RBD (Ceph/RADOS network block device) is in userland. SRP (SCSI RDMA Protocol) is in kernel. There are a few ways to do host- and/or kernel-based iSCSI. There is also a userland option if you link Qemu against libiscsi when you build it. If we do ever want userland iSCSI support, libiscsi does claim to be "pure POSIX" and to have been tested on FreeBSD, among others.

JN
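P.S. In case it helps the comparison, the userland path via libiscsi would look roughly like the sketch below. This is only my reading of libiscsi's synchronous API and is untested; the target URL, initiator name, and LUN are placeholders, and error handling is minimal:

/*
 * Hedged sketch: open an iSCSI LUN from userland with libiscsi and read
 * its capacity. Target URL and initiator name are placeholders.
 * Build with: cc -o iscsicap iscsicap.c -liscsi
 */
#include <stdio.h>
#include <stdlib.h>

#include <iscsi/iscsi.h>
#include <iscsi/scsi-lowlevel.h>

int
main(void)
{
	struct iscsi_context *iscsi;
	struct iscsi_url *url;
	struct scsi_task *task;
	struct scsi_readcapacity10 *rc10;

	iscsi = iscsi_create_context("iqn.2015-03.net.example:bhyve-initiator");
	if (iscsi == NULL)
		return (1);

	/* iscsi://portal/target-iqn/lun -- placeholder values. */
	url = iscsi_parse_full_url(iscsi,
	    "iscsi://10.0.0.1/iqn.2015-03.net.example:target0/0");
	if (url == NULL) {
		fprintf(stderr, "%s\n", iscsi_get_error(iscsi));
		return (1);
	}
	iscsi_set_targetname(iscsi, url->target);
	iscsi_set_session_type(iscsi, ISCSI_SESSION_NORMAL);
	iscsi_set_header_digest(iscsi, ISCSI_HEADER_DIGEST_NONE_CRC32C);

	/* Log in to the target and attach to the requested LUN. */
	if (iscsi_full_connect_sync(iscsi, url->portal, url->lun) != 0) {
		fprintf(stderr, "connect: %s\n", iscsi_get_error(iscsi));
		return (1);
	}

	/* READ CAPACITY(10) as a trivial stand-in for real guest I/O. */
	task = iscsi_readcapacity10_sync(iscsi, url->lun, 0, 0);
	if (task == NULL || task->status != SCSI_STATUS_GOOD) {
		fprintf(stderr, "readcapacity10 failed\n");
		return (1);
	}
	rc10 = scsi_datain_unmarshall(task);
	if (rc10 != NULL)
		printf("LUN holds %u blocks of %u bytes\n",
		    rc10->lba + 1, rc10->block_size);

	scsi_free_scsi_task(task);
	iscsi_destroy_url(url);
	iscsi_logout_sync(iscsi);
	iscsi_destroy_context(iscsi);
	return (0);
}

Whether a session like this should live in bhyve itself or stay with the kernel CAM/iSCSI initiator is exactly the tradeoff you described above.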