From: John Nielsen
To: Alexander Motin
Cc: freebsd-virtualization@freebsd.org
Subject: Re: Bhyve storage improvements
Date: Fri, 27 Mar 2015 14:37:30 -0600
Message-Id: <1F36054F-7F07-4972-870C-65018F3AE5AC@jnielsen.net>
In-Reply-To: <551596AD.8070202@FreeBSD.org>

On Mar 27, 2015, at 11:43 AM, Alexander Motin wrote:

> On 27.03.2015 18:47, John Nielsen wrote:
>> Does anyone have plans (or know about any) to implement virtio-scsi
>> support in bhyve? That API does support TRIM and should retain most
>> or all of the low-overhead virtio goodness.
>
> I was thinking about that (not really a plans yet, just some
> thoughts), but haven't found a good motivation and understanding of
> whole possible infrastructure.
>
> I am not sure it worth to emulate SCSI protocol in addition to already
> done ATA in ahci-hd and simple block in virtio-blk just to get another,
> possibly faster than AHCI, block storage with TRIM/UNMAP. Really good
> SCSI disk emulation in CTL in kernel takes about 20K lines of code. It
> is pointless to duplicate it, and may be complicated for administration
> to just interface to it. Indeed I've seen virtio-blk being faster than
> ahci-hd in some tests, but those tests were highly synthetic. I haven't
> tested it on real workloads, but I have feeling that real difference
> may be not that large. If somebody wants to check -- more benchmarks
> are highly welcome! From the theoretical side I'd like to notice that
> both ATA and SCSI protocols on guests go through additional ATA/SCSI
> infrastructure (CAM in FreeBSD), absent in case pure block virtio-blk,
> so they have some more overhead by definition.

Agreed, more testing is needed to see how big an effect having TRIM
remain dependent on AHCI emulation would have on performance.

> Main potential benefit I see from using virtio-scsi is a possibility to
> pass through to client not a block device, but some real SCSI device.
> It can be some local DVD writer, or remote iSCSI storage. The last
> would be especially interesting for large production installations.
> But the main problem I see here is booting.
> To make user-level loader boot the kernel from DVD or iSCSI, bhyve
> has to implement its own SCSI initiator, like small second copy of CAM
> in user-level. Booting kernel from some other local block storage and
> then attaching to remote iSCSI storage for data can be much easier,
> but it is not convenient. It is possible to not connect to iSCSI
> directly from user-level, but to make kernel CAM do it, and then make
> CAM provide both block layer for booting and SCSI layer for
> virtio-scsi, but I am not sure that it is very good from security
> point to make host system to see virtual disks. Though may be it could
> work if CAM could block kernel/GEOM access to them, alike it is done
> for ZVOLs now, supporting "geom" and "dev" modes. Though that
> complicates CAM and the whole infrastructure.

Yes, pass-through of disk devices opens up a number of possibilities.
Would it be feasible to just have bhyve broker between a pass(4) device
on the host and virtio_scsi(4) in the guest? That would require the
guest devices (be they local disks, iSCSI LUNs, etc.) be connected to
the host but I'm not sure that's a huge concern. The host will always
have a high level of access to the guest's data. (Plus, there's nothing
preventing a guest from doing its own iSCSI, etc. after it boots.) Using
the existing kernel infrastructure (CAM, iSCSI initiator, etc.) would
also remove the need to duplicate any of that in userland, wouldn't it?

The user-level loader is necessary for now but once UEFI support exists
in bhyve the external loader can go away. Any workarounds like you've
described above would similarly be temporary.

Using Qemu+KVM on Linux as a comparison point, there are examples of
both kernel-level and user-level access by the host to guest disks.
Local disk images (be they raw or qcow2) are obviously manipulated by
the Qemu process from userland. RBD (Ceph/RADOS network block device) is
in userland. SRP (SCSI RDMA Protocol) is in kernel.
There are a few ways to do host- and/or kernel-based iSCSI. There is
also a userland option if you link Qemu against libiscsi when you build
it. If we do ever want userland iSCSI support, libiscsi does claim to be
"pure POSIX" and to have been tested on FreeBSD, among others.

JN