Subject: RFC: hyperv disk i/o performance vs. data integrity
From: Aryeh Friedman
To: "freebsd-virtualization@freebsd.org"
Date: Sun, 2 Feb 2014 00:24:59 -0500

Disclaimer: This is more thinking out loud than a definitive set of suggestions on the matter. A cleaned-up version of this will likely become PetiteCloud's white paper on storage and disaster recovery. I make no promises about when any of it might be implemented, or whether it will be implemented in the manner described here.

Looking at the link Peter provided in another thread:

http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=%2Fliaat%2Fliaatbpkvmguestcache.htm

my first reaction was my standard "OpenStack got it wrong and PetiteCloud got it right." Reading deeper, though, I saw that every last cache mode that offered reasonable performance was also considered untrustworthy, especially in the case of a power failure. The one exception seems to be if you are going straight to a physical disk; then you can use "none" and get reasonable performance without the issues associated with abrupt disconnects like a power failure or the sudden death of the hypervisor process. So it seems we are stuck with sucky disk performance.

That is, until we make another interesting observation: TCP offers the guarantee of never being more than a few packets out of sync and of being 100% reliable if the network is functioning properly. At first it might not seem that a network would ever be faster than disk.
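To put the cache-mode trade-off above in concrete terms, this is roughly where the choice shows up on the hypervisor command line. It is only a sketch (the device paths, memory sizes, slot numbers, and guest name are made up); the QEMU form is shown because that is what the IBM page covers, with a rough bhyve equivalent (attaching the raw device through virtio-blk) next to it:

    # KVM/QEMU: the cache mode is chosen per -drive; "none" bypasses the
    # host page cache (O_DIRECT), which is the combination the IBM page
    # treats as reasonably safe when the backing store is a real block device.
    qemu-system-x86_64 -m 1024 \
        -drive file=/dev/sdb,format=raw,cache=none,if=virtio

    # bhyve: hand the raw device to the guest over virtio-blk (run after
    # bhyveload on FreeBSD 10); the guest's writes go to the physical disk
    # rather than to a file-backed image sitting in the host's cache.
    bhyve -c 2 -m 1024M -H \
        -s 0,hostbridge -s 1,lpc \
        -s 2,virtio-blk,/dev/ada1 \
        -s 3,virtio-net,tap0 \
        -l com1,stdio guest0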
We forget, though, that we are talking about virtualization and not real networks here, so there is no reason we cannot form networks between instances on the same host; and no matter how inefficient the packet drivers are, they are surely faster than any disk if we only consider transport on the host's motherboard and not between hosts. Craig Rodrigues and the FreeNAS team have already done a fantastic job (I have not personally tried FreeNAS yet, but I have heard nothing but good things about it) of making it run on a bhyve instance. Given that, the following local-machine-only architecture might make sense as a solution to the performance vs. safety problem in the hypervisors:

Host
 +------ Storage (both local and remote)
 |
 +------ FreeNAS instance (as little RAM and as few VCPUs as possible)
 |
 +------ Production instances

The FreeNAS node would distribute its storage via iSCSI or the equivalent. Setting the rule that all "primary" iSCSI sessions/devices be local (in the chassis, as opposed to on the rack or somewhere else in the data center) would eliminate the power-failure nightmare that OpenStack seems to have:

http://docs.openstack.org/admin-guide-cloud/content/ch_introduction-to-openstack-compute.html#section_nova-disaster-recovery-process

without killing performance (in many cases it would increase it). The reason it is not an issue is that by isolating all the remote disk sessions in one instance we have used the "blast wall" capability of virtualization: if FreeNAS blows up, we just swap in another FreeNAS instance with the same devices attached, and then, using normal OS (host and guest) facilities, it should be trivial to reconnect the devices to the guests (you just have to give up the idea of a session that outlives the device's power cycle) and do basic recovery; a rough sketch of that is in the P.S. below. Now that we have offloaded storage from the hypervisor, all the other aspects of backup/recovery can be done using normal OS facilities instead of the cloud platform. (Real) network storage will likely need a completely different model, though, if you allow it to be passed through to the guest OS.

-- 
Aryeh M. Friedman, Lead Developer, http://www.PetiteCloud.org
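P.S. To make the "FreeNAS as a local iSCSI target" part concrete, here is a rough sketch of the moving pieces using the stock FreeBSD 10 tools. Everything below is illustrative only: FreeNAS would configure the target side through its own UI rather than ctl.conf, and the bridge name, addresses, target name, and device paths are all made up.

  On the host, the host-only network the storage traffic rides on:

    ifconfig bridge0 create
    ifconfig tap0 create            # one tap per instance, joined to the bridge
    ifconfig bridge0 addm tap0 up
    ifconfig bridge0 inet 10.0.0.1/24

  On the storage instance, the target side (ctld's /etc/ctl.conf):

    portal-group pg0 {
            listen 10.0.0.2                  # its address on the host-only network
    }
    target iqn.2014-02.org.petitecloud:disk0 {
            portal-group pg0
            lun 0 {
                    path /dev/ada1           # the real disk handed to this instance
            }
    }

  On the host (or a production instance), the initiator:

    iscsictl -A -p 10.0.0.2 -t iqn.2014-02.org.petitecloud:disk0
    # the LUN shows up as /dev/daN and can be attached to a bhyve guest
    # or mounted directly

  If the storage instance blows up: boot a replacement with the same devices attached, then drop the dead session(s) and re-add them:

    iscsictl -Ra
    iscsictl -A -p 10.0.0.2 -t iqn.2014-02.org.petitecloud:disk0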