Subject: RFC: hyperv disk i/o performance vs. data integrity
From: Aryeh Friedman
To: "freebsd-virtualization@freebsd.org"
Date: Sun, 2 Feb 2014 00:24:59 -0500

Disclaimer: This is more thinking out loud than a definitive set of suggestions on the matter. A cleaned-up version of this will likely become PetiteCloud's white paper on storage and disaster recovery. I make no promises about when any of it might be implemented, or whether it will be implemented in the manner described here.

Looking at the link Peter provided in another thread:

http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=%2Fliaat%2Fliaatbpkvmguestcache.htm

my first reaction was my standard "OpenStack got it wrong and PetiteCloud got it right." Reading deeper, though, I saw that every last cache mode that offered reasonable performance was also considered untrustworthy, especially in the case of a power failure. The one exception seems to be if you are going straight to a physical disk; then you can use "none" and get reasonable performance without the issues associated with abrupt disconnects like a power failure or the sudden death of the hypervisor process. So it seems we are stuck with sucky disk performance.

That is, until we make another interesting observation: TCP offers the guarantee of never being more than a few packets out of sync and of being 100% reliable if the network is functioning properly. At first it might not seem that a network would ever be faster than disk.
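To put the cache-mode trade-off above in concrete terms, this is roughly where the choice shows up on the hypervisor command line. It is only a sketch (the device paths, memory sizes, slot numbers, and guest name are made up); the QEMU form is shown because that is what the IBM page covers, with a rough bhyve equivalent (attaching the raw device through virtio-blk) next to it:

    # KVM/QEMU: the cache mode is chosen per -drive; "none" bypasses the
    # host page cache (O_DIRECT), which is the combination the IBM page
    # treats as reasonably safe when the backing store is a real block device.
    qemu-system-x86_64 -m 1024 \
        -drive file=/dev/sdb,format=raw,cache=none,if=virtio

    # bhyve: hand the raw device to the guest over virtio-blk (run after
    # bhyveload on FreeBSD 10); the guest's writes go to the physical disk
    # rather than to a file-backed image sitting in the host's cache.
    bhyve -c 2 -m 1024M -H \
        -s 0,hostbridge -s 1,lpc \
        -s 2,virtio-blk,/dev/ada1 \
        -s 3,virtio-net,tap0 \
        -l com1,stdio guest0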
We forget, though, that we are talking about virtualization and not real networks here, so there is no reason we cannot form networks between instances on the same host; and no matter how inefficient the packet drivers are, they are surely faster than any disk if we only consider transport on the host's motherboard and not between hosts. Craig Rodrigues and the FreeNAS team have already done a fantastic job (I have not personally tried FreeNAS yet, but I have heard nothing but good things about it) of making it run on a bhyve instance. Given that, the following local-machine-only architecture might make sense as a solution to the performance vs. safety problem in the hypervisors:

Host
 +------ Storage (both local and remote)
 |
 +------ FreeNAS instance (as little RAM and as few VCPUs as possible)
 |
 +------ Production instances

The FreeNAS node would distribute its storage via iSCSI or the equivalent. Setting the rule that all "primary" iSCSI sessions/devices be local (in the chassis, as opposed to on the rack or somewhere else in the data center) would eliminate the power-failure nightmare that OpenStack seems to have:

http://docs.openstack.org/admin-guide-cloud/content/ch_introduction-to-openstack-compute.html#section_nova-disaster-recovery-process

without killing performance (in many cases it would increase it). The reason it is not an issue is that by isolating all the remote disk sessions in one instance we have used the "blast wall" capability of virtualization: if FreeNAS blows up, we just swap in another FreeNAS instance with the same devices attached, and then, using normal OS (host and guest) facilities, it should be trivial to reconnect the devices to the guests (you just have to give up the idea of a session that outlives the device's power cycle) and do basic recovery; a rough sketch of that is in the P.S. below. Now that we have offloaded storage from the hypervisor, all the other aspects of backup/recovery can be done using normal OS facilities instead of the cloud platform. (Real) network storage will likely need a completely different model, though, if you allow it to be passed through to the guest OS.

-- 
Aryeh M. Friedman, Lead Developer, http://www.PetiteCloud.org
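P.S. To make the "FreeNAS as a local iSCSI target" part concrete, here is a rough sketch of the moving pieces using the stock FreeBSD 10 tools. Everything below is illustrative only: FreeNAS would configure the target side through its own UI rather than ctl.conf, and the bridge name, addresses, target name, and device paths are all made up.

  On the host, the host-only network the storage traffic rides on:

    ifconfig bridge0 create
    ifconfig tap0 create            # one tap per instance, joined to the bridge
    ifconfig bridge0 addm tap0 up
    ifconfig bridge0 inet 10.0.0.1/24

  On the storage instance, the target side (ctld's /etc/ctl.conf):

    portal-group pg0 {
            listen 10.0.0.2                  # its address on the host-only network
    }
    target iqn.2014-02.org.petitecloud:disk0 {
            portal-group pg0
            lun 0 {
                    path /dev/ada1           # the real disk handed to this instance
            }
    }

  On the host (or a production instance), the initiator:

    iscsictl -A -p 10.0.0.2 -t iqn.2014-02.org.petitecloud:disk0
    # the LUN shows up as /dev/daN and can be attached to a bhyve guest
    # or mounted directly

  If the storage instance blows up: boot a replacement with the same devices attached, then drop the dead session(s) and re-add them:

    iscsictl -Ra
    iscsictl -A -p 10.0.0.2 -t iqn.2014-02.org.petitecloud:disk0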