Date: Wed, 20 Sep 2017 12:44:18 +0100
From: Roger Pau Monné <roger.pau@citrix.com>
To: Karl Pielorz <kpielorz_lst@tdx.co.uk>
CC: <freebsd-xen@freebsd.org>
Subject: Re: Storage 'failover' largely kills FreeBSD 10.x under XenServer?
Message-ID: <20170920114418.pq6fhnexol2mvkxv@dhcp-3-128.uk.xensource.com>
References: <62BC29D8E1F6EA5C09759861@[10.12.30.106]>
In-Reply-To: <62BC29D8E1F6EA5C09759861@[10.12.30.106]>

On Wed, Sep 20, 2017 at 11:35:26AM +0100, Karl Pielorz wrote:
> 
> Hi All,
> 
> We recently experienced an "unplanned storage" fail over on our XenServer
> pool. The pool is 7.1 based (on certified HP kit), and runs a mix of FreeBSD
> (all 10.3 based except for a legacy 9.x VM) and a few Windows VMs.
> Storage is provided by two Citrix certified Synology storage boxes.
> 
> During the fail over, Xen sees the storage paths go down and come up
> again (re-attaching when they are available again). Timing this, it takes
> around a minute, worst case.
> 
> The process killed 99% of our FreeBSD VMs :(
> 
> The earlier 9.x FreeBSD box survived, and all the Windows VMs survived.
> 
> Is there some 'tunable' we can set to make the 10.3 boxes more tolerant of
> the I/O delays that occur during a storage fail over?

Do you know whether the VMs saw the disks disconnecting and then
connecting again?

> I've enclosed some of the errors we observed below. I realise a full storage
> fail over is a 'stressful time' for VMs, but the Windows VMs and the earlier
> FreeBSD version survived without issue. All the 10.3 boxes logged I/O
> errors, and then panicked and rebooted.
> 
> We've set up a test lab with the same kit and can now replicate this at
> will (every time, most or all of the FreeBSD 10.x boxes panic and reboot, but
> the Windows VMs survive), so we can test any potential fixes.
> 
> So if anyone can suggest anything we can tweak to minimize the chances of
> this happening (i.e. make I/O more tolerant of delays, or set larger
> timeouts?), that'd be great.

Hm, I have the feeling that part of the problem is that in-flight
requests are basically lost when a disconnect/reconnect happens.

Thanks, Roger.