Date: Wed, 20 Sep 2017 12:44:18 +0100
From: Roger Pau Monné
To: Karl Pielorz
Subject: Re: Storage 'failover' largely kills FreeBSD 10.x under XenServer?
Message-ID: <20170920114418.pq6fhnexol2mvkxv@dhcp-3-128.uk.xensource.com>
List-Id: Discussion of the freebsd port to xen - implementation and usage

On Wed, Sep 20, 2017 at 11:35:26AM +0100, Karl Pielorz wrote:
>
> Hi All,
>
> We recently experienced an "unplanned storage" fail over on our XenServer
> pool.
> The pool is 7.1 based (on certified HP kit), and runs a mix of FreeBSD
> (all 10.3 based except for a legacy 9.x VM) - and a few Windows VM's -
> storage is provided by two Citrix certified Synology storage boxes.
>
> During the fail over - Xen see's the storage paths go down, and come up
> again (re-attaching when they are available again). Timing this - it takes
> around a minute, worst case.
>
> The process killed 99% of our FreeBSD VM's :(
>
> The earlier 9.x FreeBSD box survived, and all the Windows VM's survived.
>
> Is there some 'tuneable' we can set to make the 10.3 boxes more tolerant of
> the I/O delays that occur during a storage fail over?

Do you know whether the VMs saw the disks disconnecting and then
connecting again?

> I've enclosed some of the error we observed below. I realise a full storage
> fail over is a 'stressful time' for VM's - but the Windows VM's, and earlier
> FreeBSD version survived without issue. All the 10.3 boxes logged I/O
> errors, and then panic'd / rebooted.
>
> We've setup a test lab with the same kit - and can now replicate this at
> will (every time most to all the FreeBSD 10.x boxes panic and reboot, but
> Windows prevails) - so we can test any potential fixes.
>
> So if anyone can suggest anything we can tweak to minimize the chances of
> this happening (i.e. make I/O more timeout tolerant, or set larger
> timeouts?) that'd be great.

Hm, I have the feeling that part of the problem is that in-flight
requests are basically lost when a disconnect/reconnect happens.

Thanks, Roger.