Date: Wed, 20 Sep 2017 12:44:18 +0100
From: Roger Pau Monné <roger.pau@citrix.com>
To: Karl Pielorz <kpielorz_lst@tdx.co.uk>
Cc: <freebsd-xen@freebsd.org>
Subject: Re: Storage 'failover' largely kills FreeBSD 10.x under XenServer?
Message-ID: <20170920114418.pq6fhnexol2mvkxv@dhcp-3-128.uk.xensource.com>
In-Reply-To: <62BC29D8E1F6EA5C09759861@[10.12.30.106]>
References: <62BC29D8E1F6EA5C09759861@[10.12.30.106]>

On Wed, Sep 20, 2017 at 11:35:26AM +0100, Karl Pielorz wrote:
>
> Hi All,
>
> We recently experienced an "unplanned storage" fail over on our XenServer
> pool. The pool is 7.1 based (on certified HP kit), and runs a mix of FreeBSD
> (all 10.3 based except for a legacy 9.x VM) - and a few Windows VMs -
> storage is provided by two Citrix certified Synology storage boxes.
>
> During the fail over - Xen sees the storage paths go down, and come up
> again (re-attaching when they are available again). Timing this - it takes
> around a minute, worst case.
>
> The process killed 99% of our FreeBSD VMs :(
>
> The earlier 9.x FreeBSD box survived, and all the Windows VMs survived.
>
> Is there some 'tuneable' we can set to make the 10.3 boxes more tolerant of
> the I/O delays that occur during a storage fail over?

Do you know whether the VMs saw the disks disconnecting and then
connecting again?

> I've enclosed some of the errors we observed below. I realise a full storage
> fail over is a 'stressful time' for VMs - but the Windows VMs, and earlier
> FreeBSD version survived without issue. All the 10.3 boxes logged I/O
> errors, and then panic'd / rebooted.
>
> We've set up a test lab with the same kit - and can now replicate this at
> will (every time most to all the FreeBSD 10.x boxes panic and reboot, but
> Windows prevails) - so we can test any potential fixes.
>
> So if anyone can suggest anything we can tweak to minimize the chances of
> this happening (i.e. make I/O more timeout tolerant, or set larger
> timeouts?) that'd be great.

Hm, I have the feeling that part of the problem is that in-flight
requests are basically lost when a disconnect/reconnect happens.

Thanks, Roger.