Date: Fri, 21 Feb 2014 10:15:15 -0700
From: John Nielsen <lists@jnielsen.net>
To: Bryan Venteicher <bryanv@freebsd.org>
Cc: "freebsd-stable@freebsd.org Stable" <freebsd-stable@freebsd.org>
Subject: Re: recovering from or increasing timeouts on virtio block device
Message-ID: <FB4CC1CC-FF06-4354-87D4-72DB79CB7D3C@jnielsen.net>
In-Reply-To: <CAGaYwLf%2BEhtUjLGfz6GynCGe3SwFijETLaqDxNjYA5rpN-HOHQ@mail.gmail.com>
References: <920CC320-1A95-46E2-BB18-B6987805885E@jnielsen.net> <18D133C0-E71B-4E66-A13F-6DC3B1BF620C@FreeBSD.org> <6F4E2014-5489-4055-962C-4DFC6184A18E@jnielsen.net> <CAGaYwLf%2BEhtUjLGfz6GynCGe3SwFijETLaqDxNjYA5rpN-HOHQ@mail.gmail.com>

On Feb 18, 2014, at 10:14 AM, Bryan Venteicher <bryanv@freebsd.org> wrote:

> On Tue, Feb 18, 2014 at 10:57 AM, John Nielsen <lists@jnielsen.net> wrote:
>> On Feb 18, 2014, at 3:32 AM, Edward Tomasz Napierała <trasz@freebsd.org> wrote:
>>
>> > Message written by John Nielsen on 17 Feb 2014, at 21:21:
>> >> I run several FreeBSD virtual machines in a Linux KVM environment with a SAN. The VMs use virtio block storage, and the KVM hosts map the virtual volumes to targets on the SAN. Occasionally, failover or other maintenance events on the SAN cause it to be unavailable for 30+ seconds. When this happens, the FreeBSD VMs have hard failures on the vtbd* devices, and thereafter any attempted reads or writes return immediately with an error (even after the SAN is responsive again). The only way to recover a VM once that happens is to hard boot it.
>> >>
>> >> Is there any way to adjust the timeouts or enable some kind of retry for the virtio block devices? It would be nice to be able to recover gracefully after a SAN event without needing to reboot the VMs.
>> >
>> > Use gmountver(8) perhaps?
>>
>> Thanks for the tip (and for writing it :), I haven't encountered that one before. I will experiment with it but I'm not sure it's a fit for this particular scenario (at least not by itself). When a SAN event happens the virtual machine's vtbd0 device doesn't disappear, the underlying hardware just fails to respond for a long-ish time. I suspect that the driver gives up after either a certain length of time or number of errors, but my C driver-fu isn't up to figuring it out exactly. Once it gives up, any I/O requests to the (still "present") device fail immediately, and I can't see a way to get the driver to actually try any (new or old) I/O again.
>
> The vtbd driver has no internal retry mechanism, and pays no attention to errors other than to report them, and never gives up :)
>
> It is not clear to me whether IO is getting turned around in FreeBSD before it reaches the driver, or within the host. Do you continue to see "hard error ..." messages on the console?

Thanks for chiming in. I was in too much of a hurry to get the VM running again last time the issue appeared to capture any useful log messages, and of course none of them were committed to disk so nothing was available following a reboot.

I will see what I can get next time it happens and follow up on this thread again.

JN
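
For the next occurrence, one way to record whether reads fail immediately or only after a delay is a small probe that times a single read and reports the errno. This is a minimal sketch, not something from the thread: it assumes Python is available in the VM and that /dev/vtbd0 is the affected device, and probe_read is a hypothetical helper name, not part of any FreeBSD tool. It prints to stdout so the result can be sent over the network or copied from the console, since the local disk may not be writable at that point.

```python
import errno
import os
import sys
import time


def probe_read(path, size=4096, offset=0):
    """Attempt one read and return (ok, errno_name, elapsed_seconds)."""
    start = time.monotonic()
    try:
        fd = os.open(path, os.O_RDONLY)
        try:
            os.pread(fd, size, offset)
        finally:
            os.close(fd)
    except OSError as exc:
        # Map the numeric errno to its symbolic name (e.g. EIO, ENXIO).
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, name, time.monotonic() - start
    return True, None, time.monotonic() - start


if __name__ == "__main__":
    # The device path is an assumption; pass another path as the first argument.
    device = sys.argv[1] if len(sys.argv) > 1 else "/dev/vtbd0"
    ok, err, elapsed = probe_read(device)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(f"{stamp} {device} ok={ok} errno={err} elapsed={elapsed:.3f}s")
```

A failure with a near-zero elapsed time would match the "return immediately with an error" behavior described above, while a long elapsed time would suggest the request is still waiting on the host. It does not by itself show where the request was turned around.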