Date: Tue, 28 Jun 2022 13:38:59 +0200
From: Roger Pau Monné <roger.pau@citrix.com>
To: Brian Buhrow <buhrow@nfbcal.org>
Cc: freebsd-xen@freebsd.org
Subject: Re: Some kind of race condition in adding and removing domu's causes vm zombies
Message-ID: <YrroU0uUKtlA/Vww@Air-de-Roger>
In-Reply-To: <202206240130.25O1Uul3002911@nfbcal.org>
References: <202206240130.25O1Uul3002911@nfbcal.org>

On Thu, Jun 23, 2022 at 06:30:56PM -0700, Brian Buhrow wrote:
> Hello. I don't have a lot more details on the issue, but under xen-4.15 and
> xen-4.16 with FreeBSD-12 and FreeBSD-13, it's pretty easy to end up with
> zombie domu's that are unkillable and unrestartable. Even worse, the block
> devices associated with these not-quite-gone domu's are unusable with other
> domu's without an entire system reboot.
>
> How to reproduce:
>
> 1. Shut down a vm that's currently running. I'm using NetBSD, but FreeBSD
>    domu's will demonstrate this behavior as well.
>
> 2. If auto-restart is set in the domu's conf file, the domu will restart
>    with a new domain id.
>
> 3. Just as the newly restarted domu is coming up, issue:
>    xl destroy <domid-of-newly-started-domain>
>
> You may see output like the following:
>
> root# xl destroy 20
> libxl: error: libxl_device.c:1111:device_backend_callback: Domain 20:unable to remove device with path /local/domain/0/backend/vbd/20/768
> libxl: error: libxl_device.c:1111:device_backend_callback: Domain 20:unable to remove device with path /local/domain/0/backend/vif/20/0
> libxl: error: libxl_domain.c:1530:devices_destroy_cb: Domain 20:libxl__devices_destroy failed
>
> Now, issue:
> #xl list
> (null) 20 0 1 --p--d 2083.7
>
> The workaround I've found for this issue is to shut down the domu with the
> -h flag, causing the system to wait for a final keypress on the console
> before rebooting. Then, while it's waiting, issue the xl destroy command on
> the old, waiting, domain ID.
>
> This workaround will prevent the issue, but it's my view that I shouldn't
> be able to wedge the destruction process in this way such that the entire
> machine needs to be restarted. Being able to do this makes the system
> rather fragile.

Hm, I don't seem to be able to reproduce this on HEAD. Could you try a
HEAD kernel and see whether you can still reproduce it? (Keeping the same
userland should be fine.)

Thanks, Roger.