Date: Fri, 5 Aug 2016 12:45:53 +0930 From: Shane Ambler <FreeBSD@ShaneWare.Biz> To: freebsd-virtualization@freebsd.org Subject: bhyve in endless loop after filling host disk quota Message-ID: <57A404E9.4030504@ShaneWare.Biz>
Hey guys,

I found a situation where a bhyve guest that is using a zvol as a single-disk zpool can enter an endless loop of errors when the host's parent zfs dataset reaches its quota while the zvol itself still has some free space. The solution was to increase the parent quota, but while the bhyve console outputs a continuous stream of errors the guest otherwise appears unresponsive and locked up. I started the bhyve using bhyve-rc, so I didn't have a visible console when it became unresponsive.

The host is running stable/10 r299401 on a Core i5 with 8GB RAM and a 3-disk raidz zpool. The bhyve was booted from an 11-BETA2 DVD, then installed onto a single-disk zpool backed by a geom zvol from the host. Once running, I installed svn, checked out current (r303678), then built and installed current with a GENERIC kernel that had the debug options removed. /usr/ports and distfiles are read from the host via NFS.

After setting up poudriere and building some ports, it seems a lot more disk space was used than expected, which led to a situation where the bhyve system was spewing out a continuous stream of error messages and was 99% unresponsive. I say 99% unresponsive because nginx inside the bhyve was still able to serve the poudriere info and log files. The existing ssh session that started poudriere was unresponsive, and I was unable to start a new ssh session. I could tmux into the bhyve console and see a stream of errors; I could see a login prompt between errors and entered a login and password, but the login didn't complete until the bhyve became responsive again. The last port build completed about 5 hours ago, so this would appear to have become an endless loop.

The situation appears to have arisen from the fact that the zvol was given a size of 100G while its parent zfs dataset has a quota of 300G. With copies=2 and some other 20 to 30G disk images, the 300G quota has been met, while inside the bhyve the guest should be seeing a 100G disk with some free space.
After adjusting the quota and regaining access, inside the bhyve the zpool reports ALLOC=32.8G FREE=62.7G, so extra disk space is being consumed on the host without being obvious from inside the guest.

Some relevant host zvol info -

  copies              2
  used                177G
  volsize             100G
  usedbysnapshots     35.6G
  usedbydataset       141G
  written             28.3G
  logicalused         86.6G
  logicalreferenced   68.0G
  volmode             geom

A sample of the errors (with the last two numbers changing) -

  vtbd0: hard error cmd=write 116882384-116882639
  vtbd0: hard error cmd=write 116884960-116885215
  vtbd0: hard error cmd=write 116885216-116885471
  vtbd0: hard error cmd=write 116885472-116885727
  vtbd0: hard error cmd=write 116885728-116885983
  vtbd0: hard error cmd=write 60261136-60261151
  vtbd0: hard error cmd=write 60261152-60261167
  vtbd0: hard error cmd=write 60261168-60261183
  vtbd0: hard error cmd=write 116882640-116882719
  vtbd0: hard error cmd=write 116882720-116882735

-- 
FreeBSD - the place to B...Software Developing

Shane Ambler
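A rough back-of-envelope check (a hypothetical Python sketch, using only the property values reported above) suggests the copies=2 setting largely accounts for the doubled physical usage that eats into the parent quota:

```python
# Host-side zvol property values from the report above (in GiB).
logicalused = 86.6   # logical data the guest has written
copies = 2           # each block is stored twice on the host
used = 177.0         # physical space charged against the parent quota

# With copies=2, every logical block is written twice, so physical
# usage is roughly copies * logicalused (plus metadata overhead).
expected = copies * logicalused

# The reported 'used' of 177G is close to this estimate: the guest's
# ~87G of logical data charges ~177G against the parent's 300G quota.
# Add the other 20-30G disk images and the quota is met, even though
# the guest still sees free space on its 100G disk.
print(f"expected ~= {expected:.1f}G, reported used = {used}G")
```

So the guest's view of a half-empty 100G disk and the host's view of an exhausted 300G quota are both consistent once the copies=2 multiplier is taken into account.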