Date:      Fri, 5 Aug 2016 12:45:53 +0930
From:      Shane Ambler <FreeBSD@ShaneWare.Biz>
To:        freebsd-virtualization@freebsd.org
Subject:   bhyve in endless loop after filling host disk quota
Message-ID:  <57A404E9.4030504@ShaneWare.Biz>

Hey guys,

I found a situation where a bhyve guest that uses a zvol as a single-disk
zpool can get into an endless loop of write errors when the host's parent
zfs dataset reaches its quota while the zvol itself still has free space.
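
For anyone wanting to reproduce it, I'd expect something along these
lines to do it (tank/vm and guest0 are made-up names for the example):

# hypothetical layout: tank/vm is the parent dataset, tank/vm/guest0 the zvol
zfs create -o quota=300G -o copies=2 tank/vm
zfs create -V 100G tank/vm/guest0
# boot a guest with the zvol as its only disk, then write inside the guest
# until the parent dataset's quota is met - the guest still sees free space
# on its 100G disk, but every write comes back as a hard error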

The fix was to increase the parent quota, but while the guest console is
outputting a continuous stream of errors the machine otherwise appears
unresponsive and locked up. I had started the bhyve with bhyve-rc, so I
didn't have a visible console when it became unresponsive.
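
For the record, regaining access was just a matter of raising the quota
on the parent dataset, e.g. (names and new value illustrative):

zfs set quota=400G tank/vm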


The host is running stable/10 r299401 on a Core i5 with 8GB RAM and a
three-disk raidz zpool.

The bhyve guest was booted from an 11-BETA2 DVD and installed onto a
single-disk zpool backed by a geom zvol from the host. Once running, I
installed svn, checked out current (r303678), then built and installed
current with a GENERIC kernel that had the debug options removed.
/usr/ports and distfiles are read from the host over NFS.
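
For reference, the guest is wired up roughly like this - bhyve-rc does
the actual invocation, so the slot numbers and names below are
illustrative rather than copied from my config:

# bhyveload loads the guest kernel from the zvol first
bhyveload -m 2G -d /dev/zvol/tank/vm/guest0 guest0
bhyve -c 2 -m 2G -A -H -P \
    -s 0,hostbridge \
    -s 1,lpc \
    -s 3,virtio-blk,/dev/zvol/tank/vm/guest0 \
    -s 4,virtio-net,tap0 \
    -l com1,stdio \
    guest0

The virtio-blk disk is what the guest sees as vtbd0, which is where the
errors below come from.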

After setting up poudriere and building some ports, it seems a lot more
disk space has been used than expected, which has led to a situation
where the bhyve system is spewing a continuous stream of error messages
and is 99% unresponsive.

I say 99% unresponsive because nginx inside the guest was still able to
serve the poudriere info and log files. The existing ssh session that
started poudriere was unresponsive, and I was unable to start a new ssh
session. I could tmux into the bhyve console and see the stream of
errors; a login prompt was visible between errors and I entered a login
and password, but the login didn't complete until the bhyve was
responsive again.

The last port build completed about five hours ago, so it appears to
have been stuck in this loop ever since.

The situation appears to have arisen because the zvol was given a size
of 100G while its parent zfs dataset has a quota of 300G. With copies=2
set and some other 20-30G disk images in the same dataset, the 300G
quota has been met, while inside the bhyve the guest still sees a 100G
disk with free space.
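
Rough back-of-envelope numbers (mine, approximate):

# copies=2 roughly doubles the physical space charged for every write, so:
#   100G zvol written close to full     -> up to ~200G against the quota
#   other 20-30G images, also doubled   -> another 40G+ on top
#   snapshots keeping old blocks alive  -> more again
# which comfortably meets the 300G quota while the guest's own zpool
# still reports free space on its 100G disk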

After adjusting the quota and regaining access, the zpool inside the
bhyve reports ALLOC=32.8G FREE=62.7G, so extra space is being consumed
on the host without it being obvious from inside the guest.
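
For anyone hitting the same thing, the per-dataset space breakdown shows
where it all went (substitute your own pool name):

zfs list -o space -r tank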

Some relevant host zvol info -
copies            2
used              177G
volsize           100G
usedbysnapshots   35.6G
usedbydataset     141G
written           28.3G
logicalused       86.6G
logicalreferenced 68.0G
volmode           geom
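
(collected with something like the following - the zvol path is the
made-up one from above)

zfs get copies,used,volsize,usedbysnapshots,usedbydataset,written,logicalused,logicalreferenced,volmode tank/vm/guest0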

A sample of the errors - (with the last two numbers changing)

vtbd0: hard error cmd=write 116882384-116882639
vtbd0: hard error cmd=write 116884960-116885215
vtbd0: hard error cmd=write 116885216-116885471
vtbd0: hard error cmd=write 116885472-116885727
vtbd0: hard error cmd=write 116885728-116885983
vtbd0: hard error cmd=write 60261136-60261151
vtbd0: hard error cmd=write 60261152-60261167
vtbd0: hard error cmd=write 60261168-60261183
vtbd0: hard error cmd=write 116882640-116882719
vtbd0: hard error cmd=write 116882720-116882735

-- 
FreeBSD - the place to B...Software Developing

Shane Ambler



