Date:      Thu, 7 Jun 2012 03:05:48 -0400
From:      Steve Tuts <yiz5hwi@gmail.com>
To:        Kevin Oberman <kob6558@gmail.com>
Cc:        freebsd-emulation@freebsd.org
Subject:   Re: one virtualbox vm disrupts all vms and entire network
Message-ID:  <CAEXKtDqnjEP75m-fwyWrG-TTj-t0jCKb-==Z1pLX5iOwiuJyXg@mail.gmail.com>
In-Reply-To: <CAN6yY1sVaqw3NPAraW452Rg7807q8zi1O=g2D_QZ93q4UE_Gqw@mail.gmail.com>
References:  <CAEXKtDreCQ0O4NAi5opGm_KnR4As=dDvc-zP5Z0z5g84GQQuyg@mail.gmail.com> <assp.050217e07a.6a54445e6fa4183cff3692d9deed5635@ringofsaturn.com> <CAEXKtDrG%2Byj%2B4vOhhKrQcC6h9mEeFOtzHtJaV-UgPMrdn3xisQ@mail.gmail.com> <e1037b202b887d93142a4e693784f874@bluelife.at> <b8f2877663fb73d56222180f8b74cc81@bluelife.at> <CAEXKtDqR0F7btne62C%2Bw90qnWfP3kkU-E851b8WOf26yKeGBhg@mail.gmail.com> <CAN6yY1sVaqw3NPAraW452Rg7807q8zi1O=g2D_QZ93q4UE_Gqw@mail.gmail.com>

On Wed, Jun 6, 2012 at 7:07 PM, Kevin Oberman <kob6558@gmail.com> wrote:

> On Wed, Jun 6, 2012 at 3:46 PM, Steve Tuts <yiz5hwi@gmail.com> wrote:
> > On Wed, Jun 6, 2012 at 3:50 AM, Bernhard Froehlich <decke@freebsd.org> wrote:
> >
> >> On 05.06.2012 20:16, Bernhard Froehlich wrote:
> >>
> >>> On 05.06.2012 19:05, Steve Tuts wrote:
> >>>
> >>>> On Mon, Jun 4, 2012 at 4:11 PM, Rusty Nejdl <rnejdl@ringofsaturn.com> wrote:
> >>>>
> >>>>> On 2012-06-02 12:16, Steve Tuts wrote:
> >>>>>
> >>>>>> Hi, we have a Dell PowerEdge server with a dozen interfaces. It hosts
> >>>>>> a few guests (web app and email servers) with VirtualBox-4.0.14. The
> >>>>>> host and all guests are FreeBSD 9.0 64-bit. Each guest is bridged to
> >>>>>> a distinct interface. The host and all guests are on the 10.0.0.0
> >>>>>> network, NAT'ed to a Cisco router.
> >>>>>>
> >>>>>> This ran well for a couple of months, until we recently added a new
> >>>>>> guest. Every few hours, none of the guests can be reached. We can
> >>>>>> only connect to the host from outside the router. We can also get to
> >>>>>> the console of the guests (except the new guest), but from there we
> >>>>>> can't ping the gateway 10.0.0.1 any more. The new guest just freezes.
> >>>>>>
> >>>>>> Furthermore, on the host we can see a VBoxHeadless process for each
> >>>>>> guest, including the new guest, but we cannot kill it, not even with
> >>>>>> "kill -9". Someone on the web suggested using "kill -SIGCONT" first,
> >>>>>> since the "ps" output shows the "T" (stopped) flag for the
> >>>>>> VBoxHeadless process of the new guest, but that doesn't help. We also
> >>>>>> tried all the VBoxManage commands to power off, reset, etc. the new
> >>>>>> guest, but they all failed, complaining that the VM is in the Aborted
> >>>>>> state. We also tried VBoxManage commands to disconnect the network
> >>>>>> cable of the new guest; it didn't complain, but there was no effect.
> >>>>>>
> >>>>>> A couple of times, we disabled the host interface bridging the new
> >>>>>> guest; the VBoxHeadless process for the new guest then disappeared
> >>>>>> (we had attempted to kill it before that), and all other VMs
> >>>>>> immediately regained their connections and went back to normal.
> >>>>>>
> >>>>>> But one time even that didn't help: the VBoxHeadless process for the
> >>>>>> new guest stubbornly remained, and we had to reboot the host.
> >>>>>>
> >>>>>> This is already a production server, so we can't upgrade VirtualBox
> >>>>>> to the latest version until we obtain a test server.
> >>>>>>
> >>>>>> Could you advise:
> >>>>>>
> >>>>>> 1. Is there any other way to kill the new guest instead of rebooting?
> >>>>>> 2. What might cause the problem?
> >>>>>> 3. What settings and tests can I use to analyze this problem?
> >>>>>>
> >>>>> I haven't seen any comments on this and don't want you to think you
> >>>>> are being ignored. I haven't seen this problem myself, but the 4.0
> >>>>> branch was buggier for me than the 4.1 releases, so upgrading is
> >>>>> probably what you are looking at.
> >>>>>
> >>>>> Rusty Nejdl
> >>>>>
> >>>> Sorry, I just realized my reply yesterday didn't go to the list, so I am
> >>>> re-sending it with some updates.
> >>>>
> >>>> Yes, we upgraded all ports and fortunately everything is back to
> >>>> normal; in particular, all VMs have run peacefully for two days now. So
> >>>> upgrading to the latest VirtualBox 4.1.16 solved that problem.
> >>>>
> >>>> But now we have a new problem with this new version of VirtualBox:
> >>>> whenever we try to VNC to any VM, that VM goes to the Aborted state
> >>>> immediately. In fact, merely telnetting from within the host to the VNC
> >>>> port of that VM immediately aborts it. This prevents us from adding new
> >>>> VMs. Also, when starting a VM with a VNC port, we get this message:
> >>>>
> >>>> rfbListenOnTCP6Port: error in bind IPv6 socket: Address already in use
> >>>>
> >>>> Someone else has provided a patch for this at
> >>>> http://permalink.gmane.org/gmane.os.freebsd.devel.emulation/10237
> >>>>
> >>>> So it looks like this problem occurs when there are multiple VMs on an
> >>>> IPv6 system (we have 64-bit FreeBSD 9.0).
> >>>>
> >>>
> >>> Glad to hear that 4.1.16 helps with the networking problem. The VNC
> >>> problem is also a known one, but the mentioned patch does not work, at
> >>> least for a few people. It seems the bug is somewhere in libvncserver, so
> >>> downgrading net/libvncserver to an earlier version (and rebuilding
> >>> virtualbox) should help until we come up with a proper fix.
> >>>
> >>
> >> You are right about the "Address already in use" problem and the patch
> >> for it, so I will commit the fix in a few moments.
> >>
> >> I have also tried to reproduce the VNC crash but couldn't, probably
> >> because my system is IPv6 enabled. flo@ has seen the same crash and has
> >> no IPv6 in his kernel, which led him to find this commit in libvncserver:
> >>
> >>
> >> commit 66282f58000c8863e104666c30cb67b1d5cbdee3
> >> Author: Kyle J. McKay <mackyle@gmail.com>
> >> Date:   Fri May 18 00:30:11 2012 -0700
> >>     libvncserver/sockets.c: do not segfault when listenSock/listen6Sock == -1
> >>
> >> http://libvncserver.git.sourceforge.net/git/gitweb.cgi?p=libvncserver/libvncserver;a=commit;h=66282f5
> >>
> >>
> >> It looks promising, so please test this patch if you can reproduce the
> >> crash.
> >>
> >>
> >> --
> >> Bernhard Froehlich
> >> http://www.bluelife.at/
> >>
> >
> > Sorry, I tried to apply this patch but couldn't figure out how to do that.
> > I use ports to compile everything, and I can see the file is at
> > /usr/ports/net/libvncserver/work/LibVNCServer-0.9.9/libvncserver/sockets.c.
> > However, if I edit this file and run "make clean", the patch is wiped out
> > before I can run "make". How do I apply this patch with the ports?
>
> To apply patches to ports:
> # make clean
> # make patch
> <Apply patch>
> # make
> # make deinstall
> # make reinstall
>
> Note that the final two steps assume a version of the port is already
> installed. If not, use 'make install'.
> If you use portmaster, run 'portmaster -C net/libvncserver' after applying
> the patch.
> --
> R. Kevin Oberman, Network Engineer
> E-mail: kob6558@gmail.com
>
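The original report mentions that ps showed the "T" (stopped) flag for the stuck VBoxHeadless process. A minimal sketch for picking out such processes is below. The function name stopped_vboxheadless is hypothetical, and it assumes input with the pid, stat and command columns in that order, as requested with `ps -axo pid,stat,command` on FreeBSD. It only identifies candidates; it does not make an unkillable process killable.

```sh
#!/bin/sh
# Sketch (hypothetical helper): read "pid stat command ..." lines on stdin
# and print the PID of every VBoxHeadless process whose state contains "T"
# (stopped). Lines whose first field is not numeric (e.g. the ps header)
# are skipped.
stopped_vboxheadless() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /VBoxHeadless/ && $2 ~ /T/ { print $1 }'
}

# Usage on the FreeBSD host:
#   ps -axo pid,stat,command | stopped_vboxheadless
```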

Thanks. Last night I realized we are just using make :-), so I did:

cd /usr/ports/net/libvncserver/work/LibVNCServer-0.9.9
make
cd /usr/ports/net/libvncserver
pkg_delete virtualbox-ose-4.1.16_1
make reinstall
cd /usr/ports/emulation/virtualbox-ose
make reinstall
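Editing the extracted source under work/ and running make there works, but, as noted earlier in the thread, the edit is lost at the next "make clean". A sketch of an alternative: the ports framework applies any files/patch-* file in the port directory during the patch stage, so a patch saved there survives "make clean". The helper name add_port_patch, the diff path and the patch file name are hypothetical. Stripping git's "a/" and "b/" prefixes assumes the port applies local patches with -p0 relative to the extracted source directory; check how your ports tree applies patches before relying on this.

```sh
#!/bin/sh
# Sketch (hypothetical helper): copy a git-style diff into a port's files/
# directory as files/patch-<name>, so "make patch" applies it automatically.
# Assumption: local patches are applied with -p0, so git's "diff --git"
# header is dropped and the "a/" and "b/" path prefixes are stripped.
add_port_patch() {
    portdir=$1    # e.g. /usr/ports/net/libvncserver
    diff=$2       # saved copy of the upstream commit diff
    name=$3       # suffix for files/patch-<name>
    mkdir -p "$portdir/files" || return 1
    sed -e '/^diff --git /d' \
        -e 's|^--- a/|--- |' \
        -e 's|^+++ b/|+++ |' \
        "$diff" > "$portdir/files/patch-$name"
}

# Usage on the FreeBSD host (paths are examples only):
#   add_port_patch /usr/ports/net/libvncserver /tmp/66282f5.diff libvncserver-sockets.c
#   cd /usr/ports/net/libvncserver && make clean && make reinstall
# then rebuild emulation/virtualbox-ose against the patched library.
```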

Note that I have quite a few VMs running. When I kill just one of the VMs
and then start it with VNC, I still get that message:

rfbListenOnTCP6Port: error in bind IPv6 socket: Address already in use

HOWEVER, telnetting to that VM's VNC port no longer kills it, and
furthermore I can now VNC to it!

I suspect the rfbListenOnTCP6Port error will disappear if I reboot the
machine or kill all VMs.
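One way to test that suspicion is to check which process already holds the IPv6 listening socket on the VM's VNC port when the message appears. A minimal sketch using FreeBSD's sockstat(1): the function name vnc_port_holders and port 5900 are examples, not values from this thread, and the column positions assume sockstat's default USER, COMMAND, PID, FD, PROTO, LOCAL ADDRESS, FOREIGN ADDRESS layout.

```sh
#!/bin/sh
# Sketch (hypothetical helper): given sockstat output on stdin, print the
# COMMAND and PID of every socket whose local address ends in ":<port>".
# Assumes the local address is the 6th whitespace-separated field.
vnc_port_holders() {
    awk -v port="$1" '$6 ~ (":" port "$") { print $2, $3 }'
}

# Usage on the FreeBSD host (5900 is only an example VNC port):
#   sockstat -6 -l | vnc_port_holders 5900
```

If another VM's VBoxHeadless shows up as the holder, that would be consistent with the "Address already in use" message coming from several VMs competing for the same IPv6 listener.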

Is this a valid test?


