Date:      Wed, 06 Jun 2012 09:50:35 +0200
From:      Bernhard Froehlich <decke@FreeBSD.org>
To:        Steve Tuts <yiz5hwi@gmail.com>
Cc:        freebsd-emulation@freebsd.org
Subject:   Re: one virtualbox vm disrupts all vms and entire network
Message-ID:  <b8f2877663fb73d56222180f8b74cc81@bluelife.at>
In-Reply-To: <e1037b202b887d93142a4e693784f874@bluelife.at>
References:  <CAEXKtDreCQ0O4NAi5opGm_KnR4As=dDvc-zP5Z0z5g84GQQuyg@mail.gmail.com> <assp.050217e07a.6a54445e6fa4183cff3692d9deed5635@ringofsaturn.com> <CAEXKtDrG%2Byj%2B4vOhhKrQcC6h9mEeFOtzHtJaV-UgPMrdn3xisQ@mail.gmail.com> <e1037b202b887d93142a4e693784f874@bluelife.at>

On 05.06.2012 20:16, Bernhard Froehlich wrote:
> On 05.06.2012 19:05, Steve Tuts wrote:
>> On Mon, Jun 4, 2012 at 4:11 PM, Rusty Nejdl <rnejdl@ringofsaturn.com> wrote:
>>
>>> On 2012-06-02 12:16, Steve Tuts wrote:
>>>
>>>> Hi, we have a Dell PowerEdge server with a dozen interfaces.  It hosts a
>>>> few guests running web app and email servers with VirtualBox-4.0.14.  The
>>>> host and all guests are FreeBSD 9.0 64-bit.  Each guest is bridged to a
>>>> distinct interface.  The host and all guests are on the 10.0.0.0 network,
>>>> NAT'ed behind a Cisco router.
>>>>
>>>> This ran well for a couple of months, until we added a new guest recently.
>>>> Every few hours, none of the guests can be reached.  We can only connect
>>>> to the host from outside the router.  We can also get to the console of
>>>> the guests (except the new guest), but from there we can't ping the
>>>> gateway 10.0.0.1 any more.  The new guest just froze.
>>>>
>>>> Furthermore, on the host we can see a vboxheadless process for each guest,
>>>> including the new guest.  But we cannot kill it, not even with "kill -9".
>>>> We looked around the web and someone suggested we should use "kill
>>>> -SIGCONT" first, since the "ps" output shows the "T" flag for the
>>>> vboxheadless process of the new guest, but that doesn't help.  We also
>>>> tried all the VBoxManage commands to poweroff/reset etc. the new guest,
>>>> but they all failed, complaining that the VM is in the Aborted state.  We
>>>> also tried VBoxManage commands to disconnect the network cable of the new
>>>> guest; they didn't complain, but there was no effect.
>>>>
>>>> A couple of times, on the host we disabled the interface bridged to the
>>>> new guest; the vboxheadless process for that guest then disappeared (we
>>>> had attempted to kill it before that), and immediately all other VMs
>>>> regained their connections.
>>>>
>>>> But one time even the above didn't help: the vboxheadless process for
>>>> the new guest stubbornly remained, and we had to reboot the host.
>>>>
>>>> This is already a production server, so we can't upgrade VirtualBox to the
>>>> latest version until we obtain a test server.
>>>>
>>>> Would you advise:
>>>>
>>>> 1. Is there any other way to kill that new guest instead of rebooting?
>>>> 2. What might cause the problem?
>>>> 3. What settings and tests can I use to analyze this problem?
>>>>
>>>
>>> I haven't seen any comments on this and don't want you to think you are
>>> being ignored.  I haven't seen this problem myself, but the 4.0 branch was
>>> buggier for me than the 4.1 releases, so upgrading is probably what you
>>> are looking at.
>>>
>>> Rusty Nejdl
>>>
>>>
>> Sorry, I just realized my reply yesterday didn't go to the list, so I am
>> re-sending it with some updates.
>>
>> Yes, we upgraded all ports and fortunately everything went back to normal;
>> in particular, all VMs have run peacefully for two days now.  So upgrading
>> to the latest VirtualBox 4.1.16 solved that problem.
>>
>> But now we have a new problem with this new version of VirtualBox: whenever
>> we try to VNC to any VM, that VM goes to the Aborted state immediately.
>> In fact, merely telnetting from the host to the VNC port of a VM
>> immediately aborts that VM.  This prevents us from adding new VMs.
>> Also, when starting a VM with a VNC port, we get this message:
>>
>> rfbListenOnTCP6Port: error in bind IPv6 socket: Address already in use
>>
>> We found that someone else has provided a patch for it at
>> http://permalink.gmane.org/gmane.os.freebsd.devel.emulation/10237
>>
>> So it looks like running multiple VMs on an IPv6 system (we have 64-bit
>> FreeBSD 9.0) triggers this problem.
>
> Glad to hear that 4.1.16 helps with the networking problem. The VNC problem
> is also a known one, but the mentioned patch does not work for at least a
> few people. The bug seems to be somewhere in libvncserver, so downgrading
> net/libvncserver to an earlier version (and rebuilding virtualbox) should
> help until we come up with a proper fix.

You are right about the "Address already in use" problem and the patch for
it, so I will commit the fix in a few moments.
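
For anyone unfamiliar with that error, here is a minimal Python sketch of how
"Address already in use" (EADDRINUSE) arises in general: a second socket tries
to bind a port that another socket already holds. It is not taken from
VirtualBox or libvncserver and makes no claim about the exact cause there. The
function name is hypothetical, and it uses IPv4 loopback only so that it runs
on hosts without IPv6.

```python
import errno
import socket


def second_bind_errno(host="127.0.0.1"):
    """Bind two TCP sockets to the same port and return the errno raised
    by the second bind, or None if it unexpectedly succeeds."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 lets the OS pick a free port for the first listener.
        first.bind((host, 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            # Same address and port, no SO_REUSEADDR: the bind is refused.
            second.bind((host, port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    err = second_bind_errno()
    print(errno.errorcode.get(err, err))
```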

I have also tried to reproduce the VNC crash but I couldn't, probably because
my system is IPv6 enabled. flo@ has seen the same crash and has no IPv6 in
his kernel, which led him to find this commit in libvncserver:


commit 66282f58000c8863e104666c30cb67b1d5cbdee3
Author: Kyle J. McKay <mackyle@gmail.com>
Date:   Fri May 18 00:30:11 2012 -0700
      libvncserver/sockets.c: do not segfault when listenSock/listen6Sock == -1

http://libvncserver.git.sourceforge.net/git/gitweb.cgi?p=libvncserver/libvncserver;a=commit;h=66282f5


It looks promising, so please test this patch if you can reproduce the crash.
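
For testing, the telnet check Steve described can be scripted. This is a
minimal Python sketch, assuming the VM's VNC server listens on a host and port
that you supply; poke_port and its parameters are hypothetical names and not
part of VirtualBox. It only opens a TCP connection and closes it again, which
according to the report is already enough to abort an affected VM.

```python
import socket


def poke_port(host, port, timeout=3.0):
    """Open a TCP connection to host:port and close it at once.

    Returns True if the connection was accepted, False otherwise.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Call it as poke_port("127.0.0.1", 5900), substituting the VNC port of the VM
under test (5900 is only an example), and then check whether the VM still
shows up in the output of "VBoxManage list runningvms".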

-- 
Bernhard Froehlich
http://www.bluelife.at/


