Date: Wed, 06 Jun 2012 09:50:35 +0200
From: Bernhard Froehlich
To: Steve Tuts
Cc: freebsd-emulation@freebsd.org
Subject: Re: one virtualbox vm disrupts all vms and entire network

On 05.06.2012 20:16, Bernhard Froehlich wrote:
> On 05.06.2012 19:05, Steve Tuts wrote:
>> On Mon, Jun 4, 2012 at 4:11 PM, Rusty Nejdl wrote:
>>
>>> On 2012-06-02 12:16, Steve Tuts wrote:
>>>
>>>> Hi, we have a Dell PowerEdge server with a dozen interfaces. It hosts
>>>> a few guests of web app and email servers with VirtualBox-4.0.14.
>>>> The host and all guests are FreeBSD 9.0 64bit. Each guest is bridged
>>>> to a distinct interface. The host and all guests are set to the
>>>> 10.0.0.0 network, NAT'ed to a Cisco router.
>>>>
>>>> This ran well for a couple of months, until we added a new guest
>>>> recently. Every few hours, none of the guests can be connected to.
>>>> We can only connect to the host from outside the router. We can also
>>>> go to the console of the guests (except the new guest), but from
>>>> there we can't ping the gateway 10.0.0.1 any more. The new guest
>>>> just froze.
>>>>
>>>> Furthermore, on the host we can see a vboxheadless process for each
>>>> guest, including the new guest. But we can not kill it, not even
>>>> with "kill -9". We looked around the web and someone suggested we
>>>> should use "kill -SIGCONT" first, since the "ps" output has the "T"
>>>> flag for the vboxheadless process of that new guest, but that
>>>> doesn't help. We also tried all the VBoxManage commands to
>>>> poweroff/reset etc. that new guest, but they all failed, complaining
>>>> that the vm is in Aborted state. We also tried VBoxManage commands
>>>> to disconnect the network cable for that new guest; it didn't
>>>> complain, but there was no effect.
>>>>
>>>> A couple of times, on the host, we disabled the interface bridging
>>>> that new guest; then the vboxheadless process for that new guest
>>>> disappeared (we had attempted to kill it before that). And
>>>> immediately all other vms regained their connections and were back
>>>> to normal.
>>>>
>>>> But one time even the above didn't help: the vboxheadless process
>>>> for that new guest stubbornly remained, and we had to reboot the
>>>> host.
>>>>
>>>> This is already a production server, so we can't upgrade virtualbox
>>>> to the latest version until we obtain a test server.
>>>>
>>>> Would you advise:
>>>>
>>>> 1. is there any other way to kill that new guest instead of
>>>>    rebooting?
>>>> 2. what might cause the problem?
>>>> 3. what setting and test I can do to analyze this problem?
>>>>
>>>
>>> I haven't seen any comments on this and don't want you to think you
>>> are being ignored. I haven't seen this problem myself, but the 4.0
>>> branch was buggier for me than the 4.1 releases, so yeah, upgrading
>>> is probably what you are looking at.
>>>
>>> Rusty Nejdl
>>>
>>
>> Sorry, I just realized my reply yesterday didn't go to the list, so I
>> am re-sending it with some updates.
>>
>> Yes, we upgraded all ports and fortunately everything went back to
>> normal; in particular, all vms have run peacefully for two days now.
>> So upgrading to the latest virtualbox 4.1.16 solved that problem.
>>
>> But now we have a new problem with this new version of virtualbox:
>> whenever we try to vnc to any vm, that vm goes to Aborted state
>> immediately. Actually, merely a telnet from within the host to the vnc
>> port of that vm will immediately abort that vm. This prevents us from
>> adding new vms. Also, when starting a vm with a vnc port, we got this
>> message:
>>
>> rfbListenOnTCP6Port: error in bind IPv6 socket: Address already in use
>>
>> for which we found someone else had provided a patch at
>> http://permalink.gmane.org/gmane.os.freebsd.devel.emulation/10237
>>
>> So it looks like having multiple vms on an IPv6 system (we have 64bit
>> FreeBSD 9.0) will trigger this problem.
>
> Glad to hear that 4.1.16 helps for the networking problem. The VNC
> problem is also a known one, but the mentioned patch does not work, at
> least for a few people. It seems the bug is somewhere in libvncserver,
> so downgrading net/libvncserver to an earlier version (and rebuilding
> virtualbox) should help until we come up with a proper fix.

You are right about the "Address already in use" problem and the patch
for it, so I will commit the fix in a few moments.

I have also tried to reproduce the VNC crash but I couldn't.
Probably because my system is IPv6 enabled. flo@ has seen the same crash
and has no IPv6 in his kernel, which led him to find this commit in
libvncserver:

commit 66282f58000c8863e104666c30cb67b1d5cbdee3
Author: Kyle J. McKay
Date:   Fri May 18 00:30:11 2012 -0700

    libvncserver/sockets.c: do not segfault when listenSock/listen6Sock == -1

http://libvncserver.git.sourceforge.net/git/gitweb.cgi?p=libvncserver/libvncserver;a=commit;h=66282f5

It looks promising, so please test this patch if you can reproduce the
crash.

-- 
Bernhard Froehlich
http://www.bluelife.at/