From owner-freebsd-emulation@FreeBSD.ORG Thu Jun 7 06:58:09 2012
From: Bernhard Fröhlich
To: Kevin Oberman, Steve Tuts
Cc: freebsd-emulation@freebsd.org
Subject: Re: one virtualbox vm disrupts all vms and entire network
Date: Thu, 07 Jun 2012 08:58:17 +0200
Message-Id: <1339052297.16686.2.camel@Nokia-N900-42-11>
List-Id: Development of Emulators of other operating systems

On Thu, 7 Jun
2012 01:07:52 CEST, Kevin Oberman wrote:
> On Wed, Jun 6, 2012 at 3:46 PM, Steve Tuts wrote:
> > On Wed, Jun 6, 2012 at 3:50 AM, Bernhard Froehlich wrote:
> > > On 05.06.2012 20:16, Bernhard Froehlich wrote:
> > > > On 05.06.2012 19:05, Steve Tuts wrote:
> > > > > On Mon, Jun 4, 2012 at 4:11 PM, Rusty Nejdl wrote:
> > > > > > On 2012-06-02 12:16, Steve Tuts wrote:
> > > > > > > Hi, we have a Dell PowerEdge server with a dozen interfaces.
> > > > > > > It hosts a few guests of web app and email servers with
> > > > > > > VirtualBox-4.0.14.  The host and all guests are FreeBSD 9.0
> > > > > > > 64bit.  Each guest is bridged to a distinct interface.  The
> > > > > > > host and all guests are set to 10.0.0.0 network NAT'ed to a
> > > > > > > Cisco router.
> > > > > > >
> > > > > > > This runs well for a couple months, until we added a new
> > > > > > > guest recently.  Every few hours, none of the guests can be
> > > > > > > connected.  We can only connect to the host from outside the
> > > > > > > router.  We can also go to the console of the guests (except
> > > > > > > the new guest), but from there we can't ping the gateway
> > > > > > > 10.0.0.1 any more.  The new guest just froze.
> > > > > > >
> > > > > > > Furthermore, on the host we can see a vboxheadless process
> > > > > > > for each guest, including the new guest.  But we can not kill
> > > > > > > it, not even with "kill -9".  We looked around the web and
> > > > > > > someone suggested we should use "kill -SIGCONT" first since
> > > > > > > the "ps" output has the "T" flag for that vboxheadless
> > > > > > > process for that new guest, but that doesn't help.
> > > > > > > We also tried all the VBoxManage commands to poweroff/reset
> > > > > > > etc that new guest, but they all failed complaining that vm
> > > > > > > is in Aborted state.  We also tried VBoxManage commands to
> > > > > > > disconnect the network cable for that new guest, it didn't
> > > > > > > complain, but there was no effect.
> > > > > > >
> > > > > > > For a couple times, on the host we disabled the interface
> > > > > > > bridging that new guest, then that vboxheadless process for
> > > > > > > that new guest disappeared (we attempted to kill it before
> > > > > > > that).  And immediately all other vms regained connection
> > > > > > > back to normal.
> > > > > > >
> > > > > > > But there is one time even the above didn't help - the
> > > > > > > vboxheadless process for that new guest stubbornly remains,
> > > > > > > and we had to reboot the host.
> > > > > > >
> > > > > > > This is already a production server, so we can't upgrade
> > > > > > > virtualbox to the latest version until we obtain a test
> > > > > > > server.
> > > > > > >
> > > > > > > Would you advise:
> > > > > > >
> > > > > > > 1. is there any other way to kill that new guest instead of
> > > > > > >    rebooting?
> > > > > > > 2. what might cause the problem?
> > > > > > > 3. what setting and test I can do to analyze this problem?
> > > > > >
> > > > > > I haven't seen any comments on this and don't want you to think
> > > > > > you are being ignored but I haven't seen this but also, the 4.0
> > > > > > branch was buggier for me than the 4.1 releases so yeah,
> > > > > > upgrading is probably what you are looking at.
> > > > > >
> > > > > > Rusty Nejdl
> > > > >
> > > > > sorry, just realized my reply yesterday didn't go to the list, so
> > > > > am re-sending with some updates.
> > > > >
> > > > > Yes, we upgraded all ports and fortunately everything went back
> > > > > and especially all vms have run peacefully for two days now.  So
> > > > > upgrading to the latest virtualbox 4.1.16 solved that problem.
> > > > >
> > > > > But now we got a new problem with this new version of virtualbox:
> > > > > whenever we try to vnc to any vm, that vm will go to Aborted
> > > > > state immediately.  Actually, merely telnet from within the host
> > > > > to the vnc port of that vm will immediately Abort that vm.  This
> > > > > prevents us from adding new vms.  Also, when starting vm with vnc
> > > > > port, we got this message:
> > > > >
> > > > > rfbListenOnTCP6Port: error in bind IPv6 socket: Address already in use
> > > > >
> > > > > , which we found someone else provided a patch at
> > > > > http://permalink.gmane.org/gmane.os.freebsd.devel.emulation/10237
> > > > >
> > > > > So looks like when there are multiple vms on an IPv6 system (we
> > > > > have 64bit FreeBSD 9.0) will get this problem.
> > > >
> > > > Glad to hear that 4.1.16 helps for the networking problem. The VNC
> > > > problem is also a known one but the mentioned patch does not work
> > > > at least for a few people. It seems the bug is somewhere in
> > > > libvncserver so downgrading net/libvncserver to an earlier version
> > > > (and rebuilding virtualbox) should help until we come up with a
> > > > proper fix.
> > >
> > > You are right about the "Address already in use" problem and the
> > > patch for it so I will commit the fix in a few moments.
> > >
> > > I have also tried to reproduce the VNC crash but I couldn't. Probably
> > > because my system is IPv6 enabled.
> > > flo@ has seen the same crash and has no IPv6 in his kernel which
> > > led him to find this commit in libvncserver:
> > >
> > > commit 66282f58000c8863e104666c30cb67b1d5cbdee3
> > > Author: Kyle J. McKay
> > > Date:   Fri May 18 00:30:11 2012 -0700
> > >
> > >     libvncserver/sockets.c: do not segfault when
> > >     listenSock/listen6Sock == -1
> > >
> > > http://libvncserver.git.sourceforge.net/git/gitweb.cgi?p=libvncserver/libvncserver;a=commit;h=66282f5
> > >
> > > It looks promising so please test this patch if you can reproduce the
> > > crash.
> > >
> > > --
> > > Bernhard Froehlich
> > > http://www.bluelife.at/
> >
> > Sorry, I tried to try this patch, but couldn't figure out how to do
> > that. I use ports to compile everything, and can see the file is at
> > /usr/ports/net/libvncserver/work/LibVNCServer-0.9.9/libvncserver/sockets.c
> > .  However, if I edit this file and do make clean, this patch is wiped
> > out before I can do "make" out of it.  How to apply this patch in the
> > ports?
>
> To apply patches to ports:
> # make clean
> # make patch
> (apply the patch to the extracted sources under work/)
> # make
> # make deinstall
> # make reinstall
>
> Note that the final two steps assume a version of the port is already
> installed. If not: 'make install'
> If you use portmaster, after applying the patch: 'portmaster -C
> net/libvncserver'

flo has already committed the patch to net/libvncserver so I guess it
fixes the problem. Please update your ports tree and verify that it works
fine.
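
For anyone who still needs to test a local patch against a port before it reaches the tree, the sequence Kevin describes could be scripted roughly as below. This is only a sketch under assumptions: the function names `run` and `rebuild_with_patch`, the `DRY_RUN` variable, and the `-p1` strip level (typical for git-style diffs) are illustrative and not taken from the thread. The key point is that the patch is applied after `make patch` and before `make`, so `make clean` no longer wipes it out.

```shell
#!/bin/sh
# Sketch: rebuild a port with a local patch applied to its extracted sources.
# Helper names, DRY_RUN, and the -p1 strip level are assumptions.

# run CMD...: print the command, then execute it unless DRY_RUN=1.
run() {
    printf '+ %s\n' "$*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# rebuild_with_patch PORTDIR WRKSRC PATCHFILE
#   PORTDIR   port directory, e.g. /usr/ports/net/libvncserver
#   WRKSRC    extracted source directory under PORTDIR/work
#   PATCHFILE local diff to apply (hypothetical file)
rebuild_with_patch() {
    portdir=$1
    wrksrc=$2
    patchfile=$3
    run make -C "$portdir" clean &&
    run make -C "$portdir" patch &&
    run patch -d "$wrksrc" -p1 -i "$patchfile" &&
    run make -C "$portdir" &&
    run make -C "$portdir" deinstall &&
    run make -C "$portdir" reinstall
}
```

With `DRY_RUN=1` set, a call such as `rebuild_with_patch /usr/ports/net/libvncserver /usr/ports/net/libvncserver/work/LibVNCServer-0.9.9 /tmp/local-fix.diff` only prints the six commands (the diff path is a placeholder). As in Kevin's note, the last two steps assume the port is already installed; otherwise a plain `make install` would replace them.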