From: Steve Tuts
To: freebsd-emulation@freebsd.org
Date: Thu, 7 Jun 2012 03:54:51 -0400
Subject: Re: one virtualbox vm disrupts all vms and entire network
In-Reply-To: <1339052297.16686.2.camel@Nokia-N900-42-11>
List-Id: Development of Emulators of other operating systems

On Thu, Jun 7, 2012 at 2:58 AM, Bernhard Fröhlich wrote:

> On Thu, 7 Jun 2012 01:07:52 CEST, Kevin Oberman wrote:
> > On Wed, Jun 6, 2012 at 3:46 PM, Steve Tuts wrote:
> > > On Wed, Jun 6, 2012 at 3:50 AM, Bernhard Froehlich wrote:
> > > > On 05.06.2012 20:16, Bernhard Froehlich wrote:
> > > > > On 05.06.2012 19:05, Steve Tuts wrote:
> > > > > > On Mon, Jun 4, 2012 at 4:11 PM, Rusty Nejdl wrote:
> > > > > > > On 2012-06-02 12:16, Steve Tuts wrote:
> > > > > > > > Hi, we have a Dell PowerEdge server with a dozen interfaces.
> > > > > > > > It hosts a few guests running web app and email servers with
> > > > > > > > VirtualBox-4.0.14. The host and all guests are FreeBSD 9.0
> > > > > > > > 64-bit. Each guest is bridged to a distinct interface. The
> > > > > > > > host and all guests are on the 10.0.0.0 network, NAT'ed to
> > > > > > > > a Cisco router.
> > > > > > > >
> > > > > > > > This ran well for a couple of months, until we added a new
> > > > > > > > guest recently. Every few hours, none of the guests can be
> > > > > > > > reached. We can only connect to the host from outside the
> > > > > > > > router. We can also go to the console of the guests (except
> > > > > > > > the new guest), but from there we can't ping the gateway
> > > > > > > > 10.0.0.1 any more. The new guest just freezes.
> > > > > > > >
> > > > > > > > Furthermore, on the host we can see a vboxheadless process
> > > > > > > > for each guest, including the new guest. But we cannot kill
> > > > > > > > it, not even with "kill -9".
> > > > > > > > We looked around the web and someone suggested we should
> > > > > > > > use "kill -SIGCONT" first, since the "ps" output has the "T"
> > > > > > > > flag for the vboxheadless process of that new guest, but
> > > > > > > > that doesn't help. We also tried all the VBoxManage commands
> > > > > > > > to poweroff/reset etc. that new guest, but they all failed,
> > > > > > > > complaining that the vm is in Aborted state. We also tried
> > > > > > > > the VBoxManage command to disconnect the network cable for
> > > > > > > > that new guest; it didn't complain, but there was no effect.
> > > > > > > >
> > > > > > > > A couple of times, on the host we disabled the interface
> > > > > > > > bridging that new guest, and then the vboxheadless process
> > > > > > > > for that new guest disappeared (we had attempted to kill it
> > > > > > > > before that). Immediately all other vms regained their
> > > > > > > > connections and went back to normal.
> > > > > > > >
> > > > > > > > But one time even the above didn't help: the vboxheadless
> > > > > > > > process for that new guest stubbornly remained, and we had
> > > > > > > > to reboot the host.
> > > > > > > >
> > > > > > > > This is already a production server, so we can't upgrade
> > > > > > > > virtualbox to the latest version until we obtain a test
> > > > > > > > server.
> > > > > > > >
> > > > > > > > Would you advise:
> > > > > > > >
> > > > > > > > 1. is there any other way to kill that new guest instead of
> > > > > > > >    rebooting?
> > > > > > > > 2. what might cause the problem?
> > > > > > > > 3. what settings and tests can I use to analyze this problem?
> > > > > > >
> > > > > > > I haven't seen any comments on this and don't want you to
> > > > > > > think you are being ignored. I haven't seen this problem
> > > > > > > myself, but the 4.0 branch was buggier for me than the 4.1
> > > > > > > releases, so upgrading is probably what you are looking at.
> > > > > > >
> > > > > > > Rusty Nejdl
> > > > > >
> > > > > > Sorry, I just realized my reply yesterday didn't go to the list,
> > > > > > so I am re-sending with some updates.
> > > > > >
> > > > > > Yes, we upgraded all ports and fortunately everything came back;
> > > > > > in particular, all vms have run peacefully for two days now. So
> > > > > > upgrading to the latest virtualbox 4.1.16 solved that problem.
> > > > > >
> > > > > > But now we have a new problem with this new version of
> > > > > > virtualbox: whenever we try to vnc to any vm, that vm goes to
> > > > > > Aborted state immediately. Actually, merely telnetting from
> > > > > > within the host to the vnc port of that vm will immediately
> > > > > > abort that vm. This prevents us from adding new vms. Also, when
> > > > > > starting a vm with a vnc port, we get this message:
> > > > > >
> > > > > > rfbListenOnTCP6Port: error in bind IPv6 socket: Address already in use
> > > > > >
> > > > > > for which we found that someone else provided a patch at
> > > > > > http://permalink.gmane.org/gmane.os.freebsd.devel.emulation/10237
> > > > > >
> > > > > > So it looks like running multiple vms on an IPv6 system (we have
> > > > > > 64-bit FreeBSD 9.0) triggers this problem.
> > > > >
> > > > > Glad to hear that 4.1.16 helps for the networking problem.
> > > > > The VNC problem is also a known one, but the mentioned patch does
> > > > > not work, at least for a few people. It seems the bug is
> > > > > somewhere in libvncserver, so downgrading net/libvncserver to an
> > > > > earlier version (and rebuilding virtualbox) should help until we
> > > > > come up with a proper fix.
> > > >
> > > > You are right about the "Address already in use" problem and the
> > > > patch for it, so I will commit the fix in a few moments.
> > > >
> > > > I have also tried to reproduce the VNC crash but I couldn't,
> > > > probably because my system is IPv6 enabled. flo@ has seen the same
> > > > crash and has no IPv6 in his kernel, which led him to find this
> > > > commit in libvncserver:
> > > >
> > > > commit 66282f58000c8863e104666c30cb67b1d5cbdee3
> > > > Author: Kyle J. McKay
> > > > Date: Fri May 18 00:30:11 2012 -0700
> > > >
> > > >     libvncserver/sockets.c: do not segfault when
> > > >     listenSock/listen6Sock == -1
> > > >
> > > > http://libvncserver.git.sourceforge.net/git/gitweb.cgi?p=libvncserver/libvncserver;a=commit;h=66282f5
> > > >
> > > > It looks promising, so please test this patch if you can reproduce
> > > > the crash.
> > > >
> > > > --
> > > > Bernhard Froehlich
> > > > http://www.bluelife.at/
> > >
> > > Sorry, I tried to test this patch, but couldn't figure out how to do
> > > that. I use ports to compile everything, and can see the file is at
> > > /usr/ports/net/libvncserver/work/LibVNCServer-0.9.9/libvncserver/sockets.c.
> > > However, if I edit this file and do "make clean", the patch is wiped
> > > out before I can run "make". How do I apply this patch in the ports?
> >
> > To apply patches to ports:
> > # make clean
> > # make patch
> >
> > # make
> > # make deinstall
> > # make reinstall
> >
> > Note that the final two steps assume a version of the port is already
> > installed. If not: 'make install'
> > If you use portmaster, after applying the patch: 'portmaster -C
> > net/libvncserver'
>
> flo has already committed the patch to net/libvncserver so I guess it
> fixes the problem. Please update your ports tree and verify that it works
> fine.

I can confirm that after upgrading all ports (libvncserver was upgraded to
0.9.9_1) and rebooting, I can vnc to the vms now. Also, starting vms with
vnc no longer produces that error; instead it prints the following info, so
all problems are solved.

07/06/2012 03:49:14 Listening for VNC connections on TCP port 5903
07/06/2012 03:49:14 Listening for VNC connections on TCP6 port 5903

Thanks everyone for your great help!
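
In case it helps anyone verifying the same fix, here is a minimal sketch of the connection test, roughly a scripted version of the telnet check mentioned earlier in the thread. It assumes the VNC server sends the standard 12-byte RFB version banner as soon as a client connects. The function name read_rfb_banner, the default timeout and the host used in the comment are illustrative choices only, and 5903 is simply the port from the log lines above.

```python
import socket


def read_rfb_banner(host: str, port: int, timeout: float = 3.0) -> str:
    """Connect to a VNC port and return the server's RFB version banner.

    An RFB server greets each client with a 12-byte ProtocolVersion
    message such as "RFB 003.008\n". Getting that banner back shows the
    listener accepted the connection and answered.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # Read until the full 12-byte banner has arrived or the peer closes.
        while len(data) < 12:
            chunk = sock.recv(12 - len(data))
            if not chunk:
                break
            data += chunk
    return data.decode("ascii", errors="replace").strip()


# Hypothetical usage against a local vm's VNC port:
#   print(read_rfb_banner("127.0.0.1", 5903))
```

A connection error or an empty string means the listener did not answer. Passing "::1" as the host exercises the TCP6 listener in the same way.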