From: Steve Tuts
To: freebsd-emulation@freebsd.org
Date: Wed, 6 Jun 2012 18:46:07 -0400
Subject: Re: one virtualbox vm disrupts all vms and entire network
List-Id: Development of Emulators of other operating systems

On Wed, Jun 6, 2012 at 3:50 AM, Bernhard Froehlich wrote:

> On 05.06.2012 20:16, Bernhard Froehlich wrote:
>
>> On 05.06.2012 19:05, Steve Tuts wrote:
>>
>>> On Mon, Jun 4, 2012 at 4:11 PM, Rusty Nejdl wrote:
>>>
>>>> On 2012-06-02 12:16, Steve Tuts wrote:
>>>>
>>>>> Hi, we have a Dell PowerEdge server with a dozen interfaces. It hosts a
>>>>> few guests running web app and email servers with VirtualBox-4.0.14. The
>>>>> host and all guests are FreeBSD 9.0 64bit. Each guest is bridged to a
>>>>> distinct interface. The host and all guests are on the 10.0.0.0 network,
>>>>> NAT'ed to a Cisco router.
>>>>>
>>>>> This ran well for a couple of months, until we added a new guest recently.
>>>>> Every few hours, none of the guests can be connected to. We can only
>>>>> connect to the host from outside the router. We can also go to the console
>>>>> of the guests (except the new guest), but from there we can't ping the
>>>>> gateway 10.0.0.1 any more. The new guest just froze.
>>>>>
>>>>> Furthermore, on the host we can see a vboxheadless process for each guest,
>>>>> including the new guest. But we can not kill it, not even with "kill -9".
>>>>> We looked around the web and someone suggested we should use "kill
>>>>> -SIGCONT" first since the "ps" output has the "T" flag for the
>>>>> vboxheadless process of that new guest, but that doesn't help. We also
>>>>> tried all the VBoxManage commands to poweroff/reset etc. that new guest,
>>>>> but they all failed, complaining that the vm is in Aborted state. We also
>>>>> tried VBoxManage commands to disconnect the network cable for that new
>>>>> guest; it didn't complain, but there was no effect.
>>>>>
>>>>> A couple of times, on the host we disabled the interface bridged to that
>>>>> new guest, then the vboxheadless process for that new guest disappeared
>>>>> (we had attempted to kill it before that). And immediately all other vms
>>>>> regained connection and went back to normal.
>>>>>
>>>>> But one time even the above didn't help: the vboxheadless process for
>>>>> that new guest stubbornly remained, and we had to reboot the host.
>>>>>
>>>>> This is already a production server, so we can't upgrade virtualbox to
>>>>> the latest version until we obtain a test server.
>>>>>
>>>>> Would you advise:
>>>>>
>>>>> 1. is there any other way to kill that new guest instead of rebooting?
>>>>> 2. what might cause the problem?
>>>>> 3. what settings and tests can I use to analyze this problem?
>>>>
>>>> I haven't seen any comments on this and don't want you to think you are
>>>> being ignored. I haven't seen this problem myself, but the 4.0 branch was
>>>> buggier for me than the 4.1 releases, so yeah, upgrading is probably what
>>>> you are looking at.
>>>>
>>>> Rusty Nejdl
>>>
>>> Sorry, I just realized my reply yesterday didn't go to the list, so I am
>>> re-sending with some updates.
>>>
>>> Yes, we upgraded all ports and fortunately everything went back to normal;
>>> in particular, all vms have run peacefully for two days now. So upgrading
>>> to the latest virtualbox 4.1.16 solved that problem.
>>>
>>> But now we got a new problem with this new version of virtualbox: whenever
>>> we try to vnc to any vm, that vm will go to Aborted state immediately.
>>> Actually, merely telnetting from within the host to the vnc port of that vm
>>> will immediately abort that vm. This prevents us from adding new vms.
>>> Also, when starting a vm with a vnc port, we got this message:
>>>
>>> rfbListenOnTCP6Port: error in bind IPv6 socket: Address already in use
>>>
>>> for which we found someone else had provided a patch at
>>> http://permalink.gmane.org/gmane.os.freebsd.devel.emulation/10237
>>>
>>> So it looks like multiple vms on an ipv6 system (we have 64bit
>>> FreeBSD 9.0) will hit this problem.
>>>
>>
>> Glad to hear that 4.1.16 helps for the networking problem. The VNC problem
>> is also a known one, but the mentioned patch does not work, at least for a
>> few people. It seems the bug is somewhere in libvncserver, so downgrading
>> net/libvncserver to an earlier version (and rebuilding virtualbox) should
>> help until we come up with a proper fix.
>
> You are right about the "Address already in use" problem and the patch for
> it, so I will commit the fix in a few moments.
>
> I have also tried to reproduce the VNC crash but I couldn't, probably
> because my system is IPv6 enabled. flo@ has seen the same crash and has no
> IPv6 in his kernel, which led him to find this commit in libvncserver:
>
> commit 66282f58000c8863e104666c30cb67b1d5cbdee3
> Author: Kyle J. McKay
> Date: Fri May 18 00:30:11 2012 -0700
>
>     libvncserver/sockets.c: do not segfault when listenSock/listen6Sock == -1
>
> http://libvncserver.git.sourceforge.net/git/gitweb.cgi?p=libvncserver/libvncserver;a=commit;h=66282f5
>
> It looks promising, so please test this patch if you can reproduce the
> crash.
>
> --
> Bernhard Froehlich
> http://www.bluelife.at/

Sorry, I tried to test this patch, but couldn't figure out how to apply it. I
build everything from ports, and I can see the file is at
/usr/ports/net/libvncserver/work/LibVNCServer-0.9.9/libvncserver/sockets.c.
However, if I edit this file and run "make clean", the change is wiped out
before I can run "make" on it. How do I apply this patch in the ports tree?
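
For reference, one common way to do this with the ports framework is sketched below. It is an untested outline, not a confirmed recipe for this port: the idea is to run "make clean" first, let "make patch" re-extract the source, apply the local diff inside work/, and then build without cleaning again. The port directory and source path are taken from the message above; DRY_RUN, port_make, rebuild_with_local_patch, and the use of -p1 (which assumes a git-style diff) are hypothetical choices made for illustration.

```shell
#!/bin/sh
# Sketch: build a port with a local source patch so "make clean" does not wipe it.
# Paths come from the message above; helper names and DRY_RUN are hypothetical.

PORTDIR="${PORTDIR:-/usr/ports/net/libvncserver}"
WRKSRC="${WRKSRC:-$PORTDIR/work/LibVNCServer-0.9.9}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands, change nothing

port_make() {
    # Run the given make target(s) inside the port directory.
    echo "+ (cd $PORTDIR && make $*)"
    if [ "$DRY_RUN" = "0" ]; then
        (cd "$PORTDIR" && make "$@")
    fi
}

rebuild_with_local_patch() {
    # $1: unified diff against the extracted source tree (assumed git-style, hence -p1)
    patchfile="$1"
    port_make clean                  # clean FIRST, while there is nothing to lose
    port_make patch                  # fetch, extract, apply the port's own patches
    echo "+ patch -d $WRKSRC -p1 -i $patchfile"
    if [ "$DRY_RUN" = "0" ]; then
        patch -d "$WRKSRC" -p1 -i "$patchfile"
    fi
    port_make build                  # builds the edited source; no clean in between
    port_make deinstall reinstall    # replace the installed package
}
```

Calling `rebuild_with_local_patch /path/to/fix.diff` prints the five steps; setting DRY_RUN=0 (as root) would actually run them. An alternative that survives "make clean" is to save the diff in the port's files/ directory under a name starting with "patch-", since the framework applies those during the patch stage; such diffs are normally expected to apply with -p0 relative to the source directory, so a git-style diff would need its a/ and b/ prefixes removed. As noted earlier in the thread, virtualbox would then need rebuilding against the patched library.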