Date: Thu, 7 Jun 2012 03:05:48 -0400
From: Steve Tuts <yiz5hwi@gmail.com>
To: Kevin Oberman <kob6558@gmail.com>
Cc: freebsd-emulation@freebsd.org
Subject: Re: one virtualbox vm disrupts all vms and entire network
Message-ID: <CAEXKtDqnjEP75m-fwyWrG-TTj-t0jCKb-==Z1pLX5iOwiuJyXg@mail.gmail.com>
In-Reply-To: <CAN6yY1sVaqw3NPAraW452Rg7807q8zi1O=g2D_QZ93q4UE_Gqw@mail.gmail.com>
References: <CAEXKtDreCQ0O4NAi5opGm_KnR4As=dDvc-zP5Z0z5g84GQQuyg@mail.gmail.com>
 <assp.050217e07a.6a54445e6fa4183cff3692d9deed5635@ringofsaturn.com>
 <CAEXKtDrG%2Byj%2B4vOhhKrQcC6h9mEeFOtzHtJaV-UgPMrdn3xisQ@mail.gmail.com>
 <e1037b202b887d93142a4e693784f874@bluelife.at>
 <b8f2877663fb73d56222180f8b74cc81@bluelife.at>
 <CAEXKtDqR0F7btne62C%2Bw90qnWfP3kkU-E851b8WOf26yKeGBhg@mail.gmail.com>
 <CAN6yY1sVaqw3NPAraW452Rg7807q8zi1O=g2D_QZ93q4UE_Gqw@mail.gmail.com>

On Wed, Jun 6, 2012 at 7:07 PM, Kevin Oberman <kob6558@gmail.com> wrote:
> On Wed, Jun 6, 2012 at 3:46 PM, Steve Tuts <yiz5hwi@gmail.com> wrote:
> > On Wed, Jun 6, 2012 at 3:50 AM, Bernhard Froehlich <decke@freebsd.org>
> > wrote:
> >
> >> On 05.06.2012 20:16, Bernhard Froehlich wrote:
> >>
> >>> On 05.06.2012 19:05, Steve Tuts wrote:
> >>>
> >>>> On Mon, Jun 4, 2012 at 4:11 PM, Rusty Nejdl <rnejdl@ringofsaturn.com>
> >>>> wrote:
> >>>>
> >>>>> On 2012-06-02 12:16, Steve Tuts wrote:
> >>>>>
> >>>>>> Hi, we have a Dell PowerEdge server with a dozen interfaces. It
> >>>>>> hosts a few guests (web app and email servers) on VirtualBox
> >>>>>> 4.0.14. The host and all guests are FreeBSD 9.0 64-bit. Each guest
> >>>>>> is bridged to a distinct interface. The host and all guests are on
> >>>>>> the 10.0.0.0 network, NAT'ed by a Cisco router.
> >>>>>>
> >>>>>> This ran well for a couple of months, until we recently added a new
> >>>>>> guest. Every few hours, none of the guests can be reached. We can
> >>>>>> only connect to the host from outside the router. We can still get
> >>>>>> to the console of the guests (except the new guest), but from there
> >>>>>> we can no longer ping the gateway 10.0.0.1. The new guest just
> >>>>>> froze.
> >>>>>>
> >>>>>> Furthermore, on the host we can see a VBoxHeadless process for each
> >>>>>> guest, including the new guest. But we cannot kill it, not even with
> >>>>>> "kill -9". We looked around the web and someone suggested we should
> >>>>>> use "kill -SIGCONT" first, since the "ps" output shows the "T" flag
> >>>>>> for the VBoxHeadless process of that new guest, but that doesn't
> >>>>>> help. We also tried all the VBoxManage commands to power off, reset,
> >>>>>> etc. that new guest, but they all failed, complaining that the VM is
> >>>>>> in Aborted state.
> >>>>>> We also tried VBoxManage commands to disconnect the network cable
> >>>>>> for that new guest; they didn't complain, but had no effect.
> >>>>>>
> >>>>>> A couple of times, we disabled the host interface bridging that new
> >>>>>> guest, and then the VBoxHeadless process for that new guest
> >>>>>> disappeared (we had tried to kill it before that). Immediately, all
> >>>>>> other VMs regained their connections and were back to normal.
> >>>>>>
> >>>>>> But one time even that didn't help: the VBoxHeadless process for
> >>>>>> that new guest stubbornly remained, and we had to reboot the host.
> >>>>>>
> >>>>>> This is already a production server, so we can't upgrade VirtualBox
> >>>>>> to the latest version until we obtain a test server.
> >>>>>>
> >>>>>> Would you advise:
> >>>>>>
> >>>>>> 1. Is there any other way to kill that new guest instead of
> >>>>>>    rebooting?
> >>>>>> 2. What might cause the problem?
> >>>>>> 3. What settings and tests can I use to analyze this problem?
> >>>>>
> >>>>> I haven't seen any comments on this and don't want you to think you
> >>>>> are being ignored. I haven't seen this myself, but the 4.0 branch was
> >>>>> buggier for me than the 4.1 releases, so yes, upgrading is probably
> >>>>> what you are looking at.
> >>>>>
> >>>>> Rusty Nejdl
> >>>>
> >>>> Sorry, I just realized my reply yesterday didn't go to the list, so I
> >>>> am re-sending it with some updates.
> >>>>
> >>>> Yes, we upgraded all ports and fortunately everything went back to
> >>>> normal; in particular, all VMs have run peacefully for two days now.
> >>>> So upgrading to the latest VirtualBox 4.1.16 solved that problem.
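The SIGCONT-then-SIGKILL advice quoted above can be sketched as a small /bin/sh helper. The name force_kill and its timeout argument are hypothetical, not a FreeBSD or VirtualBox tool; the sketch assumes you have already found the PID of the guest's VBoxHeadless process with ps. It only automates the signals the thread mentions and then checks whether the PID actually went away.

```sh
#!/bin/sh
# Sketch only: force_kill is a hypothetical helper name.
# It resumes a stopped ("T" in the ps STAT column) process with SIGCONT,
# sends SIGKILL (the same signal as "kill -9"), then waits up to
# $2 seconds (default 10) for the PID to disappear.
force_kill() {
    pid=$1
    timeout=${2:-10}

    # Report the ps STAT field; if ps prints nothing, show "unknown".
    state=$(ps -o stat= -p "$pid" 2>/dev/null)
    echo "pid $pid state before: ${state:-unknown}"

    kill -s CONT "$pid" 2>/dev/null   # resume the process if it is stopped
    kill -s KILL "$pid" 2>/dev/null   # cannot be caught or ignored

    # kill -0 sends no signal; it only tests whether the PID still exists.
    # Depending on the OS, an unreaped zombie child of this same shell
    # may still pass this test.
    i=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$i" -ge "$timeout" ]; then
            echo "pid $pid still present after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "pid $pid is gone"
    return 0
}
```

For example, `force_kill 12345 30` with a made-up PID. If the PID is still present well after SIGKILL, the process is usually blocked inside the kernel, where no signal sent from userland will remove it; in this thread only disabling the bridged interface or rebooting the host cleared it.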
> >>>>
> >>>> But now we have a new problem with this new version of VirtualBox:
> >>>> whenever we try to VNC to any VM, that VM goes to Aborted state
> >>>> immediately. Actually, merely telnetting from within the host to the
> >>>> VNC port of that VM immediately aborts it. This prevents us from
> >>>> adding new VMs. Also, when starting a VM with a VNC port, we get this
> >>>> message:
> >>>>
> >>>> rfbListenOnTCP6Port: error in bind IPv6 socket: Address already in use
> >>>>
> >>>> for which someone else has provided a patch at
> >>>> http://permalink.gmane.org/gmane.os.freebsd.devel.emulation/10237
> >>>>
> >>>> So it looks like an IPv6-enabled system (we have 64-bit FreeBSD 9.0)
> >>>> running multiple VMs will get this problem.
> >>>
> >>> Glad to hear that 4.1.16 helps with the networking problem. The VNC
> >>> problem is also a known one, but the mentioned patch does not work,
> >>> at least for a few people. It seems the bug is somewhere in
> >>> libvncserver, so downgrading net/libvncserver to an earlier version
> >>> (and rebuilding VirtualBox) should help until we come up with a
> >>> proper fix.
> >>
> >> You are right about the "Address already in use" problem and the patch
> >> for it, so I will commit the fix in a few moments.
> >>
> >> I have also tried to reproduce the VNC crash, but I couldn't, probably
> >> because my system is IPv6-enabled. flo@ has seen the same crash and has
> >> no IPv6 in his kernel, which led him to find this commit in
> >> libvncserver:
> >>
> >> commit 66282f58000c8863e104666c30cb67b1d5cbdee3
> >> Author: Kyle J. McKay <mackyle@gmail.com>
> >> Date:   Fri May 18 00:30:11 2012 -0700
> >>
> >>     libvncserver/sockets.c: do not segfault when listenSock/listen6Sock == -1
> >>
> >> http://libvncserver.git.sourceforge.net/git/gitweb.cgi?p=libvncserver/libvncserver;a=commit;h=66282f5
> >>
> >> It looks promising, so please test this patch if you can reproduce the
> >> crash.
> >>
> >> --
> >> Bernhard Froehlich
> >> http://www.bluelife.at/
> >
> > Sorry, I tried to test this patch but couldn't figure out how. I use
> > ports to compile everything, and I can see the file is at
> > /usr/ports/net/libvncserver/work/LibVNCServer-0.9.9/libvncserver/sockets.c
> > However, if I edit this file and then run "make clean", my change is
> > wiped out before I can run "make". How do I apply this patch in the
> > ports?
>
> To apply patches to ports:
> # make clean
> # make patch
> <Apply patch>
> # make
> # make deinstall
> # make reinstall
>
> Note that the final two steps assume a version of the port is already
> installed. If not: 'make install'.
> If you use portmaster, after applying the patch: 'portmaster -C
> net/libvncserver'
> --
> R. Kevin Oberman, Network Engineer
> E-mail: kob6558@gmail.com

Thanks. Last night I realized we are using make :-) so I just did:

    cd /usr/ports/net/libvncserver/work/LibVNCServer-0.9.9
    make
    cd /usr/ports/net/libvncserver
    pkg_delete virtualbox-ose-4.1.16_1
    make reinstall
    cd /usr/ports/emulation/virtualbox-ose
    make reinstall

Note that I have quite a few servers running. When I kill just one of the VMs and then start it with VNC, I still get that message:

rfbListenOnTCP6Port: error in bind IPv6 socket: Address already in use

HOWEVER, now telnetting to that VM's VNC port will NOT kill it, and furthermore I can VNC to it now!

I suspect that the rfbListenOnTCP6Port error will disappear if I reboot the machine or kill all VMs.
Is this a valid test?
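Kevin's port-patching steps can be wrapped in a small /bin/sh function with a dry-run switch, so the command sequence can be reviewed before it is run as root. The names apply_port_patch and DRY_RUN, and the patch path /tmp/66282f5.diff used below, are hypothetical. The sketch assumes a git-style diff (a/ and b/ path prefixes, hence -p1) applied from the port's extracted source directory, e.g. /usr/ports/net/libvncserver/work/LibVNCServer-0.9.9, and relies on make -C, which BSD make supports.

```sh
#!/bin/sh
# Sketch only: apply_port_patch and DRY_RUN are hypothetical names.
# With DRY_RUN=1 each command is printed (prefixed with "+ ")
# instead of being executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: port directory   (e.g. /usr/ports/net/libvncserver)
# $2: extracted source (WRKSRC, under the port's work/ directory)
# $3: patch file       (assumed git-style, hence -p1)
apply_port_patch() {
    port=$1
    wrksrc=$2
    diff=$3

    run make -C "$port" clean || return 1
    # Extract the sources and apply the port's own patches.
    run make -C "$port" patch || return 1
    # Apply the extra patch; do not run "make clean" after this point.
    run patch -d "$wrksrc" -p1 -i "$diff" || return 1
    run make -C "$port" || return 1
    # Per Kevin's note, these two assume the port is already installed;
    # otherwise use "make install" instead.
    run make -C "$port" deinstall || return 1
    run make -C "$port" reinstall || return 1
}
```

Run it once with DRY_RUN=1 set to print the six commands, then again without it. emulation/virtualbox-ose still has to be rebuilt afterwards, as in the steps above. A more durable alternative is to drop the diff into the port's files/ directory under a name starting with patch-: make patch applies those itself and they survive make clean, but they are applied with -p0 by default, so a git-style diff would need its paths adjusted.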