Date: Tue, 21 Nov 2017 09:39:28 +0100
From: Vincenzo Maffione <v.maffione@gmail.com>
To: Harry Schmalzbauer <freebsd@omnilan.de>
Cc: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Giuseppe Lettieri <g.lettieri@iet.unipi.it>
Subject: Re: netmap/vale periodic deadlock
Message-ID: <CA%2B_eA9giPsMJ2_O1CLvOro=rMm5TaJyQ-et_U01Re5J9%2B9VSqg@mail.gmail.com>
In-Reply-To: <5A0F14CD.3040407@omnilan.de>
References: <5A0F14CD.3040407@omnilan.de>
Hi,

It's hard to say, especially because it happens after two days of normal use. Can't you enable the deadlock debugging features in your kernel?

https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

However, if I understand correctly, you have created some VLAN interfaces vlan0, vlan1, vlan2, ... on top of a NIC (say em0), and you have attached each VLAN interface to a VALE switch:

# vale-ctl -a vale0:vlan0
# vale-ctl -a vale1:vlan1
# vale-ctl -a vale2:vlan2

and each VALE switch is attached to a different set of bhyve guests.

If this is the case, although you are allowed to do that, I don't think it's a convenient way to use netmap. Since VLAN interfaces like vlan0 do not have (and cannot have) native netmap support, you are falling back to emulated netmap adapters (which are probably buggy on FreeBSD, especially when combined with VALE). Bugs aside, I think that with this setup you can't get the kind of performance that would justify using netmap rather than the standard kernel bridge and TAP devices.

The right way to do it, imho, would be to write your own (userspace) netmap application that forwards packets between your bhyve guests and the NIC, prepending/stripping VLAN headers according to configuration (e.g. guest A is configured to be on VLAN 100, guest B on VLAN 200), etc. I think this would be a very interesting netmap application in general, and more importantly you would get the performance that you can't get with your current setup.

Cheers,
Vincenzo

2017-11-17 17:56 GMT+01:00 Harry Schmalzbauer <freebsd@omnilan.de>:
> Hello,
>
> sorry to bother you with another question/problem.
>
> I'm using netmap's vale (on stable/11) for bhyve(8) virtio-net backed SDN.
>
> The guests (unfortunately already in production) stop providing network
> services (i.e. they can no longer send or receive any packets) after about
> 2 days; this happens repeatedly and is most likely not load related, since
> there is no significant load.
>
> Each guest runs fine, and the host also runs without any other
> problem; there is no network problem elsewhere (different NICs; I use
> one dedicated NIC with vlan(4) children, each child connected to one
> vale switch).
>
> At some point, the complete netmap subsystem seems to deadlock:
> 'vale-ctl' hangs uninterruptibly.
> Trying to attach a tcpdump to a vale switch also hangs uninterruptibly.
> Stopping bhyve guests (shutting down from inside) works up to the point
> where the vale port should be destroyed.
> I could continue the list of symptoms, but I guess that doesn't help in
> any way.
>
> My question is: where can I start finding out what happens with the
> netmap subsystem?
>
> There were no kernel messages right before or during the deadlock!
>
> The only userland tool I'm familiar with (vale-ctl) isn't usable at all
> in that situation.
> Any hints what to try?
>
>
> Here's an excerpt of the processes running when the netmap-locked host
> has all guests shut down, just before I rebooted.
> Snipped a lot; the interesting ones are those in wait channel "netmap_g":
> …
> 0 14213     1 0  20 0    5864       0 wait     IW  3     0:00,00 (sh)
> 0 14214 14213 0 -92 0 5358120 3586232 nm_kn_lo TC  3   148:02,02 bhyve: kallisto (bhyve)
> 0 14976  2522 0  20 0    6976       0 wait     IW  3     0:00,00 su
> 0 14981 14976 0  20 0    8256       0 pause    IW  3     0:00,00 _su (csh)
> 0 61615 14981 0  20 0    5864       0 wait     IW  3     0:00,00 (sh)
> 0 61616 61615 0  52 0 2180648 1973252 netmap_g DEC 3   286:11,91 bhyve: preed (bhyve)
> 0 62845 14981 0  20 0   11624    3328 bdg lock L+  3     0:00,01 tcpdump -n -e -s 150 -i vale1:test
> …
> 0  1390  1388 0 -92 0 2330024  767756 nm_kn_lo TC  v0-  94:01,90 bhyve: styx0 (bhyve)
> 0  1401     1 0  52 0    5784       0 wait     IW  v0-   0:00,00 (sh)
> 0  1403  1401 0  20 0  368328   43444 -        TC  v0-   3:35,66 bhyve: korso (bhyve)
> …

-- 
Vincenzo Maffione