Date: Fri, 17 Nov 2017 17:56:45 +0100
From: Harry Schmalzbauer <freebsd@omnilan.de>
To: freebsd-net@freebsd.org
Subject: netmap/vale periodic deadlock
Message-ID: <5A0F14CD.3040407@omnilan.de>
Hello,

sorry to bother you with another question/problem. I'm using netmap's vale (on stable/11) for bhyve(8) virtio-net backed SDN.

The guests, unfortunately already in production, stop providing network services (more precisely, they can no longer send or receive any packets) after about 2 days. This happens repeatedly and is most likely not load related, since there is no significant load. Each guest is running fine, the host also runs without any other problem, and there is no network problem elsewhere (different NICs; I use one dedicated NIC with vlan(4) children, each child connected to one vale switch).

At some point, the complete netmap subsystem seems to deadlock: 'vale-ctl' hangs uninterruptibly. Trying to attach a tcpdump to a vale switch also hangs uninterruptibly. Stopping bhyve guests (shutting them down from inside) works up to the point where the vale port should be destroyed.

I could continue the list of symptoms, but I guess that wouldn't help in any way.

My question is: where can I start finding out what happens with the netmap subsystem? There were no kernel messages right before or during the deadlock! The only userland tool I'm familiar with (vale-ctl) isn't usable at all in that situation. Any hints what to try?

Here's an excerpt of the processes running when the host with the locked-up netmap had all guests shut down, just before I rebooted.
I snipped a lot; the interesting ones are those in state "netmap_g":

…
0 14213 1 0 20 0 5864 0 wait IW 3 0:00,00 (sh)
0 14214 14213 0 -92 0 5358120 3586232 nm_kn_lo TC 3 148:02,02 bhyve: kallisto (bhyve)
0 14976 2522 0 20 0 6976 0 wait IW 3 0:00,00 su
0 14981 14976 0 20 0 8256 0 pause IW 3 0:00,00 _su (csh)
0 61615 14981 0 20 0 5864 0 wait IW 3 0:00,00 (sh)
0 61616 61615 0 52 0 2180648 1973252 netmap_g DEC 3 286:11,91 bhyve: preed (bhyve)
0 62845 14981 0 20 0 11624 3328 bdg lock L+ 3 0:00,01 tcpdump -n -e -s 150 -i vale1:test
…
0 1390 1388 0 -92 0 2330024 767756 nm_kn_lo TC v0- 94:01,90 bhyve: styx0 (bhyve)
0 1401 1 0 52 0 5784 0 wait IW v0- 0:00,00 (sh)
0 1403 1401 0 20 0 368328 43444 - TC v0- 3:35,66 bhyve: korso (bhyve)
…