From: Harry Schmalzbauer
Organization: OmniLAN
Date: Tue, 21 Nov 2017 10:58:00 +0100
To: Vincenzo Maffione
CC: "freebsd-net@freebsd.org", Giuseppe Lettieri
Subject: Re: netmap/vale periodic deadlock
Message-ID: <5A13F8A8.2020209@omnilan.de>
References: <5A0F14CD.3040407@omnilan.de>
List-Id: Networking and TCP/IP with FreeBSD

Regarding Vincenzo Maffione's message of 21.11.2017 09:39 (localtime):
> Hi,
> It's hard to say, specially because it happens after two days of
> normal use.
> Can't you enable deadlock debugging features in your kernel?
> https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
>
> However, if I understand correctly you have created some VLAN interfaces
> vlan0, vlan1, vlan2, ... on top of a NIC (say em0). And you have
> attached each VLAN interface to a vale switch:
>
> # vale-ctl -a vale0:vlan0
> # vale-ctl -a vale1:vlan1
> # vale-ctl -a vale2:vlan2
>
> and each VALE switch is attached to a different set of bhyve guests.

Hello Vincenzo,

thank you very much for your help again!
Your assumption is correct, here's my vale-ctl output:
603.811416 bdg_ctl [148] bridge:0 port:0 vale1:nic1_dmz
603.811428 bdg_ctl [148] bridge:0 port:1 vale1:styx0
603.811430 bdg_ctl [148] bridge:0 port:2 vale1:korso
603.811432 bdg_ctl [148] bridge:0 port:3 vale1:kallisto
603.811434 bdg_ctl [148] bridge:1 port:0 vale2:nic1_inop
603.811435 bdg_ctl [148] bridge:1 port:1 vale2:styx0
603.811437 bdg_ctl [148] bridge:2 port:0 vale3:nic1_vnl
603.811439 bdg_ctl [148] bridge:2 port:1 vale3:styx0
603.811441 bdg_ctl [148] bridge:3 port:0 vale4:nic1_egn
603.811442 bdg_ctl [148] bridge:3 port:1 vale4:styx0
603.811444 bdg_ctl [148] bridge:3 port:2 vale4:preed
…

> If this is the case, although you are allowed to do that, I don't think
> it's a convenient way to use netmap.
> Since VLAN interfaces like vlan0 do not have (and cannot have) native
> netmap support, you are falling back to emulated netmap adapters (which
> are probably buggy on FreeBSD, specially when combined with VALE).
> Apart from bugs I think that with this setup you can't get decent
> performance that would justify using netmap rather than the standard
> kernel bridge and TAP devices.

I'm aware that the netmap performance benefit is lost due to the
emulated netmap fallback.
But there were some reasons why I chose vale(4) instead of if_bridge(4):
1) Inter-guest traffic (virtio-net causes a lot of LAPIC/IRQ overhead,
but still less overhead than tap(4)/if_bridge(4))
2) Future ptnetmap(4) upgrade path (which should save a lot of LAPIC/IRQ
CPU cycles and unleash huge performance benefits for inter-VM traffic)
3) Admin mess and MTU limitation. Each if_bridge(4) creates a
host-stack interface, which I don't use and which spams ifconfig(8)
output; if_vtnet(4) even doubles that. Most important disadvantage:
if_bridge(4) needs all members to have exactly the same MTU. This has
been a problem for me many times over the last years in various setups.

So with my current setup the overhead/efficiency of host-external packet
flow of
bhyve_virtio-net+dyn_vale_port+vale(4)
is equal to
bhyve_virtio-net+if_vtnet(4)+if_bridge(4)
But I have fewer disadvantages with vale(4), as long as emulated netmap
mode doesn't destabilize my setup :-(

My second choice was ng_bridge(4), with which I have had great
experiences in my router VM, which runs on the host in question (and in
turn uses virtio-net interfaces attached to the individual vale(4)
switches on the host).
[ Even more impressive: pf(4) runs in a VIMAGE jail in that guest,
utilizing those vale(4) interfaces. Reason for that complicated setup:
closest hardware abstraction possible. The setup (guest) should be
easily migratable to real hardware. ]

> The right way to do it imho would be to write your own (userspace)
> netmap application that forwards packets between your bhyve guests and
> the NIC, prepending/stripping VLAN headers according to configuration
> (e.g. guest A is configured to be on VLAN 100, guest B on VLAN 200), etc.
> I think this would be a very interesting netmap application in general,
> and more importantly you would get the performance that you can't get
> with your setup.

I agree that a userland application which, as you described, uses
netmap to provide minimalistic SDN features would be a great solution.
But it would take me a lot of time, since my C skills are lousy, and I
really don't have any time to spare, not even one more day.

I'll see if I can get any useful information with the kernel deadlock
debugging feature you suggested
(https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html),
as soon as the problem shows up again.
Since I forgot to add all production RAM, I had to shut down yesterday,
so the lockup counter was reset ;-)

Another last-minute change concerned the netmap ring size: I changed the
vale uplink interface. The one I used for passthrough had 2 queues
(with EM_MULTIQUEUE support) and the one for the vale uplink only one,
and during the evaluation phase I reduced the rx/tx descriptors to make
netmap's default ring size work.
Now I use the 2-queue NIC as the vale uplink and have increased the ring
size to 81920 while leaving the hardware default of 4096 rx/tx
descriptors.

But my wording wasn't technically correct I think, because I guess what
I'm suffering from isn't a real deadlock in terms of locking, but some
netmap-internal lockup/overflow/limit/whatever.
Just guessing here! I don't know the netmap code! I can only link
symptoms, and since the setup works really nicely for some limited
time, I hoped you or another netmap expert could teach me how to find
the root cause.
Your sentence about FreeBSD's netmap interface emulation leaves me with
a bad feeling...

Thank you very much,

-harry
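
For illustration only: the "prepending/stripping VLAN headers" step that
Vincenzo describes might look roughly like the sketch below. It is not
code from this thread and does not touch the netmap API at all; it only
shows the 802.1Q tag handling on a raw Ethernet frame held in a byte
string. The function names add_vlan_tag and strip_vlan_tag are
hypothetical, and a real forwarder would do the same work in C on
netmap ring buffers.

```python
import struct

MAC_ADDRS_LEN = 12      # destination MAC (6 bytes) + source MAC (6 bytes)
TPID_8021Q = 0x8100     # EtherType value that marks an 802.1Q tag


def add_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert an 802.1Q tag right after the two MAC addresses.

    Hypothetical helper: assumes `frame` is an untagged Ethernet frame.
    """
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    if len(frame) < MAC_ADDRS_LEN + 2:
        raise ValueError("frame too short for an Ethernet header")
    # TCI layout: 3 bits priority, 1 bit DEI (left at 0), 12 bits VLAN ID
    tci = (priority << 13) | vlan_id
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:MAC_ADDRS_LEN] + tag + frame[MAC_ADDRS_LEN:]


def strip_vlan_tag(frame: bytes):
    """Return (vlan_id, untagged_frame); vlan_id is None if no tag is present."""
    if len(frame) < MAC_ADDRS_LEN + 2:
        raise ValueError("frame too short for an Ethernet header")
    (tpid,) = struct.unpack_from("!H", frame, MAC_ADDRS_LEN)
    if tpid != TPID_8021Q:
        return None, frame
    if len(frame) < MAC_ADDRS_LEN + 4:
        raise ValueError("truncated 802.1Q tag")
    (tci,) = struct.unpack_from("!H", frame, MAC_ADDRS_LEN + 2)
    untagged = frame[:MAC_ADDRS_LEN] + frame[MAC_ADDRS_LEN + 4:]
    return tci & 0x0FFF, untagged
```

In such a forwarder, a frame coming from a guest configured for VLAN 100
would pass through add_vlan_tag(frame, 100) before going out on the NIC,
and a frame arriving from the NIC would go through strip_vlan_tag() so
that the returned VLAN ID selects the guest port.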