From owner-freebsd-net@freebsd.org  Tue Nov 21 08:39:30 2017
In-Reply-To: <5A0F14CD.3040407@omnilan.de>
References: <5A0F14CD.3040407@omnilan.de>
From: Vincenzo Maffione <v.maffione@gmail.com>
Date: Tue, 21 Nov 2017 09:39:28 +0100
Message-ID: <CA+_eA9giPsMJ2_O1CLvOro=rMm5TaJyQ-et_U01Re5J9+9VSqg@mail.gmail.com>
Subject: Re: netmap/vale periodic deadlock
To: Harry Schmalzbauer <freebsd@omnilan.de>
Cc: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>,
 Giuseppe Lettieri <g.lettieri@iet.unipi.it>

Hi,
  It's hard to say, especially because it happens only after two days of
normal use.
Could you enable the deadlock debugging features in your kernel?
https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
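
In case it helps, the kind of kernel options that handbook section relies on
look roughly like the fragment below. This is a sketch from memory, not a
tested stable/11 configuration: some of these options may already be in your
kernel config, and the exact list should be checked against the handbook
before building.

```
# Add to your custom kernel configuration file (assumed to exist already)
options KDB                 # kernel debugger framework
options DDB                 # interactive in-kernel debugger
options INVARIANTS          # run-time consistency checks
options INVARIANT_SUPPORT   # required by INVARIANTS
options WITNESS             # lock order verification
options WITNESS_SKIPSPIN    # skip spin mutexes to reduce WITNESS overhead
```

With such a kernel you can break into DDB once the hang occurs and look at
"show alllocks", "ps" and "alltrace". Even without a debug kernel,
"procstat -kk <pid>" on one of the stuck bhyve or tcpdump processes should
show the kernel stack it is sleeping in. Keep in mind that WITNESS is
expensive, so expect a noticeable slowdown on a production host.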

However, if I understand correctly you have created some VLAN interfaces
vlan0, vlan1, vlan2, ... on top of a NIC (say em0). And you have attached
each VLAN interface to a vale switch:

# vale-ctl -a vale0:vlan0
# vale-ctl -a vale1:vlan1
# vale-ctl -a vale2:vlan2

and each VALE switch is attached to a different set of bhyve guests.

If this is the case, then although you are allowed to do that, I don't think
it's a good way to use netmap.
Since VLAN interfaces like vlan0 do not have (and cannot have) native
netmap support, you are falling back to emulated netmap adapters (which are
probably buggy on FreeBSD, especially when combined with VALE).
Apart from bugs, I think that with this setup you can't get the kind of
performance that would justify using netmap rather than the standard kernel
bridge and TAP devices.

The right way to do it imho would be to write your own (userspace) netmap
application that forwards packets between your bhyve guests and the NIC,
prepending/stripping VLAN headers according to configuration (e.g. guest A
is configured to be on VLAN 100, guest B on VLAN 200), etc.
I think this would be a very interesting netmap application in general, and
more importantly you would get the performance that you can't get with your
setup.
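
To make the idea a bit more concrete, here is a minimal sketch of the
per-packet work such a forwarder would do: insert an 802.1Q tag on frames
going from a guest to the NIC, and strip it (and pick the guest) on the way
back. It is plain Python on byte strings and does not touch the netmap API
at all; a real forwarder would do the same rewriting on netmap ring buffers,
most likely in C. The guest names, the GUEST_VLAN table and the function
names are made up for illustration.

```python
import struct

ETH_HDR_LEN = 14       # dst MAC (6) + src MAC (6) + EtherType (2)
TPID_8021Q = 0x8100    # EtherType value that marks an 802.1Q tag
VLAN_TAG_LEN = 4       # TPID (2) + TCI (2)

# Hypothetical configuration: guest name -> VLAN ID
GUEST_VLAN = {"guestA": 100, "guestB": 200}
VLAN_GUEST = {vid: guest for guest, vid in GUEST_VLAN.items()}


def vlan_push(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Return a copy of an untagged Ethernet frame with an 802.1Q tag added."""
    if len(frame) < ETH_HDR_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    if not 0 <= pcp <= 7:
        raise ValueError("priority must be in 0..7")
    tci = (pcp << 13) | vid                      # DEI bit left at 0
    tag = struct.pack("!HH", TPID_8021Q, tci)
    # The tag sits between the source MAC and the original EtherType.
    return frame[:12] + tag + frame[12:]


def vlan_pop(frame: bytes):
    """Return (vid, untagged_frame); vid is None if the frame has no tag."""
    if len(frame) < ETH_HDR_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != TPID_8021Q:
        return None, frame
    if len(frame) < ETH_HDR_LEN + VLAN_TAG_LEN:
        raise ValueError("truncated 802.1Q header")
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF, frame[:12] + frame[16:]


def to_nic(guest: str, frame: bytes) -> bytes:
    """Guest -> NIC direction: tag the frame with the guest's VLAN."""
    return vlan_push(frame, GUEST_VLAN[guest])


def from_nic(frame: bytes):
    """NIC -> guest direction: return (guest, untagged_frame), or None
    if the frame is untagged or its VLAN is not configured."""
    vid, inner = vlan_pop(frame)
    guest = VLAN_GUEST.get(vid)
    return (guest, inner) if guest is not None else None
```

Broadcast handling, MAC learning and the actual ring I/O are left out on
purpose; the point is only that the VLAN logic itself is a few bytes of
header rewriting per packet.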

Cheers,
  Vincenzo

2017-11-17 17:56 GMT+01:00 Harry Schmalzbauer <freebsd@omnilan.de>:

>  Hello,
>
> sorry for bothering you with another question/problem.
>
> I'm using netmap's vale (on stable/11) for bhyve(8) virtio-net backed SDN.
>
> The guests (unfortunately in production already) quit network services
> (resp. are not able to transceive any packets anymore) after about 2
> days; repeatedly and most likely not load related, since there is no
> significant load.
> Each guest is running fine, the host also runs without any other
> problem, no network problem elsewhere (different NICs; I use one
> dedicated NIC with vlan(4) children, each child connected to one vale
> switch).
>
> At some point, the complete netmap subsystem seems to deadlock:
> 'vale-ctl' hangs uninterruptibly.
> Trying to attach a tcpdump to a vale switch also hangs uninterruptibly.
> Stopping (shutting down from inside) bhyve guests works up to the point
> where the vale port should be destroyed.
> I could continue the list of symptoms, but that doesn't help in any way
> I guess.
>
> My question is, where can I start finding out what happens with the
> netmap subsystem?
>
> There were no kernel messages right before or during the deadlock!
>
> The only userland tool I'm familiar with (vale-ctl) isn't usable at all
> in that situation.
> Any hints what to try?
>
>
> Here's an excerpt of the processes running on the locked-up host after
> all guests were shut down, just before I rebooted.
> I snipped a lot; the interesting ones are those in state "netmap_g":
> …
> 0 14213 1 0 20 0 5864 0 wait IW 3 0:00,00 (sh)
> 0 14214 14213 0 -92 0 5358120 3586232 nm_kn_lo TC 3 148:02,02 bhyve: kallisto (bhyve)
> 0 14976 2522 0 20 0 6976 0 wait IW 3 0:00,00 su
> 0 14981 14976 0 20 0 8256 0 pause IW 3 0:00,00 _su (csh)
> 0 61615 14981 0 20 0 5864 0 wait IW 3 0:00,00 (sh)
> 0 61616 61615 0 52 0 2180648 1973252 netmap_g DEC 3 286:11,91 bhyve: preed (bhyve)
> 0 62845 14981 0 20 0 11624 3328 bdg lock L+ 3 0:00,01 tcpdump -n -e -s 150 -i vale1:test
> …
> 0 1390 1388 0 -92 0 2330024 767756 nm_kn_lo TC v0- 94:01,90 bhyve: styx0 (bhyve)
> 0 1401 1 0 52 0 5784 0 wait IW v0- 0:00,00 (sh)
> 0 1403 1401 0 20 0 368328 43444 - TC v0- 3:35,66 bhyve: korso (bhyve)
> …



-- 
Vincenzo Maffione