Date: Fri, 20 Dec 2019 11:31:59 +0000
From: Goran Mekić <meka@tilda.center>
To: freebsd-net@freebsd.org, Marko Zec <zec@fer.hr>, "Patrick M. Hausen" <hausen@punkt.de>
Cc: Kristof Provost <kp@eurobsdcon.org>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: Continuing problems in a bridged VNET setup
Message-ID: <1AB8ACD6-0FF0-487C-963D-3A1B05288FD9@tilda.center>
In-Reply-To: <20191220122256.76942c07@x23>
References: <BD4018F8-0BB7-4EA9-A726-F6383E9AC892@punkt.de> <20191220122256.76942c07@x23>

On December 20, 2019 11:22:01 AM UTC, Marko Zec <zec@fer.hr> wrote:
>Perhaps you could ditch if_bridge(4) and epair(4), and try ng_eiface(4)
>with ng_bridge(4) instead? Works rock-solid 24/7 here on 11.2 / 11.3.
>
>Marko
>
>On Fri, 20 Dec 2019 11:19:24 +0100
>"Patrick M. Hausen" <hausen@punkt.de> wrote:
>
>> Hi all,
>>
>> we still experience occasional network outages in production,
>> yet have not been able to find the root cause.
>>
>> We run around 50 servers with VNET jails, some of them with
>> a handful, the busiest ones with 50 or more jails each.
>>
>> Every now and then the jails are no longer reachable over the
>> network. The server itself is up and running, all jails are
>> up and running, one can ssh to the server but none of the
>> jails can communicate over the network.
>>
>> There seems to be no pattern to the time of occurrence except
>> that more jails on one system make it "more likely".
>> Also having more than one bridge, e.g. for private networks
>> between jails, seems to increase the probability.
>> When a server shows the problem it tends to get into the state
>> rather frequently, with a couple of hours in between. Then again
>> most servers run for weeks without exhibiting the problem.
>> That's what makes it so hard to reproduce. The last couple of
>> days one system was failing regularly until we reduced the number
>> of jails from around 80 to around 50. Now it seems stable again.
>>
>> I have a test system with lots of jails that I work with gatling
>> that did not show a single failure so far :-(
>>
>>
>> Setup:
>>
>> All jails are iocage jails with VNET interfaces. They are
>> connected to at least one bridge that starts with the
>> physical external interface as a member and gets jails'
>> epair interfaces added as they start up. All jails are managed
>> by iocage.
>>
>> ifconfig_igb0="-rxcsum -rxcsum6 -txcsum -txcsum6 -vlanhwtag -vlanhwtso up"
>> cloned_interfaces="bridge0"
>> ifconfig_bridge0_name="inet0"
>> ifconfig_inet0="addm igb0 up"
>> ifconfig_inet0_ipv6="inet6 <host-address>/64 auto_linklocal"
>>
>> $ iocage get interfaces vpro0087
>> vnet0:inet0
>>
>> $ ifconfig inet0
>> inet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>> ether 90:1b:0e:63:ef:51
>> inet6 fe80::921b:eff:fe63:ef51%inet0 prefixlen 64 scopeid 0x4
>> inet6 <host-address> prefixlen 64
>> nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>> groups: bridge
>> id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
>> maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
>> root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
>> member: vnet0.4 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>> ifmaxaddr 0 port 7 priority 128 path cost 2000
>> member: vnet0.1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>> ifmaxaddr 0 port 6 priority 128 path cost 2000
>> member: igb0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>> ifmaxaddr 0 port 1 priority 128 path cost 2000000
>>
>>
>> What we tried:
>>
>> At first we suspected the bridge to become "wedged" somehow. This was
>> corroborated by talking to various people at devsummits and EuroBSDCon,
>> with Kristof Provost specifically suggesting that if_bridge was
>> still under giant lock and there might be a problem here that the
>> lock is not released under some race condition and then the entire
>> bridge subsystem would be stalled. That sounds plausible given the
>> random occurrence.
>>
>> But I think we can rule out that one, because:
>>
>> - ifconfig up/down does not help
>> - the host is still communicating fine over the same bridge interface
>> - tearing down the bridge, kldunload (!) of if_bridge.ko followed by
>> a new kldload and reconstructing the members with `ifconfig addm`
>> does not help, either
>> - only a host reboot restores function
>>
>> Finally I created a jail that is not managed by iocage on the problem host.
>> Please ignore the `iocage` in the path, I used it to populate the
>> root directory. But it is not started by iocage at boot time and
>> the manual config is this:
>>
>> testjail {
>>     host.hostname = "testjail";              # hostname
>>     path = "/iocage/jails/testjail/root";    # root directory
>>     exec.clean;
>>     exec.system_user = "root";
>>     exec.jail_user = "root";
>>     vnet;
>>     vnet.interface = "epair999b";
>>     exec.prestart += "ifconfig epair999 create; ifconfig epair999a inet6 2A00:B580:8000:8000::1/64 auto_linklocal";
>>     exec.poststop += "sleep 2; ifconfig epair999a destroy; sleep 2";
>>     # Standard stuff
>>     exec.start += "/bin/sh /etc/rc";
>>     exec.stop = "/bin/sh /etc/rc.shutdown";
>>     exec.consolelog = "/var/log/jail_testjail_console.log";
>>     mount.devfs;                             # mount devfs
>>     allow.raw_sockets;                       # allow ping-pong
>>     devfs_ruleset="4";                       # devfs ruleset for this jail
>> }
>>
>> $ cat /iocage/jails/testjail/root/etc/rc.conf
>> hostname="testjail"
>>
>> ifconfig_epair999b_ipv6="inet6 2A00:B580:8000:8000::2/64 auto_linklocal"
>>
>> When I do `service jail onestart testjail` I can then ping6 the jail
>> from the host and the host from the jail. As you can see the
>> if_bridge is not involved in this traffic.
>>
>> When the host is in the wedged state and I start this testjail the
>> same way, no communication across the epair interface is possible.
>>
>> To me this seems to indicate that not the bridge but all epair
>> interfaces stop working at the very same time.
>>
>>
>> OS is RELENG_11_3, hardware and specifically network adapters vary,
>> we have igb, ix, ixl, bnxt ...
>>
>>
>> Does anyone have a suggestion what diagnostic measures could help to
>> pinpoint the culprit? The random occurrence and the fact that the
>> problem seems to prefer the production environment only make this a
>> real pain ...
>>
>>
>> Thanks and kind regards,
>> Patrick
>
Does it work with pf?
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Date: Fri, 20 Dec 2019 11:43:28 +0000
From: Marko Zec <zec@fer.hr>
To: Goran Mekić <meka@tilda.center>
Cc: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, "Patrick M. Hausen" <hausen@punkt.de>, Kristof Provost <kp@eurobsdcon.org>
Subject: Re: Continuing problems in a bridged VNET setup
Message-ID: <20191220124422.11c03f5c@x23>
In-Reply-To: <1AB8ACD6-0FF0-487C-963D-3A1B05288FD9@tilda.center>
References: <BD4018F8-0BB7-4EA9-A726-F6383E9AC892@punkt.de> <20191220122256.76942c07@x23> <1AB8ACD6-0FF0-487C-963D-3A1B05288FD9@tilda.center>

On Fri, 20 Dec 2019 11:31:59 +0000
Goran Mekić <meka@tilda.center> wrote:

> On December 20, 2019 11:22:01 AM UTC, Marko Zec <zec@fer.hr> wrote:
> >Perhaps you could ditch if_bridge(4) and epair(4), and try
> >ng_eiface(4) with ng_bridge(4) instead?  Works rock-solid 24/7 here
> >on 11.2 / 11.3.
>
> Does it work with pf?

In the particular production setup I was referring to we use ipfw, so
can't share any 1st-hand experiences with pf.

Marko
