From: Noah Bergbauer
To: "freebsd-virtualization@freebsd.org"
Subject: bhyve virtio-net MTU
Date: Sat, 7 Mar 2015 12:49:50 +0000
Message-ID: <1425732590516.79490@tum.de>

Hi,

I'm running FreeBSD 10.1 on a dedicated server, with a Linux VM in bhyve. The handbook tells you to bridge the tap interface to a real network interface, but that's not an option for me because I only have one IPv4 address. So instead, I assigned an internal IP address to bridge0 and used pf(4) to set up NAT.
All of this works without any issues, but I would like to raise the MTU above the default of 1500. It's a virtual interface after all, so why should it be so low? The MTU of FreeBSD's loopback interface is 16384, and on Linux it's even 65536.

So I used ifconfig(8) to increase the MTU of tap0 and, just like the manpage says, bridge0 had the same MTU after I added tap0.
On the Linux side, I did the same with eth0 and then I sent 2000-byte pings to the host machine.

It seems to work, but let's use tcpdump(8) to make sure:

02:02:46.244678 IP 10.42.42.100 > 10.42.42.1: ICMP echo request, id 1686, seq 3, length 2008
02:02:46.244983 IP 10.42.42.1 > 10.42.42.100: ICMP echo reply, id 1686, seq 3, length 1480
02:02:46.245061 IP 10.42.42.1 > 10.42.42.100: ip-proto-1
02:02:47.244953 IP 10.42.42.100 > 10.42.42.1: ICMP echo request, id 1686, seq 4, length 2008
02:02:47.245347 IP 10.42.42.1 > 10.42.42.100: ICMP echo reply, id 1686, seq 4, length 1480
02:02:47.245422 IP 10.42.42.1 > 10.42.42.100: ip-proto-1

The entire request goes through the virtio NIC, tap0, bridge0 and finally to the host's kernel without any issues. The kernel then sends a _fragmented_ reply because apparently it still thinks the MTU is 1500.

A quick check with route(8) confirms this:

# route show 10.42.42.100
   route to: 10.42.42.100
destination: 10.42.42.0
       mask: 255.255.255.0
        fib: 0
  interface: bridge0
      flags:
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         0

So I manually forced a bigger MTU:

# route change -net 10.42.42.0 -mtu 15000
change net 10.42.42.0

But now the reply packets get truncated instead of fragmented:

02:07:36.921165 IP 10.42.42.100 > 10.42.42.1: ICMP echo request, id 1689, seq 3, length 2008
02:07:36.921624 IP truncated-ip - 6 bytes missing! 10.42.42.1 > 10.42.42.100: ICMP echo reply, id 1689, seq 3, length 2008
02:07:37.921042 IP 10.42.42.100 > 10.42.42.1: ICMP echo request, id 1689, seq 4, length 2008
02:07:37.921499 IP truncated-ip - 518 bytes missing! 10.42.42.1 > 10.42.42.100: ICMP echo reply, id 1689, seq 4, length 2008
02:07:38.921522 IP 10.42.42.100 > 10.42.42.1: ICMP echo request, id 1689, seq 5, length 2008
02:07:38.922253 IP truncated-ip - 518 bytes missing! 10.42.42.1 > 10.42.42.100: ICMP echo reply, id 1689, seq 5, length 2008
02:07:39.921432 IP 10.42.42.100 > 10.42.42.1: ICMP echo request, id 1689, seq 6, length 2008
02:07:39.922165 IP truncated-ip - 518 bytes missing! 10.42.42.1 > 10.42.42.100: ICMP echo reply, id 1689, seq 6, length 2008
02:07:40.921513 IP 10.42.42.100 > 10.42.42.1: ICMP echo request, id 1689, seq 7, length 2008
02:07:40.922245 IP truncated-ip - 518 bytes missing! 10.42.42.1 > 10.42.42.100: ICMP echo reply, id 1689, seq 7, length 2008
02:07:41.921393 IP 10.42.42.100 > 10.42.42.1: ICMP echo request, id 1689, seq 8, length 2008
02:07:41.922160 IP truncated-ip - 518 bytes missing! 10.42.42.1 > 10.42.42.100: ICMP echo reply, id 1689, seq 8, length 2008
02:07:42.921504 IP 10.42.42.100 > 10.42.42.1: ICMP echo request, id 1689, seq 9, length 2008
02:07:42.922348 IP truncated-ip - 518 bytes missing! 10.42.42.1 > 10.42.42.100: ICMP echo reply, id 1689, seq 9, length 2008
02:07:43.923031 IP 10.42.42.100 > 10.42.42.1: ICMP echo request, id 1689, seq 10, length 2008
02:07:43.924904 IP truncated-ip - 6 bytes missing! 10.42.42.1 > 10.42.42.100: ICMP echo reply, id 1689, seq 10, length 2008
02:07:44.926832 IP 10.42.42.100 > 10.42.42.1: ICMP echo request, id 1689, seq 11, length 2008
02:07:44.928511 IP truncated-ip - 518 bytes missing! 10.42.42.1 > 10.42.42.100: ICMP echo reply, id 1689, seq 11, length 2008
02:07:45.936968 IP 10.42.42.100 > 10.42.42.1: ICMP echo request, id 1689, seq 12, length 2008
02:07:45.937722 IP truncated-ip - 518 bytes missing! 10.42.42.1 > 10.42.42.100: ICMP echo reply, id 1689, seq 12, length 2008
02:07:46.937453 IP 10.42.42.100 > 10.42.42.1: ICMP echo request, id 1689, seq 13, length 2008
02:07:46.938161 IP truncated-ip - 518 bytes missing! 10.42.42.1 > 10.42.42.100: ICMP echo reply, id 1689, seq 13, length 2008
^C
22 packets captured
24 packets received by filter
0 packets dropped by kernel

This last dump was taken on the Linux side. Dumping at tap0 shows that the reply packets are still okay when they reach bhyve. Apparently they get truncated by bhyve's virtio-net, as increasing the MTU works just fine with VirtualBox's virtio-net (at least on my Linux machine).

Any ideas on how I can fix this? I had a quick look at the code, and while a comment indicates that Ethernet-sized packets are assumed (https://svnweb.freebsd.org/base/release/10.1.0/usr.sbin/bhyve/pci_virtio_net.c?revision=274417&view=markup#l257), I was unable to find code that confirms this.

Noah
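
For reference, the packet sizes in the two dumps above can be checked with some quick arithmetic. This is only a minimal sketch, assuming a 20-byte IPv4 header without options and assuming that tcpdump's "N bytes missing" is the IP total length minus the IP bytes actually captured. The function names are made up for illustration and are not taken from bhyve or tcpdump.

```python
IP_HEADER = 20  # bytes; assumes an IPv4 header without options


def fragment_payloads(ip_payload_len, mtu, ip_header=IP_HEADER):
    """Return the payload size of each IP fragment for a given MTU."""
    # Every fragment except the last must carry a multiple of 8 bytes.
    max_payload = (mtu - ip_header) // 8 * 8
    sizes = []
    remaining = ip_payload_len
    while remaining > max_payload:
        sizes.append(max_payload)
        remaining -= max_payload
    sizes.append(remaining)
    return sizes


def ip_bytes_received(ip_total_len, missing):
    """IP bytes that arrived, given tcpdump's 'N bytes missing' figure."""
    return ip_total_len - missing


icmp_len = 2008                  # "length 2008" in the dumps
ip_total = icmp_len + IP_HEADER  # 2028-byte IP datagram

print(fragment_payloads(icmp_len, 1500))   # route MTU 1500
print(fragment_payloads(icmp_len, 15000))  # route MTU 15000
print(ip_bytes_received(ip_total, 518))
print(ip_bytes_received(ip_total, 6))
```

With a route MTU of 1500 this gives a 1480-byte first fragment plus a 528-byte second one, which matches the "length 1480" reply followed by the "ip-proto-1" line. With the MTU forced to 15000 the reply fits in a single datagram, and "518 bytes missing" then means that only 1510 of the 2028 IP bytes reached the guest (2022 in the "6 bytes missing" case).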