From: John-Mark Gurney
To: Andre Oppermann
cc: freebsd-net@freebsd.org
cc: freebsd-arch@freebsd.org
Date: Sat, 18 Sep 2004 16:17:19 -0700
Subject: Re: better MTU support...
Message-ID: <20040918231719.GV72089@funkthat.com>
In-Reply-To: <41408D4C.E33B6F98@freebsd.org>
References: <20040906050435.GA72089@funkthat.com> <41408D4C.E33B6F98@freebsd.org>

Andre Oppermann wrote this message on Thu, Sep 09, 2004 at 19:05 +0200:

Ok, finally got a switch (and gige cards, if_re needs work) capable of
jumbo frames..

> John-Mark Gurney wrote:
> > In a recent experiment w/ Jumbo frames, I found out that sending ip
> > frames completely ignores the MTU set on host routes.  This makes it
> > difficult (or next to impossible) to support a network that has both
> > regular and jumbo frames on it as you can't restrict some hosts to the
> > smaller frames.
>
> What you should do instead is to set the MTU on the interface to 9018
> or so and then have a default route with MTU 1500 for everything else.
> Now you can specify larger MTUs for hosts that support it.
>
> Otherwise you are opening a can of worms...

This doesn't fix it, since the output still doesn't honor the mtu on the
route..  Note, I'm not testing tcp, only udp and icmp since I've seen that
TCP already works fine...

# netstat -rnWfinet
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use    Mtu    Netif Expire
default            192.168.0.14       UGS         0       11   1500      em0
127.0.0.1          127.0.0.1          UH          0       40  16384      lo0
192.168.0          link#5             UC          0        0   9000      em0
192.168.0.1        00:a0:c9:59:8b:6c  UHLW        0       33   1500      em0    175
192.168.0.3        00:0a:95:9e:8b:88  UHLW        0     1988   9000      em0    374
192.168.0.14       00:a0:c9:31:30:5e  UHLW        1        8   1500      em0    955
192.168.0.20       00:07:e9:0d:aa:ca  UHLW        0       18   9000      em0    187
192.168.0.21       00:07:e9:0d:ad:06  UHLW        0        2   9000      lo0

tcpdump output:
16:02:14.311079 IP 192.168.0.21 > 192.168.0.1: icmp 5008: echo request seq 14
16:02:15.320981 IP 192.168.0.21 > 192.168.0.1: icmp 5008: echo request seq 15
16:04:54.720890 IP 192.168.0.21 > 128.223.122.47: icmp 5008: echo request seq 0
16:04:55.727148 IP 192.168.0.21 > 128.223.122.47: icmp 5008: echo request seq 1
16:05:02.288989 IP 192.168.0.21 > 192.168.0.20: icmp 5008: echo request seq 0
16:05:02.289856 IP 192.168.0.20 > 192.168.0.21: icmp 5008: echo reply seq 0
16:05:03.296481 IP 192.168.0.21 > 192.168.0.20: icmp 5008: echo request seq 1
16:05:03.297282 IP 192.168.0.20 > 192.168.0.21: icmp 5008: echo reply seq 1

So, as you can see, it's broken...  with my patch, ip properly fragments
the packets to machines with smaller mtu...

> > I now have a patch to ip_output that makes it obey the MTU set on the
> > route instead of that of the interface.
>
> Your patch corrects a problem in ip_output where a smaller MTU on an
> rtentry was ignored but that is only for the non-TCP cases.  When you
> open a TCP session the MTU will be honored (see tcp_subr.c:tcp_maxmtu).
> If not it would be a bug.
>
> Could you try your large MTU setup again using the procedure I described
> above?
>
> That should solve your immediate problem.

Nope, it doesn't...

> For the general 'bug' in ip_output that it doesn't honour a smaller MTU
> on a route I'd like to do a more thorough fix.  Routes should be
> created with MTU 0 if the MTU is not different from the if_mtu.
> Only in those cases where you want to have a lower MTU you set it.  For
> cloned routes the MTU would be cloned from the parent.  This range of
> changes is more intrusive.  On top of that comes the new ARP code which
> will have a MTU field as well.  This one is supposed to store different
> MTUs for mixed MTU L2 networks.  How to transport the MTU information is
> a separate discussion.
>
> If the fix above works for you I'd like to do the real fix later (< end
> of year) and not change the current behaviour in ip_output at the moment.

It wouldn't be hard to add to my patch the check to see if the route's
mtu is 0 and just use the if mtu... which then solves the ip part of your
more complete fix...  Then when you finally fix the route/arp stuff nothing
else should be necessary...

Sound good?

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
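
For illustration only, the check discussed above (use the if mtu when the
route's mtu is 0, otherwise honor the route's mtu) might look roughly like
the sketch below.  This is not the actual patch or the real ip_output()
code: effective_mtu() and fragments_needed() are hypothetical names, the
clamp to the interface MTU is an added assumption, and the fragment count
assumes a 20-byte IP header with no options.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the kernel): choose the MTU for an
 * outgoing packet.  A route MTU of 0 is treated as "not set", so the
 * interface MTU is used.  Assumption: a route MTU larger than the
 * interface MTU is clamped to the interface MTU.
 */
static uint32_t
effective_mtu(uint32_t rt_mtu, uint32_t if_mtu)
{
	if (rt_mtu == 0 || rt_mtu > if_mtu)
		return (if_mtu);
	return (rt_mtu);
}

/*
 * Hypothetical helper: number of IPv4 packets needed to carry
 * ip_payload bytes at the given MTU, assuming a 20-byte header.
 * Returns 0 if the MTU cannot carry even one 8-byte fragment.
 */
static uint32_t
fragments_needed(uint32_t ip_payload, uint32_t mtu)
{
	const uint32_t hdr = 20;
	uint32_t per_frag;

	if (mtu < hdr + 8)
		return (0);
	if (ip_payload + hdr <= mtu)
		return (1);		/* fits, no fragmentation */
	/* Non-final fragments carry a multiple of 8 payload bytes. */
	per_frag = (mtu - hdr) & ~7u;
	return ((ip_payload + per_frag - 1) / per_frag);
}
```

Assuming the 5008-byte ICMP message in the tcpdump output is the whole IP
payload, a route mtu of 1500 on a 9000 byte interface would give four
fragments under this sketch, while a route mtu of 9000 (or 0) would send
it in one packet.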