Date:      Mon, 16 Jul 2012 21:49:20 +0200
From:      Andre Oppermann <oppermann@networx.ch>
To:        George Neville-Neil <gnn@freebsd.org>
Cc:        Navdeep Parhar <np@FreeBSD.org>, net@freebsd.org
Subject:   Re: Interface MTU question...
Message-ID:  <50047040.5040506@networx.ch>
In-Reply-To: <C06D346A-97BE-4498-B4E5-0ED85731A8BD@freebsd.org>
References:  <86liiqrnnq.wl%gnn@neville-neil.com> <4FFDF6C7.3030301@FreeBSD.org> <C06D346A-97BE-4498-B4E5-0ED85731A8BD@freebsd.org>

On 12.07.2012 16:55, George Neville-Neil wrote:
>
> On Jul 11, 2012, at 17:57 , Navdeep Parhar wrote:
>
>> On 07/11/12 14:30, gnn@freebsd.org wrote:
>>> Howdy,
>>>
>>> Does anyone know the reason for this particular check in
>>> ip_output.c?
>>>
>>> 	if (rte != NULL && (rte->rt_flags & (RTF_UP|RTF_HOST))) {
>>> 		/*
>>> 		 * This case can happen if the user changed the MTU
>>> 		 * of an interface after enabling IP on it.  Because
>>> 		 * most netifs don't keep track of routes pointing to
>>> 		 * them, there is no way for one to update all its
>>> 		 * routes when the MTU is changed.
>>> 		 */
>>> 		if (rte->rt_rmx.rmx_mtu > ifp->if_mtu)
>>> 			rte->rt_rmx.rmx_mtu = ifp->if_mtu;
>>> 		mtu = rte->rt_rmx.rmx_mtu;
>>> 	} else {
>>> 		mtu = ifp->if_mtu;
>>> 	}
>>>
>>> To my mind the > ought to be != so that any change, up or down, of the
>>> interface MTU is eventually reflected in the route.  Also, this code
>>> does not check whether the route is both a HOST route and UP, but only
>>> whether it is one or the other, so don't be fooled by that: this check
>>> happens for any route we have if it's up.
>>
>> I believe rmx_mtu could be low due to some intermediate node between this host and the final destination.  An increase in the MTU of the local interface should not increase the path MTU if the limit was due to someone else along the route.
>
> Yes, it turns out to be complex.  We have several places that store the MTU: the interface,
> which knows the MTU of the directly connected link; the route; and the host cache.  All three
> are used to determine the maximum segment size (MSS) of a TCP connection.  The route and the
> interface bound the MTU from which the MSS is derived, but if there is an entry in the host
> cache, it is preferred over either of the first two.  See tcp_update_mss() in tcp_input.c for
> the details.

We have three sources of the MTU for TCP to choose from (sorted in priority order):

  1. Hostcache, holding a previously discovered value (PMTUD).

  2. Most specific route, which can be manually set when it is known that
     a lower MTU exists along that path.

  3. Interface MTU.

The third one isn't really used, because routes inherit their MTU from
the interface.  Number 3 would become relevant if we no longer stored
the MTU with the route unless it was manually set.
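A minimal sketch of that priority order, written as a standalone C model rather than kernel code.  The struct, its field names, and the convention that 0 means "nothing recorded at this level" are hypothetical; this is not FreeBSD's tcp_input.c logic and it ignores any further clamping the real code may apply.  The MSS helper assumes IPv4 and a TCP header without options (20 + 20 bytes).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-destination view of the three MTU sources.
 * A value of 0 means "nothing recorded at this level".
 */
struct mtu_sources {
	uint32_t hc_mtu;	/* 1. host cache (PMTUD result), 0 if none */
	uint32_t route_mtu;	/* 2. most specific route, 0 if not stored */
	uint32_t if_mtu;	/* 3. outgoing interface, always known */
};

/* Return the first MTU found, in the order host cache, route, interface. */
static uint32_t
select_mtu(const struct mtu_sources *s)
{
	assert(s->if_mtu > 0);	/* the interface MTU must always be known */
	if (s->hc_mtu != 0)
		return (s->hc_mtu);
	if (s->route_mtu != 0)
		return (s->route_mtu);
	return (s->if_mtu);
}

/*
 * Derive an MSS from an MTU, assuming IPv4 and a TCP header without
 * options (20 + 20 bytes).  Callers must pass an MTU larger than 40.
 */
static uint32_t
mss_from_mtu(uint32_t mtu)
{
	assert(mtu > 40);
	return (mtu - 40);
}
```

Because routes currently inherit the interface MTU, route_mtu is rarely 0 in practice, so the final interface branch only matters once routes stop carrying an MTU unless it was set by hand.  For a 1500-byte MTU, mss_from_mtu() gives 1460.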

> I believe that the quoted code above has been wrong from the day it was written, in that what it
> really says is "if the route is up" and not "if the route is up and is a host route", which is
> what I believe people read it as.  If the belief is that this code is really only there for
> host routes, then the proper fix is to make the sense of the first if match that belief
> and, again, to change the > to != so that when the administrator of the box bumps the MTU in
> either direction, the route reflects this.  It is not possible for PMTU on a single link
> to a host route to bump the number down if the interface says it's not to be bumped.  And,
> even so, any host cache entry will override and avoid this code.

The cited code is wrong in that it doesn't test only for host routes.
It is correct, though, in that it only works one way, by reducing the
route MTU to the interface MTU.  Doing a "!=" would break manually
setting the MTU on a route.
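To make the one-way behaviour concrete, here is the quoted check reduced to two plain integers, next to the proposed "!=" variant.  The function names are hypothetical and the sketch leaves out the rtentry and ifnet structures entirely; it only shows why "!=" would overwrite a route MTU that was lowered on purpose.

```c
#include <assert.h>
#include <stdint.h>

/*
 * The quoted check reduced to two numbers: returns the route MTU as it
 * would be stored back into the route.  It can only lower the value.
 */
static uint32_t
clamp_one_way(uint32_t route_mtu, uint32_t if_mtu)
{
	assert(if_mtu > 0);
	if (route_mtu > if_mtu)
		route_mtu = if_mtu;
	return (route_mtu);
}

/* The proposed "!=" variant: the route MTU always follows the interface. */
static uint32_t
sync_both_ways(uint32_t route_mtu, uint32_t if_mtu)
{
	assert(if_mtu > 0);
	if (route_mtu != if_mtu)
		route_mtu = if_mtu;
	return (route_mtu);
}
```

With a route manually set to 1280 on a 1500-byte link, clamp_one_way(1280, 1500) keeps 1280, while sync_both_ways(1280, 1500) returns 1500 and discards the manual setting.  Lowering the interface to 1400 under a 1500-byte route yields 1400 from both.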

IIRC this test comes from the days when we had a host route for every
inpcb and changes to the interface weren't reflected in all those
host routes.

It can be fixed either by testing just for (rte != NULL) or by doing
away with the bogus RTF_HOST bit.  Passing an inactive route to ip_output()
isn't exactly useful and may lead to some later bogosity.
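The difference between the flag tests can be sketched in isolation.  The EX_RTF_* constants below copy the BSD <net/route.h> values for RTF_UP (0x1) and RTF_HOST (0x4) under new names so the example compiles on its own; the function names are hypothetical and this is not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in BSD <net/route.h>; copied so the sketch stands alone. */
#define EX_RTF_UP	0x1
#define EX_RTF_HOST	0x4

/* The quoted test: true if EITHER bit is set. */
static bool
quoted_test(int flags)
{
	return ((flags & (EX_RTF_UP | EX_RTF_HOST)) != 0);
}

/* What the test is easily misread as: BOTH bits set. */
static bool
up_and_host(int flags)
{
	return ((flags & (EX_RTF_UP | EX_RTF_HOST)) ==
	    (EX_RTF_UP | EX_RTF_HOST));
}

/* With RTF_HOST dropped: the clamp applies to any route that is up. */
static bool
up_only(int flags)
{
	return ((flags & EX_RTF_UP) != 0);
}
```

A route with only RTF_UP set passes quoted_test() and up_only() but not up_and_host(), and a host route that is down still passes quoted_test().  The other option, testing just rte != NULL, drops the flag check altogether and relies on callers never handing ip_output() an inactive route.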

-- 
Andre


