From: Andre Oppermann
To: Julian Elischer
Cc: FreeBSD Net
Date: Mon, 27 Sep 2010 20:33:38 +0200
Subject: Re: mbuf changes
Message-ID: <4CA0E382.90101@freebsd.org>
In-Reply-To: <4CA0C2A3.7000508@freebsd.org>

On 27.09.2010 18:13, Julian Elischer wrote:
> On 9/27/10 6:09 AM, Andre Oppermann wrote:
>> On 26.09.2010 08:32, Julian Elischer wrote:
>>> On 9/25/10 1:20 AM, Andre Oppermann wrote:
>>>> On 25.09.2010 09:19, Julian Elischer wrote:
>>>>> * dynamically working out what the front padding size should be,
>>>>> per session, i.e. when a packet is sent out and needs to be
>>>>> adjusted to add more headers, the originating socket should be
>>>>> notified, or maybe the route should have this information, so
>>>>> that future packets can start out with enough head room.
>>>>> (This is not strictly to do with mbufs, but might need an added
>>>>> field to point to the structure that needs to be updated.)
>>>>
>>>> We already have "max_linkhdr", which specifies how much space is
>>>> left for prepends at the start of each packet.  The link protocols
>>>> set it, and IPSec also adds itself in there if enabled.  If you
>>>> have other encapsulations you should make them add in there as
>>>> well.
>>>
>>> this doesn't take into account tunneling and encapsulation.
>>
>> It should/could, but the tunneling and encapsulation protocols have
>> to add themselves to it when active.  IPSec does this.
>
> yes, but the trouble is that every packet is then given a worst-case
> reserved area at the front

Yes, but so what?  We've got the space in the mbuf anyway.  Right now
it lies unused at the end.  See below for a more detailed explanation.

  <----------mbuf---------->
  ppdddddddddd............      now
  pppppppppdddddddddd.....      with large prepend area

  p = prepend
  d = data
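To illustrate, here is a rough sketch (untested) of the idiom; it is
roughly what tcp_output() already does with max_linkhdr.  The function
names proto_alloc_pkthdr() and encap_prepend() are made up for the
example; only max_linkhdr, MGETHDR() and M_PREPEND() are the real
interfaces:

#include <sys/param.h>
#include <sys/mbuf.h>

/*
 * Allocate a packet header mbuf and leave the worst-case prepend
 * area unused at the front, so that later encapsulation layers can
 * prepend their headers without a new allocation.
 */
struct mbuf *
proto_alloc_pkthdr(void)
{
	struct mbuf *m;

	MGETHDR(m, M_DONTWAIT, MT_DATA);
	if (m == NULL)
		return (NULL);
	m->m_data += max_linkhdr;
	return (m);
}

/*
 * An encapsulation layer then simply prepends; with the leading
 * space already reserved this normally adjusts m_data and m_len
 * without allocating.  M_PREPEND() sets m to NULL on failure.
 */
struct mbuf *
encap_prepend(struct mbuf *m, int hdrlen)
{

	M_PREPEND(m, hdrlen, M_DONTWAIT);
	return (m);
}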
>>> we could do a lot better than this.
>>> especially on a per-route basis.
>>> if the first mbuf in a session had a pointer to the relevant
>>> rtentry, then as it is processed that could be updated..
>>
>> Please please please don't add a rtentry pointer to the mbuf.
>> Besides that, the routing table is a very poor place to do this.
>> We don't have host routes anymore, and the locking and refcounting
>> are rather expensive.
>
> yes, but we do have a route cache
> (and we probably should still have some form of host routes, but
> that's a different issue, not to be argued here.)

We have the hostcache (which needs some revisiting).

>> max_linkhdr should be sufficient (with small fixes to some protocol
>> mbuf allocators) even for excessive cases of encapsulation:
>
> max_linkhdr is way too big for 99% of all packets.

That doesn't matter in practice.  We have a very binary distribution
for the packets, and the space in the mbuf is there anyway; today it
is simply not used.  We tend to have small packets (TCP ACKs, for
example) and large packets at around MTU (bulk data transfers).  For
normal mbufs (256 bytes) the header and lots of encapsulation fit.
For mbuf clusters (2 Kbytes) there is plenty of space too.  For
packets in between, which currently may have fit into a normal mbuf,
we may have to switch to allocating a cluster earlier.  That's no
biggie though: it doesn't happen often, is not much overhead, and
occurs only with excessive encapsulation.  Unless you can demonstrate
a realistic case where the encapsulation overhead with a large
max_linkhdr is actually causing a measurable pessimization, I'd say
the complexity of the mechanism you propose is not justified.

>> TCP over IPv4 over IPSec(AH+ESP) over UDP over IPv6 over PPPoE over
>> Ethernet = 60 + 20 + (8+24) + 8 + 40 + 8 + 14 = 182 bytes total, of
>> which 102 are prepends.

I forgot MPLS; add another 4 bytes. ;-)

For 32-bit machines (60-byte mbuf headers) this fits just fine.  For
64-bit machines (84-byte mbuf headers) it fits a TCP ACK just fine.

>> Maybe we need an API for the tunneling and encapsulation protocols
>> to add their overhead to max_linkhdr.
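Something like this, perhaps (sketch only; the function name is
invented for illustration, and the derived values are recomputed the
same way domaininit() already computes them):

#include <sys/param.h>
#include <sys/mbuf.h>

/*
 * Hypothetical registration hook: a tunneling or encapsulation
 * protocol announces its per-packet prepend overhead once, instead
 * of each one poking max_linkhdr directly.
 */
void
encap_add_linkhdr_overhead(int bytes)
{

	max_linkhdr += bytes;
	/* Keep the derived values consistent, as domaininit() does. */
	max_hdr = max_linkhdr + max_protohdr;
	max_datalen = MHLEN - max_hdr;
}

A PPPoE or MPLS attach routine would then call, for example,
encap_add_linkhdr_overhead(8) or encap_add_linkhdr_overhead(4) when
it comes up, and every subsequently allocated packet starts out with
enough head room.

-- 
Andre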