Date:      Sun, 21 Feb 1999 13:56:35 -0500 (EST)
From:      Larry Lile <lile@stdio.com>
To:        Bill Paul <wpaul@skynet.ctr.columbia.edu>
Cc:        Guy Helmer <ghelmer@scl.ameslab.gov>, dg@root.com, luigi@labinfo.iet.unipi.it, hackers@FreeBSD.ORG
Subject:   Re: How to handle jumbo ethernet frames
Message-ID:  <Pine.BSF.4.05.9902211354150.10455-100000@heathers.stdio.com>
In-Reply-To: <199902211903.OAA14038@skynet.ctr.columbia.edu>


I would also like to see the mbuf cluster size increased.  I have to
glue clusters together for token-ring support, and with MTUs up to 18K
I could probably starve a system of mbufs pretty quickly.  (That said,
the rest of the network stack currently handles the big MTUs without a
whimper -- good job, guys!)
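
(For the record, the gluing is just chaining 2K clusters until the
frame fits.  A rough sketch -- frame_len stands in for the received
frame size, and the error unwinding is trimmed:

	struct mbuf *m, *top = NULL, **mp = &top;
	int resid = frame_len;		/* token ring: up to ~18K */

	while (resid > 0) {
		if (top == NULL)
			MGETHDR(m, M_DONTWAIT, MT_DATA);
		else
			MGET(m, M_DONTWAIT, MT_DATA);
		if (m == NULL)
			break;		/* real code frees the chain */
		MCLGET(m, M_DONTWAIT);
		if ((m->m_flags & M_EXT) == 0) {
			m_freem(m);
			break;		/* cluster pool exhausted */
		}
		m->m_len = resid < MCLBYTES ? resid : MCLBYTES;
		resid -= m->m_len;
		*mp = m;
		mp = &m->m_next;
	}
	if (top != NULL && resid == 0)
		top->m_pkthdr.len = frame_len;

With 4K or 8K clusters those chains would be a fraction of the length.)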

Larry Lile
lile@stdio.com

On Sun, 21 Feb 1999, Bill Paul wrote:

> Of all the gin joints in all the towns in all the world, Guy Helmer had 
> to walk into mine and say:
> 
> > On Sun, 21 Feb 1999, David Greenman wrote:
>  
> > > >>    The jumbo frames are only useful if you also have VLAN support, which we
> > > >> don't have currently. We also need support for large mbuf clusters; this
> > > >
> > > >hmmm i don't get this -- why is this related to VLAN ?
> > > 
> > >    Because most ethernets consist of a mix of hosts that don't have jumbo
> > > frame capability. If you use jumbo frames without VLANs, then ALL hosts must
> > > support jumbo frames (and I think would also have to be gigabit ethernet
> > > connected since jumbo frames weren't supported in 802.3...although I'm
> > > assuming that the gigabit spec allows for jumbo frames, which may be a
> > > bad assumption on my part).
> 
> Programming the chip to use a single vlan tag wouldn't require that much
> work. I was contemplating whipping up a small control program that could
> grovel around in /dev/kmem and grab ahold of the driver's softc struct
> and set the tag value, then re-init the interface. (And don't tell me how
> this is hard and messy and I should be using sysctl instead, because I've
> already written a program that does almost exactly this for a different
> driver as an experiment. So there.)
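> 
> In code, the grovelling would look something like this (untested
> sketch; "foo" stands in for the real driver name, and foo_vlan_tag
> for whatever the softc field ends up being called):
> 
> 	#include <sys/types.h>
> 	#include <err.h>
> 	#include <fcntl.h>
> 	#include <kvm.h>
> 	#include <limits.h>
> 	#include <nlist.h>
> 
> 	#include "foo_softc.h"		/* hypothetical softc definition */
> 
> 	static struct nlist nl[] = { { "_foo_softc" }, { NULL } };
> 
> 	int
> 	main(void)
> 	{
> 		char errbuf[_POSIX2_LINE_MAX];
> 		struct foo_softc sc;
> 		kvm_t *kd;
> 
> 		kd = kvm_openfiles(NULL, NULL, NULL, O_RDWR, errbuf);
> 		if (kd == NULL)
> 			errx(1, "kvm_openfiles: %s", errbuf);
> 		if (kvm_nlist(kd, nl) != 0 || nl[0].n_value == 0)
> 			errx(1, "softc symbol not found");
> 
> 		/* Pull the softc out of kernel memory, poke the tag,
> 		   and push it back. */
> 		kvm_read(kd, nl[0].n_value, &sc, sizeof(sc));
> 		sc.foo_vlan_tag = 42;
> 		kvm_write(kd, nl[0].n_value, &sc, sizeof(sc));
> 		kvm_close(kd);
> 
> 		/* Then re-init the interface (e.g. ifconfig down/up)
> 		   so the driver reprograms the chip. */
> 		return (0);
> 	}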
>  
> > AFAIK, Alteon is the only source right now for a Gigabit Ethernet switch
> > that can handle jumbo frames (we have one on-site right now), and it will
> > automatically fragment packets when a jumbo frame is forwarded to a link
> > that uses normal frames.  Seems like VLANs will not be necessary, as long
> > as other switch manufacturers provide this feature.  (I'm not sure what the
> > performance hit due to fragmentation would be, though.)
> 
> It seems to me this would only work for IP (or some other protocol(s)
> that the switch knows about), since merely splitting up an ethernet 
> frame into chunks doesn't do a lot of good unless the host knows
> the frame has been split and can reassemble it before passing it to
> the protocols.
> 
> I'm not really inclined to implement only standard frame support and
> wait around for large mbuf cluster support to materialize, since there's
> no telling how long that could take. I think I may be stuck between a
> rock and a hard place though, since I found something in the manual that
> seems to suggest that the mbuf cluster chaining approach won't work.
> 
> The Tigon supports several different receive rings: the standard ring,
> which holds buffers large enough to accommodate normal-sized ethernet
> frames; the jumbo receive ring, which contains buffers large enough for
> jumbo frames; and the mini ring, which contains small buffers of a
> user-chosen size for very small non-jumbo ethernet frames. The mini
> ring is an optimization for handling small packets which would otherwise
> waste a lot of space were they DMAed into standard ring buffers. (It's
> possible to perform this optimization in the driver by copying very
> small frames into small mbuf chains and recycling the cluster buffer,
> but using the mini ring avoids the need to copy.)
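> 
> (In driver code the copy trick comes out roughly like this -- sketch
> only, with m being the mbuf whose cluster just came off the ring and
> n the small replacement:
> 
> 	if (len <= MHLEN) {
> 		struct mbuf *n;
> 
> 		MGETHDR(n, M_DONTWAIT, MT_DATA);
> 		if (n != NULL) {
> 			n->m_pkthdr.rcvif = ifp;
> 			n->m_len = n->m_pkthdr.len = len;
> 			bcopy(mtod(m, caddr_t), mtod(n, caddr_t), len);
> 			/* hand n up the stack; m and its cluster stay
> 			   in the ring for the next packet */
> 		}
> 	}
> 
> but the mini ring gets the same effect without touching the data.)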
> 
> Note that the mini ring is only available on the Tigon 2.
> 
> For the jumbo ring, you're allowed to use one of two kinds of ring
> descriptors. You can use either the normal ring descriptor type (the
> same as for the standard receive ring) or a special extended jumbo
> receive descriptor, which differs from the normal descriptor in that
> it can point to four non-contiguous buffers while the normal type can
> only point to one. You can specify the kind of descriptor you want
> by setting a flag in the ring control block during initialization.
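> 
> Hypothetically, the two flavors would be shaped something like this
> (made-up field names and widths; the real layout is whatever the
> manual says):
> 
> 	struct tigon_rx_desc {			/* standard: one buffer */
> 		u_int64_t	host_addr;
> 		u_int16_t	len;
> 		u_int16_t	flags;
> 		u_int16_t	index;
> 	};
> 
> 	struct tigon_rx_desc_ext {		/* extended: four buffers */
> 		u_int64_t	host_addr[4];	/* non-contiguous fragments */
> 		u_int16_t	len[4];
> 		u_int16_t	flags;
> 		u_int16_t	index;
> 	};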
> 
> This is important because the manual seems to claim that as far as the
> jumbo ring is concerned, if you use normal descriptors, each descriptor
> buffer will always contain one complete frame. In other words, each 
> descriptor must point to a contiguous buffer large enough to hold a
> 9K frame. If you want to use several non-contiguous buffers, then you
> have to use the extended descriptor format, which only allows four buffers.
> Since an mbuf cluster is only 2K, four of them only cover 8K, which
> isn't enough for a 9K frame.
> 
> The only way I can think of to get around this problem is to use an
> mbuf with external storage consisting of a single 9K buffer. However,
> since 9K is larger than the page size, I can't be assured of always
> getting 9K of physically contiguous storage (malloc() gives contiguous
> kernel virtual addresses, but the pages behind them may be scattered),
> so I need to engage in a little subterfuge.
> 
> What I'm thinking of doing is this:
> 
> - Program the chip to use extended jumbo ring descriptors.
> - Get an mbuf using MGETHDR().
> - malloc() a 9K buffer and attach it to the mbuf as external storage.
> - Assign the start address of the 9K buffer to the first host address
>   pointer in the ring descriptor.
> - Round the address up to a page boundary.
> - Assign this page address to the second host address pointer in the
>   descriptor.
> - Round up to the next page again.
> - Assign that address to the third host address pointer.
> - Set all the fragment lengths so they total 9K; if the buffer starts
>   near the end of a page, the tail spills onto a fourth page, which is
>   exactly what the fourth host address pointer is for.
> 
> Basically I'm doing page mapping for the chip. It's possible that I
> might end up with contiguously allocated space in which case all of this
> is a pessimization, but I can never know that without grovelling around
> in the kernel page tables, and that would take a lot more work.
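> 
> As an untested sketch (the descriptor fields match the hypothetical
> extended layout above, and TI_JUMBO_FRAMELEN, ti_jfree() and ti_jref()
> are made-up names):
> 
> 	static int
> 	ti_newbuf_jumbo(struct ti_softc *sc, struct tigon_rx_desc_ext *r)
> 	{
> 		struct mbuf	*m;
> 		caddr_t		buf, p;
> 		int		i, resid, seg;
> 
> 		MGETHDR(m, M_DONTWAIT, MT_DATA);
> 		if (m == NULL)
> 			return(ENOBUFS);
> 		buf = malloc(TI_JUMBO_FRAMELEN, M_DEVBUF, M_NOWAIT);
> 		if (buf == NULL) {
> 			m_freem(m);
> 			return(ENOBUFS);
> 		}
> 
> 		/* Attach the 9K buffer as external storage. */
> 		m->m_data = m->m_ext.ext_buf = buf;
> 		m->m_flags |= M_EXT;
> 		m->m_ext.ext_size = TI_JUMBO_FRAMELEN;
> 		m->m_ext.ext_free = ti_jfree;
> 		m->m_ext.ext_ref = ti_jref;
> 		m->m_len = m->m_pkthdr.len = TI_JUMBO_FRAMELEN;
> 
> 		/*
> 		 * Carve the buffer at page boundaries. Each fragment
> 		 * then lies within a single page, so vtophys() of its
> 		 * start address is valid for the whole fragment even
> 		 * when the buffer's pages aren't physically contiguous.
> 		 */
> 		p = buf;
> 		resid = TI_JUMBO_FRAMELEN;
> 		for (i = 0; i < 4 && resid > 0; i++) {
> 			seg = round_page((vm_offset_t)p + 1) -
> 			    (vm_offset_t)p;
> 			if (seg > resid)
> 				seg = resid;
> 			r->host_addr[i] = vtophys(p);
> 			r->len[i] = seg;
> 			p += seg;
> 			resid -= seg;
> 		}
> 		return(0);
> 	}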
> 
> Am I insane? Well, wait: that's a given. But does this scheme at
> least sound reasonable?
> 
> > BTW, we've found that jumbo frames make a significant difference in
> > performance on the new RS/6000's we have -- peak TCP performance jumps
> > from the 500Mbps range to the 800Mbps range for 1500 vs. 9000 byte MTU.  
> > We assume that the Gigabit NICs in the RS/6000's are Alteon NICs, but
> > there is no identification on the NICs other than IBM's.
> 
> One trick I sometimes use to identify NICs is to do strings -a on the
> driver object modules and look for something incriminating. If something
> like 'tigon' or 'acenic' or 'alt(eon)' leaps out at you, then you know
> it's a Tigon chip.
> 
> Given that the Tigon is a PCI chip, IBM's card must also be PCI.
> If IBM really is using the Tigon chip, then you could probably use the 
> IBM card in an x86 box given the right driver (IBM probably uses their 
> own PCI vendor and device IDs for their card, but that's easy enough to 
> handle).
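> 
> (In the probe routine that's just one more entry in the device table;
> all the macro names here are placeholders, the IBM IDs especially:
> 
> 	static struct ti_type {
> 		u_int16_t	ti_vid, ti_did;
> 		char		*ti_name;
> 	} ti_devs[] = {
> 		{ ALT_VENDORID, ALT_DEVICEID_ACENIC,
> 		    "Alteon AceNIC Gigabit Ethernet" },
> 		{ IBM_VENDORID, IBM_DEVICEID_TIGON,	/* hypothetical */
> 		    "IBM Gigabit Ethernet (Tigon)" },
> 		{ 0, 0, NULL }
> 	};
> 
> and the probe loop just matches the card's PCI vendor/device ID pair
> against the table.)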
> 
> -Bill
> 
> -- 
> =============================================================================
> -Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
> Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
> Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
> =============================================================================
>  "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
> =============================================================================
> 



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message



