Date:      Sun, 21 Feb 1999 15:08:08 -0800 (PST)
From:      Julian Elischer <julian@whistle.com>
To:        Bill Paul <wpaul@skynet.ctr.columbia.edu>
Cc:        Guy Helmer <ghelmer@scl.ameslab.gov>, dg@root.com, luigi@labinfo.iet.unipi.it, hackers@FreeBSD.ORG
Subject:   Re: How to handle jumbo ethernet frames
Message-ID:  <Pine.BSF.4.05.9902211501390.282-100000@s204m82.isp.whistle.com>
In-Reply-To: <199902211903.OAA14038@skynet.ctr.columbia.edu>



On Sun, 21 Feb 1999, Bill Paul wrote:

> Of all the gin joints in all the towns in all the world, Guy Helmer had 
> to walk into mine and say:
> > On Sun, 21 Feb 1999, David Greenman wrote:
>  
> > > >>    The jumbo frames are only useful if you also have VLAN support, which we
> > > >> don't have currently. We also need support for large mbuf clusters; this
> > > >
> > > >hmmm i don't get this -- why is this related to VLAN ?
> > > 
> > >    Because most ethernets consist of a mix of hosts that don't have jumbo
> > > frame capability. If you use jumbo frames without VLANs, then ALL hosts must
> > > support jumbo frames (and I think would also have to be gigabit ethernet
> > > connected since jumbo frames weren't supported in 802.3...although I'm
> > > assuming that the gigabit spec allows for jumbo frames, which may be a
> > > bad assumption on my part).
> 
> Programming the chip to use a single vlan tag wouldn't require that much
> work. I was contemplating whipping up a small control program that could
> grovel around in /dev/kmem and grab ahold of the driver's softc struct
> and set the tag value, then re-init the interface. (And don't tell me how
> this is hard and messy and I should be using sysctl instead, because I've
> already written a program that does almost exactly this for a different
> driver as an experiment. So there.)
>  
> > AFAIK, Alteon is the only source right now for a Gigabit Ethernet switch
> > that can handle jumbo frames (we have one on-site right now), and it will
> > automatically fragment packets when a jumbo frame is forwarded to a link
> > that uses normal frames.  Seems like VLANs will not be necessary, as long
> > as other switch manufacturers provide this feature.  (I'm not sure what the
> > performance hit due to fragmentation would be, though.)
> 
> It seems to me this would only work for IP (or some other protocol(s)
> that the switch knows about), since merely splitting up an ethernet 
> frame into chunks doesn't do a lot of good unless the host knows
> the frame has been split and can reassemble it before passing it to
> the protocols.
> 
> I'm not really inclined to just implement only standard frame support
> and wait around for large mbuf cluster support to materialize since there's
> no telling how long that could take. I think I may be stuck between a
> rock and a hard place though since I found something in the manual which
> seems to suggest that the mbuf cluster chaining approach won't work.
> 
> The Tigon supports several different receive rings: the standard ring,
> which holds buffers large enough to accommodate normal-sized ethernet
> frames, the jumbo receive ring, which contains buffers large enough for
> jumbo frames, and the mini ring, which contains small buffers of a 
> user-chosen size for very small non-jumbo ethernet frames. The mini
> ring is an optimization for handling small packets which would end
> up wasting a lot of space were they to be DMAed into standard ring
> buffers. (It's possible to perform this optimization in the driver by
> copying very small frames into small mbuf chains and recycling the cluster
> buffer, but using the mini ring avoids the need to copy).
> 
> Note that the mini ring is only available on the Tigon 2.
> 
> For the jumbo ring, you're allowed to use one of two kinds of ring
> descriptors. You can use either the normal ring descriptor type (the
> same as for the standard receive ring) or a special extended jumbo
> receive descriptor, which differs from the normal descriptor in that
> it can point to four non-contiguous buffers while the normal type can
> only point to one. You can specify the kind of descriptor you want
> by setting a flag in the ring control block during initialization.
> 
> This is important because the manual seems to claim that as far as the
> jumbo ring is concerned, if you use normal descriptors, each descriptor
> buffer will always contain one complete frame. In other words, each 
> descriptor must point to a contiguous buffer large enough to hold a
> 9K frame. If you want to use several non-contiguous buffers, then you
> have to use the extended descriptor format, which only allows four buffers.
> Since an mbuf cluster is only 2K, this isn't enough.
> 
> The only way I can think of to get around this problem is to use an
> mbuf with external storage consisting of a single 9K buffer. However,
> since 9K is larger than the page size, I can't be assured of always
> getting 9K of contiguous storage, so I need to engage in a little
> subterfuge.

This is what we used at TRW. We had 16KB physical buffers reserved for
use by the jumbo (in our case 15.5KB) packets. This is why I fixed the
external buffer code in mbuf.h and uipc_mbuf.c several years ago. All
packets went there, but packets under a certain size (100 bytes, I think)
were copied out to normal mbufs.





> 
> What I'm thinking of doing is this:
> 
> - Program the chip to use extended jumbo ring descriptors.
> - Get an mbuf using MGETHDR().
> - malloc() a 9K buffer and attach it to the mbuf as external storage.

We kept a special pool of them and they were contiguous memory.

> - Assign the start address of the 9K buffer to the first host address
>   pointer in the ring descriptor.
> - Round the address up to a page boundary.
> - Assign this page address to the second host address pointer in the
>   descriptor.
> - Round up to the next page again.
> - Assign that address to the third host address pointer.
> - Set all the fragment lengths accordingly so we end up with a total
>   of 9K.

might work.


> 
> Basically I'm doing page mapping for the chip. It's possible that I
> might end up with contiguously allocated space in which case all of this
> is a pessimization, but I can never know that without grovelling around
> in the kernel page tables, and that would take a lot more work.
> 
> Am I insane? Well, wait: that's a given. But does this scheme at
> least sound reasonable?

Sounds fine, but how much RAM do you have?
We just allocated 4MB to buffers contiguously and left it at that.
Mind you, we weren't using fast ethernet, let alone gigabit,
but we were using 68020s on a VME bus, and then a 386 on ISA.

> 
> > BTW, we've found that jumbo frames make a significant difference in
> > performance on the new RS/6000's we have -- peak TCP performance jumps
> > from the 500Mbps range to the 800Mbps range for 1500 vs. 9000 byte MTU.  
> > We assume that the Gigabit NICs in the RS/6000's are Alteon NICs, but
> > there is no identification on the NICs other than IBM's.
> 
> One trick I sometimes use to identify NICs is to do strings -a on the
> driver object modules and look for something incriminating. If something
> like 'tigon' or 'acenic' or 'alt(eon)' leaps out at you, then you know
> it's a Tigon chip.
> 
> Given that the Tigon is a PCI chip, IBM's card must also be PCI.
> If IBM really is using the Tigon chip, then you could probably use the 
> IBM card in an x86 box given the right driver (IBM probably uses their 
> own PCI vendor and device IDs for their card, but that's easy enough to 
> handle).
> 
> -Bill
> 
> -- 
> =============================================================================
> -Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
> Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
> Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
> =============================================================================
>  "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
> =============================================================================
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message
> 


