Date: Sun, 21 Feb 1999 13:56:35 -0500 (EST)
From: Larry Lile <lile@stdio.com>
To: Bill Paul <wpaul@skynet.ctr.columbia.edu>
Cc: Guy Helmer <ghelmer@scl.ameslab.gov>, dg@root.com, luigi@labinfo.iet.unipi.it, hackers@FreeBSD.ORG
Subject: Re: How to handle jumbo ethernet frames
Message-ID: <Pine.BSF.4.05.9902211354150.10455-100000@heathers.stdio.com>
In-Reply-To: <199902211903.OAA14038@skynet.ctr.columbia.edu>
I would also like to see the mbuf cluster size increased.  I have to
glue clusters together for token-ring support, and with MTUs up to 18K
I could probably starve a system of mbufs pretty quickly.  That said,
the rest of the network stack currently seems to deal with the big
MTUs without a whimper.  Good job, guys!

Larry Lile
lile@stdio.com

On Sun, 21 Feb 1999, Bill Paul wrote:

> Of all the gin joints in all the towns in all the world, Guy Helmer had
> to walk into mine and say:
>
> > On Sun, 21 Feb 1999, David Greenman wrote:
> >
> > > >> The jumbo frames are only useful if you also have VLAN support,
> > > >> which we don't have currently.  We also need support for large
> > > >> mbuf clusters; this
> > > >
> > > > hmmm i don't get this -- why is this related to VLAN ?
> > >
> > > Because most ethernets consist of a mix of hosts that don't have
> > > jumbo frame capability.  If you use jumbo frames without VLANs, then
> > > ALL hosts must support jumbo frames (and I think they would also have
> > > to be gigabit-ethernet-connected, since jumbo frames weren't supported
> > > in 802.3... although I'm assuming that the gigabit spec allows for
> > > jumbo frames, which may be a bad assumption on my part).
>
> Programming the chip to use a single VLAN tag wouldn't require that much
> work.  I was contemplating whipping up a small control program that could
> grovel around in /dev/kmem, grab ahold of the driver's softc struct,
> set the tag value, and then re-init the interface.  (And don't tell me
> this is hard and messy and I should be using sysctl instead, because I've
> already written a program that does almost exactly this for a different
> driver as an experiment.  So there.)
>
> > AFAIK, Alteon is the only source right now for a Gigabit Ethernet switch
> > that can handle jumbo frames (we have one on-site right now), and it will
> > automatically fragment packets when a jumbo frame is forwarded to a link
> > that uses normal frames.  Seems like VLANs will not be necessary, as long
> > as other switch manufacturers provide this feature.  (I'm not sure what
> > the performance hit due to fragmentation would be, though.)
>
> It seems to me this would only work for IP (or some other protocol(s)
> that the switch knows about), since merely splitting an ethernet frame
> into chunks doesn't do a lot of good unless the host knows the frame
> has been split and can reassemble it before passing it to the protocols.
>
> I'm not really inclined to implement only standard frame support and
> wait around for large mbuf cluster support to materialize, since there's
> no telling how long that could take.  I think I may be stuck between a
> rock and a hard place, though, since I found something in the manual
> that seems to suggest that the mbuf cluster chaining approach won't work.
>
> The Tigon supports several different receive rings: the standard ring,
> which holds buffers large enough to accommodate normal-sized ethernet
> frames; the jumbo receive ring, which contains buffers large enough for
> jumbo frames; and the mini ring, which contains small buffers of a
> user-chosen size for very small non-jumbo ethernet frames.  The mini
> ring is an optimization for handling small packets, which would end up
> wasting a lot of space were they to be DMAed into standard ring buffers.
> (It's possible to perform this optimization in the driver by copying
> very small frames into small mbuf chains and recycling the cluster
> buffer, but using the mini ring avoids the need to copy.)
>
> Note that the mini ring is only available on the Tigon 2.
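
For what it's worth, the in-driver copy trick mentioned at the end of
the quote above might look something like this untested sketch.  The
ti_copy_small() name and the details are invented for illustration,
not taken from any real driver:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>

/*
 * If a received frame fits in an ordinary packet header mbuf (MHLEN
 * bytes), copy it out of the 2K cluster so the cluster can be handed
 * straight back to the receive ring.  On any failure, just pass the
 * cluster up as-is.
 */
static struct mbuf *
ti_copy_small(struct mbuf *mc, int len)
{
	struct mbuf *m;

	if (len > MHLEN)
		return (mc);		/* too big; keep the cluster */

	MGETHDR(m, M_DONTWAIT, MT_DATA);
	if (m == NULL)
		return (mc);		/* out of mbufs; keep the cluster */

	m->m_pkthdr.rcvif = mc->m_pkthdr.rcvif;
	m->m_pkthdr.len = m->m_len = len;
	bcopy(mtod(mc, caddr_t), mtod(m, caddr_t), len);

	/* The caller recycles mc onto the ring instead of freeing it. */
	return (m);
}

The tradeoff is one bcopy() per small frame in exchange for not tying
up a 2K cluster, which is the same tradeoff the mini ring makes in
hardware.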
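
Likewise, the /dev/kmem groveling Bill describes earlier in his message
could be done with libkvm along these lines.  Everything here is
hypothetical: the _ti_softc symbol, the softc layout, and the tag field
are stand-ins, a statically allocated softc is assumed, and a real
version would still need to re-init the interface afterward:

#include <sys/types.h>
#include <fcntl.h>
#include <kvm.h>
#include <limits.h>
#include <nlist.h>
#include <stdio.h>
#include <stdlib.h>

struct ti_softc {			/* stand-in for the driver's softc */
	u_int16_t ti_vlan_tag;		/* made-up field */
	/* ... */
};

int
main(int argc, char *argv[])
{
	char errbuf[_POSIX2_LINE_MAX];
	struct nlist nl[] = { { "_ti_softc" }, { NULL } };
	struct ti_softc sc;
	kvm_t *kd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s tag\n", argv[0]);
		exit(1);
	}
	/* Open /dev/kmem read/write against the running kernel. */
	if ((kd = kvm_openfiles(NULL, NULL, NULL, O_RDWR, errbuf)) == NULL) {
		fprintf(stderr, "kvm_openfiles: %s\n", errbuf);
		exit(1);
	}
	if (kvm_nlist(kd, nl) != 0 || nl[0].n_value == 0) {
		fprintf(stderr, "symbol %s not found\n", nl[0].n_name);
		exit(1);
	}
	/* Read the softc, patch the tag, write it back. */
	if (kvm_read(kd, nl[0].n_value, &sc, sizeof(sc)) != sizeof(sc)) {
		fprintf(stderr, "kvm_read: %s\n", kvm_geterr(kd));
		exit(1);
	}
	sc.ti_vlan_tag = (u_int16_t)atoi(argv[1]);
	if (kvm_write(kd, nl[0].n_value, &sc, sizeof(sc)) != sizeof(sc)) {
		fprintf(stderr, "kvm_write: %s\n", kvm_geterr(kd));
		exit(1);
	}
	kvm_close(kd);
	return (0);
}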
> For the jumbo ring, you're allowed to use one of two kinds of ring
> descriptors.  You can use either the normal descriptor type (the same
> as for the standard receive ring) or a special extended jumbo receive
> descriptor, which differs from the normal descriptor in that it can
> point to four non-contiguous buffers, while the normal type can only
> point to one.  You specify the kind of descriptor you want by setting
> a flag in the ring control block during initialization.
>
> This is important because the manual seems to claim that as far as the
> jumbo ring is concerned, if you use normal descriptors, each descriptor
> buffer will always contain one complete frame.  In other words, each
> descriptor must point to a contiguous buffer large enough to hold a
> 9K frame.  If you want to use several non-contiguous buffers, then you
> have to use the extended descriptor format, which only allows four
> buffers.  Since an mbuf cluster is only 2K, four of them only cover
> 8K, which isn't enough.
>
> The only way I can think of to get around this problem is to use an
> mbuf with external storage consisting of a single 9K buffer.  However,
> since 9K is larger than the page size, I can't be assured of always
> getting 9K of physically contiguous storage, so I need to engage in a
> little subterfuge.
>
> What I'm thinking of doing is this:
>
> - Program the chip to use extended jumbo ring descriptors.
> - Get an mbuf using MGETHDR().
> - malloc() a 9K buffer and attach it to the mbuf as external storage.
> - Assign the start address of the 9K buffer to the first host address
>   pointer in the ring descriptor.
> - Round the address up to a page boundary.
> - Assign this page address to the second host address pointer in the
>   descriptor.
> - Round up to the next page again.
> - Assign that address to the third host address pointer.
> - Set all the fragment lengths accordingly so we end up with a total
>   of 9K.
>
> Basically I'm doing page mapping for the chip.  It's possible that I
> might end up with contiguously allocated space, in which case all of
> this is a pessimization, but I can never know that without grovelling
> around in the kernel page tables, and that would take a lot more work.
>
> Am I insane?  Well, wait: that's a given.  But does this scheme at
> least sound reasonable?
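
To make the scheme concrete, here is a rough, untested sketch.  The
descriptor struct, TI_JLEN, and ti_jfree() are invented stand-ins (the
real extended descriptor format is in the Tigon manual), and the
carving is written as a loop over all four host address slots, since a
misaligned 9K buffer can actually straddle four pages, one more than
the three steps above suggest:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/mbuf.h>
#include <sys/malloc.h>
#include <vm/vm.h>		/* for vtophys() */
#include <vm/pmap.h>

#define TI_JLEN		(9 * 1024)	/* jumbo buffer size */

/* Invented stand-in for the Tigon's extended jumbo RX descriptor. */
struct ti_rx_desc_ext {
	u_int32_t	ti_addr[4];	/* host (physical) addresses */
	u_int16_t	ti_len[4];	/* fragment lengths */
	/* flags, index, etc. omitted */
};

static void	ti_jfree(caddr_t buf, u_int size);	/* frees the 9K buffer */

static int
ti_newbuf_jumbo(struct ti_rx_desc_ext *d)
{
	struct mbuf *m;
	caddr_t buf, frag;
	int i, len, resid;

	MGETHDR(m, M_DONTWAIT, MT_DATA);
	if (m == NULL)
		return (ENOBUFS);
	buf = malloc(TI_JLEN, M_DEVBUF, M_NOWAIT);
	if (buf == NULL) {
		m_freem(m);
		return (ENOBUFS);
	}

	/* Attach the buffer as external storage (refcounting omitted). */
	m->m_flags |= M_EXT;
	m->m_data = m->m_ext.ext_buf = buf;
	m->m_ext.ext_size = TI_JLEN;
	m->m_ext.ext_free = ti_jfree;
	m->m_len = m->m_pkthdr.len = TI_JLEN;

	/*
	 * Carve the virtually contiguous buffer into page-bounded
	 * fragments.  Each fragment stops at a page boundary, so
	 * vtophys() is valid across its whole length.
	 */
	frag = buf;
	resid = TI_JLEN;
	for (i = 0; i < 4; i++) {
		if (resid <= 0) {
			d->ti_addr[i] = 0;
			d->ti_len[i] = 0;
			continue;
		}
		len = PAGE_SIZE - ((u_long)frag & PAGE_MASK);
		if (len > resid)
			len = resid;
		d->ti_addr[i] = vtophys(frag);
		d->ti_len[i] = len;
		frag += len;
		resid -= len;
	}
	/* ... stash m somewhere the rx handler can find it ... */
	return (0);
}

The worst case is a buffer starting just before a page boundary (say a
16-byte first fragment, then 4096 + 4096, with the remainder in the
fourth slot), which is exactly what the loop handles.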
> > BTW, we've found that jumbo frames make a significant difference in
> > performance on the new RS/6000's we have -- peak TCP performance jumps
> > from the 500Mbps range (1500-byte MTU) to the 800Mbps range (9000-byte
> > MTU).  We assume that the Gigabit NICs in the RS/6000's are Alteon
> > NICs, but there is no identification on the NICs other than IBM's.
>
> One trick I sometimes use to identify NICs is to run strings -a on the
> driver object modules and look for something incriminating.  If something
> like 'tigon' or 'acenic' or 'alt(eon)' leaps out at you, then you know
> it's a Tigon chip.
>
> Given that the Tigon is a PCI chip, IBM's card must also be PCI.
> If IBM really is using the Tigon chip, then you could probably use the
> IBM card in an x86 box given the right driver (IBM probably uses their
> own PCI vendor and device IDs for the card, but that's easy enough to
> handle).
>
> -Bill
>
> --
> =============================================================================
> -Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
> Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
> Home: wpaul@skynet.ctr.columbia.edu  | Columbia University, New York City
> =============================================================================
>  "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
> =============================================================================

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message