From owner-freebsd-hackers Sun Feb 21 10:58:27 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from skynet.ctr.columbia.edu (skynet.ctr.columbia.edu [128.59.64.70])
	by hub.freebsd.org (Postfix) with SMTP id B07EA111D2
	for ; Sun, 21 Feb 1999 10:58:18 -0800 (PST)
	(envelope-from wpaul@skynet.ctr.columbia.edu)
Received: (from wpaul@localhost) by skynet.ctr.columbia.edu (8.6.12/8.6.9)
	id OAA14038; Sun, 21 Feb 1999 14:03:31 -0500
From: Bill Paul
Message-Id: <199902211903.OAA14038@skynet.ctr.columbia.edu>
Subject: Re: How to handle jumbo ethernet frames
To: ghelmer@scl.ameslab.gov (Guy Helmer)
Date: Sun, 21 Feb 1999 14:03:30 -0500 (EST)
Cc: dg@root.com, luigi@labinfo.iet.unipi.it, hackers@freebsd.org
In-Reply-To: from "Guy Helmer" at Feb 21, 99 11:16:34 am
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Content-Length: 6741
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Of all the gin joints in all the towns in all the world, Guy Helmer had
to walk into mine and say:

>
> On Sun, 21 Feb 1999, David Greenman wrote:
>
> > >> The jumbo frames are only useful if you also have VLAN support, which we
> > >> don't have currently. We also need support for large mbuf clusters; this
> > >
> > >hmmm i don't get this -- why is this related to VLAN ?
> >
> >    Because most ethernets consist of a mix of hosts that don't have jumbo
> > frame capability. If you use jumbo frames without VLANs, then ALL hosts must
> > support jumbo frames (and I think would also have to be gigabit ethernet
> > connected since jumbo frames weren't supported in 802.3...although I'm
> > assuming that the gigabit spec allows for jumbo frames, which may be a
> > bad assumption on my part).

Programming the chip to use a single vlan tag wouldn't require that much
work. I was contemplating whipping up a small control program that could
grovel around in /dev/kmem, grab ahold of the driver's softc struct and
set the tag value, then re-init the interface. (And don't tell me how this
is hard and messy and I should be using sysctl instead, because I've
already written a program that does almost exactly this for a different
driver as an experiment. So there.) A rough sketch of the sort of thing I
mean is further down.

> AFAIK, Alteon is the only source right now for a Gigabit Ethernet switch
> that can handle jumbo frames (we have one on-site right now), and it will
> automatically fragment packets when a jumbo frame is forwarded to a link
> that uses normal frames. Seems like VLANs will not be necessary, as long
> as other switch manufacturers provide this feature. (I'm not sure what the
> performance hit due to fragmentation would be, though.)

It seems to me this would only work for IP (or some other protocol(s) that
the switch knows about), since merely splitting up an ethernet frame into
chunks doesn't do a lot of good unless the host knows the frame has been
split and can reassemble it before passing it to the protocols.

I'm not really inclined to implement only standard frame support and wait
around for large mbuf cluster support to materialize, since there's no
telling how long that could take. I think I may be stuck between a rock
and a hard place though, since I found something in the manual which seems
to suggest that the mbuf cluster chaining approach won't work.
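(For the curious, the /dev/kmem groveler I have in mind would look roughly
like the sketch below, using kvm(3). The kernel symbol name "_foo_softc",
the softc layout and the foo_vlan_tag member are all made up for
illustration; the real names depend on whatever driver you're poking at.)

/*
 * Hypothetical sketch: poke a VLAN tag value into a driver's softc
 * through /dev/kmem using kvm(3). Compile with: cc -o vlantag vlantag.c -lkvm
 * Everything driver-specific below is made up for illustration.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <kvm.h>
#include <nlist.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Pretend softc layout: only the field we care about. */
struct foo_softc {
	u_int16_t	foo_vlan_tag;	/* hypothetical member */
};

static struct nlist nl[] = {
	{ "_foo_softc" },		/* hypothetical kernel symbol */
	{ NULL }
};

int
main(int argc, char **argv)
{
	kvm_t		*kd;
	char		errbuf[_POSIX2_LINE_MAX];
	struct foo_softc sc;
	u_long		addr;

	if (argc != 2) {
		fprintf(stderr, "usage: vlantag <tag>\n");
		exit(1);
	}

	/* Open the running kernel and /dev/mem read/write (needs root). */
	kd = kvm_openfiles(NULL, NULL, NULL, O_RDWR, errbuf);
	if (kd == NULL) {
		fprintf(stderr, "kvm_openfiles: %s\n", errbuf);
		exit(1);
	}

	/* Locate the softc in the kernel symbol table. */
	if (kvm_nlist(kd, nl) != 0 || nl[0].n_value == 0) {
		fprintf(stderr, "symbol %s not found\n", nl[0].n_name);
		exit(1);
	}
	addr = nl[0].n_value;

	/* Read the softc, set the tag, write it back. */
	if (kvm_read(kd, addr, &sc, sizeof(sc)) != sizeof(sc)) {
		fprintf(stderr, "kvm_read: %s\n", kvm_geterr(kd));
		exit(1);
	}
	sc.foo_vlan_tag = (u_int16_t)strtoul(argv[1], NULL, 0);
	if (kvm_write(kd, addr, &sc, sizeof(sc)) != sizeof(sc)) {
		fprintf(stderr, "kvm_write: %s\n", kvm_geterr(kd));
		exit(1);
	}
	kvm_close(kd);
	return (0);
}

After writing the tag you'd just take the interface down and back up
(ifconfig) so the driver re-inits the chip with the new value.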
The Tigon supports several different receive rings: the standard ring,
which holds buffers large enough to accommodate normal sized ethernet
frames, the jumbo receive ring, which contains buffers large enough for
jumbo frames, and the mini ring, which contains small buffers of a
user-chosen size for very small non-jumbo ethernet frames. The mini ring
is an optimization for handling small packets which would end up wasting
a lot of space were they to be DMAed into standard ring buffers. (It's
possible to perform this optimization in the driver by copying very small
frames into small mbuf chains and recycling the cluster buffer, but using
the mini ring avoids the need to copy.) Note that the mini ring is only
available on the Tigon 2.

For the jumbo ring, you're allowed to use one of two kinds of ring
descriptors. You can use either the normal ring descriptor type (the same
as for the standard receive ring) or a special extended jumbo receive
descriptor, which differs from the normal descriptor in that it can point
to four non-contiguous buffers while the normal type can only point to
one. You specify the kind of descriptor you want by setting a flag in the
ring control block during initialization.

This is important because the manual seems to claim that, as far as the
jumbo ring is concerned, if you use normal descriptors each descriptor
buffer will always contain one complete frame. In other words, each
descriptor must point to a contiguous buffer large enough to hold a 9K
frame. If you want to use several non-contiguous buffers, then you have
to use the extended descriptor format, which only allows four buffers.
Since an mbuf cluster is only 2K, that isn't enough.

The only way I can think of to get around this problem is to use an mbuf
with external storage consisting of a single 9K buffer. However, since 9K
is larger than the page size, I can't be assured of always getting 9K of
contiguous storage, so I need to engage in a little subterfuge. What I'm
thinking of doing is this:

- Program the chip to use extended jumbo ring descriptors.
- Get an mbuf using MGETHDR().
- malloc() a 9K buffer and attach it to the mbuf as external storage.
- Assign the start address of the 9K buffer to the first host address
  pointer in the ring descriptor.
- Round the address up to a page boundary.
- Assign this page address to the second host address pointer in the
  descriptor.
- Round up to the next page again.
- Assign that address to the third host address pointer.
- Set all the fragment lengths accordingly so we end up with a total
  of 9K.

Basically I'm doing page mapping for the chip. It's possible that I might
end up with contiguously allocated space, in which case all of this is a
pessimization, but I can never know that without grovelling around in the
kernel page tables, and that would take a lot more work.

Am I insane? Well, wait: that's a given. But does this scheme at least
sound reasonable? (A rough sketch of the page-splitting arithmetic is
further down.)

> BTW, we've found that jumbo frames make a significant difference in
> performance on the new RS/6000's we have -- peak TCP performance jumps
> from the 500Mbps range to the 800Mbps range for 1500 vs. 9000 byte MTU.
> We assume that the Gigabit NICs in the RS/6000's are Alteon NICs, but
> there is no identification on the NICs other than IBM's.

One trick I sometimes use to identify NICs is to run strings -a on the
driver object modules and look for something incriminating. If something
like 'tigon' or 'acenic' or 'alt(eon)' leaps out at you, then you know
it's a Tigon chip.
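Going back to the jumbo ring scheme above, here's a rough userland sketch
of the page-splitting arithmetic I have in mind. The frag struct,
split_frags() and the 4K page size are just stand-ins for illustration;
in the driver, the 9K buffer would be the mbuf's external storage and each
fragment address would be run through vtophys() before being loaded into
the extended descriptor's host address/length slots.

/*
 * Sketch of "doing the page mapping for the chip": split a 9K buffer
 * into fragments that never cross a page boundary, so each fragment
 * is physically contiguous. Names and limits are illustrative only.
 */
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SZ		4096		/* assume 4K pages (i386) */
#define JUMBO_BUFLEN	(9 * 1024)	/* 9K receive buffer */
#define MAX_FRAGS	4		/* extended descriptor limit */

struct frag {
	unsigned long	addr;		/* physical address in the driver */
	unsigned long	len;
};

/*
 * Split [buf, buf + len) into page-bounded fragments. Returns the number
 * of fragments used, or -1 if more than MAX_FRAGS would be needed.
 */
static int
split_frags(unsigned long buf, unsigned long len, struct frag *frags)
{
	int i;
	unsigned long next;

	for (i = 0; len > 0; i++) {
		if (i == MAX_FRAGS)
			return (-1);
		/* End of this fragment: next page boundary or end of buffer. */
		next = (buf + PAGE_SZ) & ~((unsigned long)PAGE_SZ - 1);
		frags[i].addr = buf;
		frags[i].len = (next - buf < len) ? next - buf : len;
		buf += frags[i].len;
		len -= frags[i].len;
	}
	return (i);
}

int
main(void)
{
	char *buf;
	struct frag frags[MAX_FRAGS];
	int i, n;

	/*
	 * In the driver this would be the 9K external-storage buffer
	 * attached to an mbuf obtained with MGETHDR(); plain malloc()
	 * stands in for it here.
	 */
	buf = malloc(JUMBO_BUFLEN);
	if (buf == NULL)
		return (1);

	n = split_frags((unsigned long)buf, JUMBO_BUFLEN, frags);
	/*
	 * Each (addr, len) pair would go into one of the four host
	 * address/length slots of the extended jumbo RX descriptor.
	 */
	for (i = 0; i < n; i++)
		printf("frag %d: addr %#lx len %lu\n", i, frags[i].addr,
		    frags[i].len);
	free(buf);
	return (0);
}

One thing worth noting: with 4K pages, a 9K buffer that isn't page-aligned
can straddle four pages in the worst case, so the three pointers described
in the list above may need a fourth fragment. That happens to be exactly
the number of pointers the extended descriptor provides, so the scheme
just barely fits.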
Given that the Tigon is a PCI chip, IBM's card must also be PCI. If IBM
really is using the Tigon chip, then you could probably use the IBM card
in an x86 box given the right driver (IBM probably uses their own PCI
vendor and device IDs for their card, but that's easy enough to handle).

-Bill

--
=============================================================================
-Bill Paul            (212) 854-6020          | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message