From owner-freebsd-hackers Sun Feb 21 10:58:27 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from skynet.ctr.columbia.edu (skynet.ctr.columbia.edu [128.59.64.70])
	by hub.freebsd.org (Postfix) with SMTP id B07EA111D2
	for ; Sun, 21 Feb 1999 10:58:18 -0800 (PST)
	(envelope-from wpaul@skynet.ctr.columbia.edu)
Received: (from wpaul@localhost) by skynet.ctr.columbia.edu (8.6.12/8.6.9)
	id OAA14038; Sun, 21 Feb 1999 14:03:31 -0500
From: Bill Paul
Message-Id: <199902211903.OAA14038@skynet.ctr.columbia.edu>
Subject: Re: How to handle jumbo ethernet frames
To: ghelmer@scl.ameslab.gov (Guy Helmer)
Date: Sun, 21 Feb 1999 14:03:30 -0500 (EST)
Cc: dg@root.com, luigi@labinfo.iet.unipi.it, hackers@freebsd.org
In-Reply-To: from "Guy Helmer" at Feb 21, 99 11:16:34 am
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Content-Length: 6741
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Of all the gin joints in all the towns in all the world, Guy Helmer had
to walk into mine and say:

>
> On Sun, 21 Feb 1999, David Greenman wrote:
>
> > >> The jumbo frames are only useful if you also have VLAN support, which we
> > >> don't have currently. We also need support for large mbuf clusters; this
> > >
> > >hmmm i don't get this -- why is this related to VLAN ?
> >
> >    Because most ethernets consist of a mix of hosts that don't have jumbo
> > frame capability. If you use jumbo frames without VLANs, then ALL hosts must
> > support jumbo frames (and I think would also have to be gigabit ethernet
> > connected since jumbo frames weren't supported in 802.3...although I'm
> > assuming that the gigabit spec allows for jumbo frames, which may be a
> > bad assumption on my part).

Programming the chip to use a single vlan tag wouldn't require that much
work. I was contemplating whipping up a small control program that could
grovel around in /dev/kmem, grab ahold of the driver's softc struct and
set the tag value, then re-init the interface. (And don't tell me how this
is hard and messy and I should be using sysctl instead, because I've
already written a program that does almost exactly this for a different
driver as an experiment. So there.) A rough sketch of the sort of thing I
mean is further down.

> AFAIK, Alteon is the only source right now for a Gigabit Ethernet switch
> that can handle jumbo frames (we have one on-site right now), and it will
> automatically fragment packets when a jumbo frame is forwarded to a link
> that uses normal frames. Seems like VLANs will not be necessary, as long
> as other switch manufacturers provide this feature. (I'm not sure what the
> performance hit due to fragmentation would be, though.)

It seems to me this would only work for IP (or some other protocol(s) that
the switch knows about), since merely splitting up an ethernet frame into
chunks doesn't do a lot of good unless the host knows the frame has been
split and can reassemble it before passing it to the protocols.

I'm not really inclined to implement only standard frame support and wait
around for large mbuf cluster support to materialize, since there's no
telling how long that could take. I think I may be stuck between a rock
and a hard place though, since I found something in the manual which seems
to suggest that the mbuf cluster chaining approach won't work.
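(For the curious, the /dev/kmem groveler I have in mind would look roughly
like the sketch below, using kvm(3). The kernel symbol name "_foo_softc",
the softc layout and the foo_vlan_tag member are all made up for
illustration; the real names depend on whatever driver you're poking at.)

/*
 * Hypothetical sketch: poke a VLAN tag value into a driver's softc
 * through /dev/kmem using kvm(3). Compile with: cc -o vlantag vlantag.c -lkvm
 * Everything driver-specific below is made up for illustration.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <kvm.h>
#include <nlist.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Pretend softc layout: only the field we care about. */
struct foo_softc {
	u_int16_t	foo_vlan_tag;	/* hypothetical member */
};

static struct nlist nl[] = {
	{ "_foo_softc" },		/* hypothetical kernel symbol */
	{ NULL }
};

int
main(int argc, char **argv)
{
	kvm_t		*kd;
	char		errbuf[_POSIX2_LINE_MAX];
	struct foo_softc sc;
	u_long		addr;

	if (argc != 2) {
		fprintf(stderr, "usage: vlantag <tag>\n");
		exit(1);
	}

	/* Open the running kernel and /dev/mem read/write (needs root). */
	kd = kvm_openfiles(NULL, NULL, NULL, O_RDWR, errbuf);
	if (kd == NULL) {
		fprintf(stderr, "kvm_openfiles: %s\n", errbuf);
		exit(1);
	}

	/* Locate the softc in the kernel symbol table. */
	if (kvm_nlist(kd, nl) != 0 || nl[0].n_value == 0) {
		fprintf(stderr, "symbol %s not found\n", nl[0].n_name);
		exit(1);
	}
	addr = nl[0].n_value;

	/* Read the softc, set the tag, write it back. */
	if (kvm_read(kd, addr, &sc, sizeof(sc)) != sizeof(sc)) {
		fprintf(stderr, "kvm_read: %s\n", kvm_geterr(kd));
		exit(1);
	}
	sc.foo_vlan_tag = (u_int16_t)strtoul(argv[1], NULL, 0);
	if (kvm_write(kd, addr, &sc, sizeof(sc)) != sizeof(sc)) {
		fprintf(stderr, "kvm_write: %s\n", kvm_geterr(kd));
		exit(1);
	}
	kvm_close(kd);
	return (0);
}

After writing the tag you'd just take the interface down and back up
(ifconfig) so the driver re-inits the chip with the new value.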
The Tigon supports several different receive rings: the standard ring,
which holds buffers large enough to accommodate normal sized ethernet
frames, the jumbo receive ring, which contains buffers large enough for
jumbo frames, and the mini ring, which contains small buffers of a
user-chosen size for very small non-jumbo ethernet frames. The mini ring
is an optimization for handling small packets which would end up wasting
a lot of space were they to be DMAed into standard ring buffers. (It's
possible to perform this optimization in the driver by copying very small
frames into small mbuf chains and recycling the cluster buffer, but using
the mini ring avoids the need to copy.) Note that the mini ring is only
available on the Tigon 2.

For the jumbo ring, you're allowed to use one of two kinds of ring
descriptors. You can use either the normal ring descriptor type (the same
as for the standard receive ring) or a special extended jumbo receive
descriptor, which differs from the normal descriptor in that it can point
to four non-contiguous buffers while the normal type can only point to
one. You specify the kind of descriptor you want by setting a flag in the
ring control block during initialization.

This is important because the manual seems to claim that, as far as the
jumbo ring is concerned, if you use normal descriptors each descriptor
buffer will always contain one complete frame. In other words, each
descriptor must point to a contiguous buffer large enough to hold a 9K
frame. If you want to use several non-contiguous buffers, then you have
to use the extended descriptor format, which only allows four buffers.
Since an mbuf cluster is only 2K, that isn't enough.

The only way I can think of to get around this problem is to use an mbuf
with external storage consisting of a single 9K buffer. However, since 9K
is larger than the page size, I can't be assured of always getting 9K of
contiguous storage, so I need to engage in a little subterfuge. What I'm
thinking of doing is this:

- Program the chip to use extended jumbo ring descriptors.
- Get an mbuf using MGETHDR().
- malloc() a 9K buffer and attach it to the mbuf as external storage.
- Assign the start address of the 9K buffer to the first host address
  pointer in the ring descriptor.
- Round the address up to a page boundary.
- Assign this page address to the second host address pointer in the
  descriptor.
- Round up to the next page again.
- Assign that address to the third host address pointer.
- Set all the fragment lengths accordingly so we end up with a total
  of 9K.

Basically I'm doing page mapping for the chip. It's possible that I might
end up with contiguously allocated space, in which case all of this is a
pessimization, but I can never know that without grovelling around in the
kernel page tables, and that would take a lot more work.

Am I insane? Well, wait: that's a given. But does this scheme at least
sound reasonable? (A rough sketch of the page-splitting arithmetic is
further down.)

> BTW, we've found that jumbo frames make a significant difference in
> performance on the new RS/6000's we have -- peak TCP performance jumps
> from the 500Mbps range to the 800Mbps range for 1500 vs. 9000 byte MTU.
> We assume that the Gigabit NICs in the RS/6000's are Alteon NICs, but
> there is no identification on the NICs other than IBM's.

One trick I sometimes use to identify NICs is to run strings -a on the
driver object modules and look for something incriminating. If something
like 'tigon' or 'acenic' or 'alt(eon)' leaps out at you, then you know
it's a Tigon chip.
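Going back to the jumbo ring scheme above, here's a rough userland sketch
of the page-splitting arithmetic I have in mind. The frag struct,
split_frags() and the 4K page size are just stand-ins for illustration;
in the driver, the 9K buffer would be the mbuf's external storage and each
fragment address would be run through vtophys() before being loaded into
the extended descriptor's host address/length slots.

/*
 * Sketch of "doing the page mapping for the chip": split a 9K buffer
 * into fragments that never cross a page boundary, so each fragment
 * is physically contiguous. Names and limits are illustrative only.
 */
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SZ		4096		/* assume 4K pages (i386) */
#define JUMBO_BUFLEN	(9 * 1024)	/* 9K receive buffer */
#define MAX_FRAGS	4		/* extended descriptor limit */

struct frag {
	unsigned long	addr;		/* physical address in the driver */
	unsigned long	len;
};

/*
 * Split [buf, buf + len) into page-bounded fragments. Returns the number
 * of fragments used, or -1 if more than MAX_FRAGS would be needed.
 */
static int
split_frags(unsigned long buf, unsigned long len, struct frag *frags)
{
	int i;
	unsigned long next;

	for (i = 0; len > 0; i++) {
		if (i == MAX_FRAGS)
			return (-1);
		/* End of this fragment: next page boundary or end of buffer. */
		next = (buf + PAGE_SZ) & ~((unsigned long)PAGE_SZ - 1);
		frags[i].addr = buf;
		frags[i].len = (next - buf < len) ? next - buf : len;
		buf += frags[i].len;
		len -= frags[i].len;
	}
	return (i);
}

int
main(void)
{
	char *buf;
	struct frag frags[MAX_FRAGS];
	int i, n;

	/*
	 * In the driver this would be the 9K external-storage buffer
	 * attached to an mbuf obtained with MGETHDR(); plain malloc()
	 * stands in for it here.
	 */
	buf = malloc(JUMBO_BUFLEN);
	if (buf == NULL)
		return (1);

	n = split_frags((unsigned long)buf, JUMBO_BUFLEN, frags);
	/*
	 * Each (addr, len) pair would go into one of the four host
	 * address/length slots of the extended jumbo RX descriptor.
	 */
	for (i = 0; i < n; i++)
		printf("frag %d: addr %#lx len %lu\n", i, frags[i].addr,
		    frags[i].len);
	free(buf);
	return (0);
}

One thing worth noting: with 4K pages, a 9K buffer that isn't page-aligned
can straddle four pages in the worst case, so the three pointers described
in the list above may need a fourth fragment. That happens to be exactly
the number of pointers the extended descriptor provides, so the scheme
just barely fits.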
Given that the Tigon is a PCI chip, IBM's card must also be PCI. If IBM
really is using the Tigon chip, then you could probably use the IBM card
in an x86 box given the right driver (IBM probably uses their own PCI
vendor and device IDs for their card, but that's easy enough to handle).

-Bill

--
=============================================================================
-Bill Paul            (212) 854-6020          | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message