Date:      Fri, 19 Jan 2001 11:12:51 -0600 (CST)
From:      Joseph Thomas <jpt@networkcs.com>
To:        mike@sentex.net (Mike Tancsa)
Cc:        rh@matriplex.com (Richard Hodges), freebsd-atm@FreeBSD.ORG
Subject:   Re: HARP (lockups)  with hea
Message-ID:  <200101191712.LAA08022@us.networkcs.com>
In-Reply-To: <4.2.2.20010118205333.01e8dcd8@marble.sentex.net> from "Mike Tancsa" at Jan 18, 2001 11:35:33 PM


Couple of things:

	- First, a great big THANKS to all those that have stepped up
and done some real detective work on HARP (especially with regards to
the MBUF issue) and those that have offered encouraging advice and support!

	- Second, an apology from us (the developers) for not being as
responsive as we'd like/should be. For some background, there were four
of us on the original project, which was developed under DARPA
sponsorship. Two of those are no longer here, the two of us who stayed
just completed/are still on sabbatical, and there is no funding or
support (from DARPA, others, or current employers) to continue this
work. In addition, the equipment used (switches, adapters, fiber, etc.)
is government owned, and with no more funding we get into some sticky
issues dealing with appropriate use, etc. Without some
sponsorship/approval to work on HARP, I suspect the best we can offer
is looking at code and offering suggestions/design decisions without
actually being able to test anything, and given all our current job
duties, I can't say that we'll be timely in our responses. History
tends to support that we won't be...

	- HARP and Chuck's code were never trying to solve the same
problem. Chuck has done some excellent work, has provided valuable
advice and insight to us, and has even modified his IDT driver to
support the HARP stack. There will be numerous differences and, as
seen, some of very significant magnitude. If Chuck's code works better
for an application, please use it!


On to the items below:

	The ENI device uses buffer space on the adapter for sending
and receiving PDUs. Alignment, as most driver developers well know,
is always a big issue. To begin with, the ENI memory must be divided
up to provide receive space for any open VCs, and the adapter places
limits on what size these must be and how they are aligned. Within
the buffers, the PDUs themselves need alignment -- you can't simply
start the next PDU where the last one ended. In short, you can never
have 100% memory utilization by real data. This differs significantly
from the FORE device, where all PDU data is stored in system memory,
not adapter memory.
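
	As a rough sketch of the arithmetic (the constant and function
name here are made up for illustration, not the actual ENI driver
identifiers), each PDU placed in adapter memory gets rounded up to the
next alignment boundary:

    /*
     * Illustrative only -- ENI_PDU_ALIGN is a hypothetical name,
     * not the real driver constant.
     */
    #define ENI_PDU_ALIGN   64      /* assumed alignment boundary */

    /* Round a free-space offset in adapter memory up to the boundary. */
    static u_long
    eni_next_pdu_offset(u_long cur_end)
    {
            return ((cur_end + ENI_PDU_ALIGN - 1) &
                ~(u_long)(ENI_PDU_ALIGN - 1));
    }

Every PDU can thus waste up to ENI_PDU_ALIGN - 1 bytes of padding, which
is why utilization by real data never reaches 100%.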

	What's happening below is that HARP does not use any throttle
mechanism or secondary queueing for transmitting PDUs. When a PDU
comes down through the stack, if there are no buffers for transmit,
the PDU (mbuf chain) is discarded. I believe that Chuck actually
maintains a separate queue to store these on and will attempt to
drain it when current transmit operations complete. Because UDP packets
can be generated faster than TCP, and netperf defaults to UDP (at least
it used to), yes, you will see really bad netperf numbers. Rolled into
this -- and I never examined Chuck's code to see how susceptible his
stack is -- is the classic LFN (long fat network) problem. FreeBSD
has never been optimized out of the box (at least up to where we
were still developing and testing) for large fast networks. The send
and receive space for TCP and UDP used to be 8K. [BTW - this is not
a FreeBSD problem; it exists in all network implementations.] In testing
various systems, I used ttcp and varied PDU and socket (send/receive)
space. Best performance was obviously with larger packets, but the knee
for performance seemed to be around a 60-80KB send/receive space.
I have measured ttcp throughputs of 120-125Mb/s on an OC3c interface. [As
an aside, one of the early systems we worked with was the SGI Challenge/Onyx
line with multiple CPUs, etc. We wanted to move data at OC12 rates before
OC12 adapters were available by using multiple OC3c adapters. In my tests
on the SGI and with early FreeBSD (2.X, very early 3.X line), no matter
how many adapters I had running, even on separate I/O buses in the
SGI, the systems all seemed to peak at the same point. Adding adapters
dropped the per-adapter performance to roughly a 1/n'th share, but the
overall throughput didn't vary. It also appeared that this limitation
was more on the receiving side vs. the sending side, and appeared to be
in the kernel code vs. the HARP code, as I could reproduce the same
results with 100Mbit enet. It would be interesting to redo these tests
today, since a lot of good things have happened since then...]
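
	For anyone repeating that kind of test, the per-socket piece is
just a pair of setsockopt() calls before connecting. A minimal sketch
(the 64KB figure is only an example near the knee mentioned above, not
a tuned recommendation):

    #include <sys/types.h>
    #include <sys/socket.h>

    /* Enlarge a socket's send/receive space; returns -1 on error. */
    int
    set_sockspace(int s, int bytes)
    {
            if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bytes,
                sizeof(bytes)) < 0)
                    return (-1);
            return (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bytes,
                sizeof(bytes)));
    }

    /* e.g.: set_sockspace(s, 64 * 1024); */

ttcp's -b option does the same thing, and on FreeBSD the system-wide
defaults can be raised via the net.inet.tcp.sendspace and
net.inet.tcp.recvspace sysctls.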

Bottom line -- HARP is throwing away one PDU for each "not enough room"
message whereas I believe Chuck's code will queue and attempt to send.

> 
> At 05:13 PM 1/18/2001 -0800, Richard Hodges wrote:
> >On Thu, 18 Jan 2001, Mike Tancsa wrote:
> >
> > > OK, I ran the same tests using the Chuck Cranor driver and the tests
> > > completed as expected.  Any ideas how I would go debugging this ?
> >
> >You could compile the driver with "DO_LOG" and see if anything interesting
> >shows up in the system log.
> 
[snip]
> Doesnt really show too much.
> But kern.* is full of
> 
> Jan 18 22:29:10 ruby2 last message repeated 23 times
> Jan 18 22:29:10 ruby2 /kernel: eni_outt: not enough room in buffer
> Jan 18 22:29:10 ruby2 /kernel: eni_output: not enough room in buffer
> Jan 18 22:29:10 ruby2 last message repeated 23 times
> Jan 18 22:29:10 ruby2 /kernel: t: not enough room in buffer
> Jan 18 22:29:10 ruby2 /kernel: eni_output: not enough room in buffer
> Jan 18 22:29:10 ruby2 last message repeated 23 times
> Jan 18 22:29:10 ruby2 /kernel: eni_outt: not enough room in buffer
> Jan 18 22:29:10 ruby2 /kernel: eni_output: not enough room in buffer
> Jan 18 22:29:10 ruby2 last message repeated 23 times
> Jan 18 22:29:10 ruby2 /kernel: t: not enough room in buffer
> Jan 18 22:29:10 ruby2 /kernel: eni_output: not enough room in buffer
> Jan 18 22:29:10 ruby2 last message repeated 23 times
> Jan 18 22:29:10 ruby2 /kernel: eni_outt: not enough room in buffer
> Jan 18 22:29:10 ruby2 /kernel: eni_output: not enough room in buffer
> Jan 18 22:29:10 ruby2 last message repeated 797 times
> Jan 18 22:32:35 ruby2 /kernel: eni_output: not enough room in buffer
> 
[snip]

	Yes, HARP does not attempt to do anything about determining the DMA
burst sizes supported. We found numerous early systems which wouldn't
even support anything over 1WORD DMA. Fortunately, most of these problems
have long since been fixed but the original HARP was defaulted to only
use 1WORD DMA sizes. Our READMEs talked about the problem and suggested
that a knowledgeable user could change the DMA size and find the break
point where things work/don't work.
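
	The shape of that experiment is just a compile-time cap that the
copy loop respects. A sketch with made-up names (the real constant and
DMA setup live in the HARP ENI driver source):

    /*
     * Hypothetical names, for illustration only.  Raise
     * ENI_DMA_MAX_WORDS (1, 2, 4, 8, ...) and retest until you find
     * the break point for your host bridge.
     */
    #define ENI_DMA_MAX_WORDS       1       /* conservative 1WORD default */

    static void
    eni_dma_copy(volatile u_long *dst, const u_long *src, int words)
    {
            while (words > 0) {
                    int i, burst = (words > ENI_DMA_MAX_WORDS) ?
                        ENI_DMA_MAX_WORDS : words;

                    /* stand-in for programming one real DMA burst */
                    for (i = 0; i < burst; i++)
                            *dst++ = *src++;
                    words -= burst;
            }
    }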

	Because we are in effect a commercial development organization,
we had lots of issues to deal with as far as use of other code/works and
distribution issues. Being limited in time and money, we couldn't go
off and develop/redevelop all the things that we would have liked HARP
to include. ENI provided us with code which determines supported DMA
sizes, as does Chuck's code, but for the above reasons, we couldn't
use anything we had access to and couldn't justify the time/effort to
develop/document a clean version of this. If someone wanted to do this,
yes, it'd be a great addition. As another aside, this same process is
what prevented us from distributing a microcode object file for the
FORE adapters even though FORE more or less gave us permission to do so. 
I'm sure there are very good legal reasons for all of this, and I have
great respect and trust for our legal staff here. As a developer,
especially one developing in the free (or at least open) source arena,
yes, it causes some headaches; I'd rather just "do it" than ask about it.
Oh well...


>   - the HARP driver does not test the DMA engine at boot time to verify
>          proper operation.
>   - the HARP driver does not dynamically determine what DMA burst
>          sizes are valid on a specific system and does not handle the
>          strict DMA alignment required on some systems (e.g. the
>          "alburst" restriction common on sparcs).  Instead the HARP
>          driver hardwires the max DMA burst to 8 words and doesn't make
>          use of 16 word DMA bursts.
> 


	Some things to try:

	- use Chuck's code;

	- look at adding DMA_USE16WORD support to eni_transmit.c. One would
of course need to ensure that 16WORD DMA works on your machine;

	- check to see if netperf can insert an inter-packet delay. Causing
a small delay between when packets are sent might allow the driver to 
free enough resources to reduce the number of drops;

	- add a delayed transmit queue which could hold packets which
would otherwise have been dropped. Check it when sending before looking
for new additions to the interface xmit queue.
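
	A minimal sketch of that last idea, using the classic ifqueue
macros (the function names and hook points are hypothetical; the real
ones would go in eni_transmit.c, and locking/spl protection is omitted):

    #include <sys/param.h>
    #include <sys/mbuf.h>
    #include <sys/socket.h>
    #include <net/if.h>

    static struct ifqueue eni_delayq;   /* assumed holding queue; set   */
                                        /* eni_delayq.ifq_maxlen (e.g.  */
                                        /* IFQ_MAXLEN) at attach time   */

    /* Instead of discarding a PDU when adapter buffers are full: */
    static void
    eni_defer_pdu(struct mbuf *m)
    {
            if (IF_QFULL(&eni_delayq)) {
                    IF_DROP(&eni_delayq);
                    m_freem(m);             /* queue full -- really drop */
                    return;
            }
            IF_ENQUEUE(&eni_delayq, m);
    }

    /* On transmit-complete, drain deferred PDUs before new work: */
    static struct mbuf *
    eni_next_pdu(struct ifnet *ifp)
    {
            struct mbuf *m;

            IF_DEQUEUE(&eni_delayq, m);
            if (m == NULL)
                    IF_DEQUEUE(&ifp->if_snd, m);
            return (m);
    }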

-- 
Joseph Thomas                           E/Mail:  jpt@networkcs.com
Network Computing Services, Inc.    	 	 jpt@magic.net
1200 Washington Ave So.			Tel:	 +1 612 337 3558
Minneapolis, MN     55415-1227          FAX:     +1 612 337 3400

	An elephant is a mouse with an operating system.

