Date: Fri, 19 Jan 2001 11:12:51 -0600 (CST)
From: Joseph Thomas <jpt@networkcs.com>
To: mike@sentex.net (Mike Tancsa)
Cc: rh@matriplex.com (Richard Hodges), freebsd-atm@FreeBSD.ORG
Subject: Re: HARP (lockups) with hea
Message-ID: <200101191712.LAA08022@us.networkcs.com>
In-Reply-To: <4.2.2.20010118205333.01e8dcd8@marble.sentex.net> from "Mike Tancsa" at Jan 18, 2001 11:35:33 PM
Couple of things:

- First, a great big THANKS to all those that have stepped up and done
  some real detective work on HARP (especially with regards to the MBUF
  issue) and to those that have offered encouraging advice and support!

- Second, an apology from us (the developers) for not being as
  responsive as we'd like/should be. For some background, there were
  four of us on the original project, which was developed under DARPA
  sponsorship. Two of those are no longer here, the two of us that
  stayed just completed/are still on sabbatical, and there is no funding
  or support (from DARPA, others, or current employers) to continue this
  work. In addition, the equipment used (switches, adapters, fiber, etc)
  is government owned, and with no more funding we get into some sticky
  issues dealing with appropriate use, etc. Without some
  sponsorship/approval to work on HARP, I suspect the best we can offer
  is looking at code and offering suggestions/design decisions without
  actually being able to test anything, and given all our current job
  duties, I can't say that we'll be timely in our responses. History
  tends to support that we won't be...

- HARP and Chuck's code were never trying to solve the same problem.
  Chuck has done some excellent work, has provided valuable advice and
  insight to us, and has even modified his IDT driver to support the
  HARP stack. There will be numerous differences, and as seen, some of
  very significant magnitude. If Chuck's code works better for an
  application, please use it!

On to the specifics below:

The ENI device uses buffer space on the adapter for sending and
receiving PDUs. Alignment, as most driver developers well know, is
always a big issue. To begin with, the ENI memory must be divided up to
provide receive space for any open VCs, and the adapter places limits on
what size these buffers must be and how they are aligned. Within the
buffers, the PDUs themselves need alignment -- you can't simply start
the next PDU where the last one ended. In short, you can never have 100%
memory utilization by real data. This differs significantly from the
FORE device, where all PDU data is stored in system memory, not adapter
memory.

What's happening below is that HARP does not use any throttle mechanism
or secondary queueing for transmitting PDUs. When a PDU comes down
through the stack, if there are no buffers for transmit, the PDU (mbuf
chain) is discarded. I believe that Chuck actually maintains a separate
queue to store these on and will attempt to drain it when current
transmit operations complete. Because UDP packets can be generated
faster than TCP, and netperf defaults to UDP (at least it used to), yes,
you will see really bad netperf numbers.

Rolled into this -- and I never examined Chuck's code to see how
susceptible his stack is -- is the classic LFN (long fat network)
problem. FreeBSD has never been optimized out of the box (at least up to
where we were still developing and testing) for large fast networks. The
send and receive space for TCP and UDP used to be 8K. [BTW - this is not
a FreeBSD problem; it exists in all network implementations.] In testing
various systems, I used ttcp and varied the PDU size and the socket
(send/receive) space. Best performance was obviously with larger
packets, but the knee for performance seemed to be around a 60-80KB
send/receive space. I have measured ttcp throughputs of 120-125 Mb/s on
an OC3c interface.
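For anyone who wants to experiment with the socket space from the
application side, here is a minimal sketch; set_sockbuf_space() is just
an invented helper name, not anything in HARP or ttcp:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <stdio.h>

    /*
     * Raise a socket's send and receive space toward the 60-80KB
     * "knee" seen in the ttcp tests above.  Call before the socket
     * is connected or starts listening.
     */
    int
    set_sockbuf_space(int s, int space)
    {
        if (setsockopt(s, SOL_SOCKET, SO_SNDBUF,
            &space, sizeof(space)) < 0) {
            perror("setsockopt(SO_SNDBUF)");
            return (-1);
        }
        if (setsockopt(s, SOL_SOCKET, SO_RCVBUF,
            &space, sizeof(space)) < 0) {
            perror("setsockopt(SO_RCVBUF)");
            return (-1);
        }
        return (0);
    }

    /* e.g.:  set_sockbuf_space(s, 64 * 1024); */

Note that the kernel clamps SO_SNDBUF/SO_RCVBUF at the
kern.ipc.maxsockbuf sysctl, so for really large spaces that may need
raising first.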
[As an aside, one of the early systems we worked with was the SGI
Challenge/Onyx line with multiple CPUs, etc. We wanted to move data at
OC12 rates before OC12 adapters were available by using multiple OC3c
adapters. In my tests on the SGI and with early FreeBSD (2.X, very early
3.X line), no matter how many adapters I had running, even on separate
I/O buses in the SGI, the systems all seemed to peak at the same point.
Adding adapters dropped the per-adapter performance to roughly a 1/n'th
share, but the overall throughput didn't vary. It also appeared that
this limitation was more on the receiving side vs. the sending side, and
appeared to be in the kernel code vs. the HARP code, as I could
reproduce the same results with 100Mbit enet. It would be interesting to
redo these tests today since a lot of good things have happened since
then...]

Bottom line -- HARP is throwing away one PDU for each "not enough room"
message, whereas I believe Chuck's code will queue and attempt to send.

>
> At 05:13 PM 1/18/2001 -0800, Richard Hodges wrote:
> >On Thu, 18 Jan 2001, Mike Tancsa wrote:
> >
> > > OK, I ran the same tests using the Chuck Cranor driver and the tests
> > > completed as expected. Any ideas how I would go debugging this ?
> >
> >You could compile the driver with "DO_LOG" and see if anything interesting
> >shows up in the system log.
> [snip]
> Doesn't really show too much.
> But kern.* is full of
>
> Jan 18 22:29:10 ruby2 last message repeated 23 times
> Jan 18 22:29:10 ruby2 /kernel: eni_outt: not enough room in buffer
> Jan 18 22:29:10 ruby2 /kernel: eni_output: not enough room in buffer
> Jan 18 22:29:10 ruby2 last message repeated 23 times
> Jan 18 22:29:10 ruby2 /kernel: t: not enough room in buffer
> Jan 18 22:29:10 ruby2 /kernel: eni_output: not enough room in buffer
> Jan 18 22:29:10 ruby2 last message repeated 23 times
> Jan 18 22:29:10 ruby2 /kernel: eni_outt: not enough room in buffer
> Jan 18 22:29:10 ruby2 /kernel: eni_output: not enough room in buffer
> Jan 18 22:29:10 ruby2 last message repeated 23 times
> Jan 18 22:29:10 ruby2 /kernel: t: not enough room in buffer
> Jan 18 22:29:10 ruby2 /kernel: eni_output: not enough room in buffer
> Jan 18 22:29:10 ruby2 last message repeated 23 times
> Jan 18 22:29:10 ruby2 /kernel: eni_outt: not enough room in buffer
> Jan 18 22:29:10 ruby2 /kernel: eni_output: not enough room in buffer
> Jan 18 22:29:10 ruby2 last message repeated 797 times
> Jan 18 22:32:35 ruby2 /kernel: eni_output: not enough room in buffer
> [snip]

Yes, HARP does not attempt to determine the DMA burst sizes a given
system supports. We found numerous early systems which wouldn't even
support anything over 1WORD DMA. Fortunately, most of these problems
have long since been fixed, but the original HARP defaulted to using
only 1WORD DMA sizes. Our READMEs talked about the problem and suggested
that a knowledgeable user could change the DMA size and find the break
point where things work/don't work.

Because we are in effect a commercial development organization, we had
lots of issues to deal with as far as use of other code/works and
distribution go. Being limited in time and money, we couldn't go off and
develop/redevelop all the things that we would have liked HARP to
include. ENI provided us with code which determines supported DMA sizes,
as does Chuck's code, but for the above reasons we couldn't use anything
we had access to, and couldn't justify the time/effort to
develop/document a clean version of this. If someone wanted to do this,
yes, it'd be a great addition.
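If someone does pick this up, the overall shape might be something like
the sketch below. To be clear, this is invented for illustration -- it
is not the ENI vendor code (which we can't redistribute), and none of
these eni_* routines exist in HARP today:

    #include <sys/types.h>

    /*
     * Hypothetical helpers: kick off one DMA burst of 'words' 32-bit
     * words into adapter buffer memory, and verify the result through
     * the (slow but reliable) PIO window.
     */
    extern int eni_dma_write(void *sc, u_int32_t *src, int words);
    extern int eni_pio_verify(void *sc, u_int32_t *src, int words);

    /*
     * Boot-time probe: try burst sizes from largest to smallest and
     * return the largest one that moves data intact.  Falls back to
     * the 1WORD bursts HARP uses today.
     */
    int
    eni_probe_dma_burst(void *sc)
    {
        static const int burst_words[] = { 16, 8, 4, 2 };
        u_int32_t pattern[16];
        int i, j;

        for (i = 0; i < sizeof(burst_words) / sizeof(burst_words[0]); i++) {
            int w = burst_words[i];

            for (j = 0; j < w; j++)
                pattern[j] = 0xa5a5a5a5 ^ j;    /* known test pattern */
            if (eni_dma_write(sc, pattern, w) == 0 &&
                eni_pio_verify(sc, pattern, w))
                return (w);
        }
        return (1);     /* conservative 1WORD fallback */
    }

A real version would also want to probe at different starting offsets to
catch alignment restrictions (the "alburst" issue Chuck mentions below).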
As another aside, this same process is what prevented us from
distributing a microcode object file for the FORE adapters, even though
FORE more or less gave us permission to do so. I'm sure there are very
good legal reasons for all this caution, and I have great respect and
trust for our legal staff here. As a developer, especially one
developing in the free (or at least open) source arena, yes, it causes
some headaches; I'd rather just "do it" than ask about it. Oh well...

> - the HARP driver does not test the DMA engine at boot time to verify
>   proper operation.
> - the HARP driver does not dynamically determine what DMA burst
>   sizes are valid on a specific system and does not handle the
>   strict DMA alignment required on some systems (e.g. the
>   "alburst" restriction common on sparcs). Instead the HARP
>   driver hardwires the max DMA burst to 8 words and doesn't make
>   use of 16 word DMA bursts.
>

Some things to try:

- use Chuck's code;
- look at adding DMA_USE16WORD support to eni_transmit.c. One would of
  course need to ensure that 16WORD DMA works on your machine;
- check to see if netperf can insert an inter-packet delay. Causing a
  small delay between when packets are sent might allow the driver to
  free enough resources to reduce the number of drops;
- add a delayed transmit queue which could hold packets which would
  otherwise have been dropped. Check it when sending, before looking
  for new additions to the interface xmit queue (a rough sketch
  follows below).
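On that last item, here is a rough outline using the stock ifqueue
macros. Again, just a sketch: eni_xmit_pdu() is a stand-in for whatever
routine actually copies the chain into adapter memory, and callers are
assumed to be at splimp() or otherwise serialized:

    #include <sys/param.h>
    #include <sys/mbuf.h>
    #include <net/if.h>
    #include <net/if_var.h>

    static struct ifqueue eni_deferq;   /* set ifq_maxlen at attach time */

    extern int eni_xmit_pdu(struct mbuf *);    /* hypothetical stand-in */

    /* Instead of discarding on "not enough room", park the PDU. */
    void
    eni_defer_pdu(struct mbuf *m)
    {
        if (IF_QFULL(&eni_deferq)) {
            IF_DROP(&eni_deferq);   /* deferral queue full: now we drop */
            m_freem(m);
            return;
        }
        IF_ENQUEUE(&eni_deferq, m);
    }

    /*
     * From the transmit-complete path: drain deferred PDUs before
     * taking new work off the interface xmit queue.
     */
    void
    eni_drain_deferred(void)
    {
        struct mbuf *m;

        for (;;) {
            IF_DEQUEUE(&eni_deferq, m);
            if (m == NULL)
                break;
            if (eni_xmit_pdu(m) != 0) {
                IF_PREPEND(&eni_deferq, m); /* still no room; retry later */
                break;
            }
        }
    }

The one thing to watch is that the deferral queue stays bounded: without
a sane ifq_maxlen you've just moved the buffer exhaustion from the
adapter into mbufs.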
--
Joseph Thomas                           E/Mail: jpt@networkcs.com
Network Computing Services, Inc.                jpt@magic.net
1200 Washington Ave So.                 Tel:    +1 612 337 3558
Minneapolis, MN 55415-1227              FAX:    +1 612 337 3400

        An elephant is a mouse with an operating system.
