From owner-freebsd-hackers  Fri Mar 19 17:26:31 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from skynet.ctr.columbia.edu (skynet.ctr.columbia.edu [128.59.64.70])
          by hub.freebsd.org (Postfix) with SMTP id CD24F1509B
          for ; Fri, 19 Mar 1999 17:26:25 -0800 (PST)
          (envelope-from wpaul@skynet.ctr.columbia.edu)
Received: (from wpaul@localhost) by skynet.ctr.columbia.edu (8.6.12/8.6.9)
          id UAA04463; Fri, 19 Mar 1999 20:32:45 -0500
From: Bill Paul
Message-Id: <199903200132.UAA04463@skynet.ctr.columbia.edu>
Subject: Re: Gigabit ethernet revisited
To: dillon@apollo.backplane.com (Matthew Dillon)
Date: Fri, 19 Mar 1999 20:32:44 -0500 (EST)
Cc: hackers@freebsd.org
In-Reply-To: <199903191739.JAA59302@apollo.backplane.com> from "Matthew Dillon" at Mar 19, 99 09:39:14 am
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Content-Length: 4270
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Of all the gin joints in all the towns in all the world, Matthew Dillon
had to walk into mine and say:

> Hmm. I definitely think there's a bug due to not repopulating buffers
> from inside your receive interrupt packet processing loop, but it may
> not be the cause of this bug.
>
> If the NIC is unable to complete DMA quickly enough, perhaps the burst
> parameters can be tweaked. The PC certainly should be able to do DMA
> writes to memory at the PCI bus speed!
>
> Ok, lets see... what about these:
>
>     Read Max DMA parameter
>     Write Max DMA parameter
>     Minimum DMA parameter
>     PCI Latency Timer / system PCI latency timer
>
> At a guess, I think you would want:
>
>     Read Max DMA parameter     16
>     Write Max DMA parameter    128
>     Minimum DMA parameter      128
>     PCI Latency Timer          relatively high for this board
>                                ( if you have a per-board choice,
>                                  you may not )

Well, as it turns out, at least in this particular instance, the problem
was solved by setting the write max value to zero, which means
unlimited. My machine has an 82443BX chipset, which is supposed to allow
an unlimited burst size. The manual also says that the sample driver
uses zero for the read and write max values because this works best on
most systems.

So now I'm no longer running into a PCI bottleneck. When I do a
'ttcp -s -n10000 -u -t ' the receiving host receives all the packets
(as evidenced by netstat -in) and there are no more nicDmaWriteRingFull
errors. However, the ttcp -r -s -u that's waiting on the receive side
still does not get all the data.

This is what was confusing me: I wasn't checking the UDP stats with
netstat -s and comparing them to netstat -in, so I was observing lost
packets even when I had the PCI configuration done right, but for a
completely different reason. Netstat -s reports that all the UDP
datagrams are received (they get as far as udp_input()) but a large
percentage of them are dropped 'due to full socket buffers' (i.e.
sbappendaddr() returns 0).
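For reference, the drop happens in the 4.4BSD-derived udp_input() in
netinet/udp_usrreq.c. The relevant logic looks roughly like this (a
paraphrased sketch, not a verbatim excerpt; check your own source tree
for the exact code):

	/*
	 * If the receiving socket's buffer is full, sbappendaddr()
	 * returns 0; the datagram is dropped and udps_fullsock is
	 * bumped, which netstat -s reports as "dropped due to full
	 * socket buffers."
	 */
	if (sbappendaddr(&inp->inp_socket->so_rcv,
	    (struct sockaddr *)&udp_in, m, opts) == 0) {
		udpstat.udps_fullsock++;  /* the counter netstat -s shows */
		goto bad;                 /* frees the mbuf chain */
	}
	sorwakeup(inp->inp_socket);       /* wake the reader (ttcp -r) */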
(Now I know what's going to happen here. Somebody's going to glibly
suggest increasing the socket buffer size. I tried that, by increasing
the value of kern.ipc.maxsockbuf. It didn't make any difference. If
somebody wants to suggest something along these lines, don't do it in a
vague and hand-waving fashion. Before you make the suggestion, try to do
it yourself. Make a note of exactly what you do. See if it actually has
a positive effect on the problem. _Then_ explain _exactly_ what you did
so that I can duplicate it.)

The receiving host is under heavy interrupt load. Andrew Gallatin has
said to me that this is a classic case of livelock, where the system is
so busy processing interrupts that nothing else is getting done. In this
case, the NIC is dutifully DMAing all the packets to the host and the
driver is queueing them all to ether_input(), but this is happening so
often that the other parts of the kernel aren't getting a chance to
process the packets in time, so the queues fill up and packets get
dropped.

This is another problem that I'm not sure how to solve. It sounds to me
like dealing with it involves changing things outside of the driver
itself. In order to handle everything within the driver, the driver
would have to be able to tell when the various input queues are full,
which I think would require lots of protocol-specific knowledge that
belongs somewhere else. I have tried adjusting the 'rx_coal_ticks'
value a bit, and increasing it does help; however, it can't cure the
problem completely. Increasing rx_coal_ticks too much (in conjunction
with rx_max_coal_bds) can introduce a lot of latency, which creates
other problems.

-Bill

--
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message