Date: Mon, 22 Nov 2004 12:14:06 -0800
From: Sean McNeil <sean@mcneil.com>
To: Robert Watson <rwatson@freebsd.org>
Cc: Jeremie Le Hen <jeremie@le-hen.org>
Subject: Re: Re[4]: serious networking (em) performance (ggate and NFS) problem
Message-ID: <1101154446.79991.13.camel@server.mcneil.com>
In-Reply-To: <Pine.NEB.3.96L.1041122112718.19086S-100000@fledge.watson.org>
References: <Pine.NEB.3.96L.1041122112718.19086S-100000@fledge.watson.org>
On Mon, 2004-11-22 at 11:34 +0000, Robert Watson wrote:
> On Sun, 21 Nov 2004, Sean McNeil wrote:
>
> > I have to disagree.  Packet loss is likely according to some of my
> > tests.  With the re driver, no change except moving from a 100BT setup
> > with no packet loss to a gigE setup (both Linksys switches) will cause
> > serious packet loss at 20Mbps data rates.  I have discovered the only
> > way to get good performance with no packet loss was to
> >
> > 1) remove interrupt moderation, and
> > 2) defrag each mbuf that comes into the driver.
>
> Sounds like you're bumping into a queue limit that is made worse by
> interrupting less frequently, resulting in bursts of packets that are
> relatively large, rather than a trickle of packets at a higher rate.
> Perhaps a limit on the number of outstanding descriptors in the driver or
> hardware and/or a limit in the netisr/ifqueue queue depth.  You might try
> changing the default IFQ_MAXLEN from 50 to 128 to increase the size of
> the ifnet and netisr queues.  You could also try setting net.isr.enable=1
> to enable direct dispatch, which in the in-bound direction would reduce
> the number of context switches and queueing.  It sounds like the device
> driver has a limit of 256 receive and transmit descriptors, which one
> supposes is probably derived from the hardware limit, but I have no
> documentation on hand so can't confirm that.

I've tried bumping IFQ_MAXLEN and it made no difference.  I could rerun
that test to be 100% certain, I suppose; it was done a while back.

I haven't tried net.isr.enable=1, but the packet loss is in the transmit
direction.  The device driver has been modified to have 1024 transmit and
1024 receive descriptors, as that is the hardware limit, and that didn't
matter either: with 1024 descriptors I still lost packets without the
m_defrag.

The most difficult thing for me to understand is this: if this is some
sort of resource limitation, why does it work perfectly with a slower phy
layer and not with gigE?  The only explanation I could think of is that
the old driver called m_defrag once it had filled the transmit descriptor
queue up to a certain point.  Understanding the effects of m_defrag would
help in figuring this out, I suppose.

> It would be interesting on the send and receive sides to inspect the
> counters for drops at various points in the network stack; i.e., are we
> dropping packets at the ifq handoff because we're overfilling the
> descriptors in the driver, are packets dropped on the inbound path going
> into the netisr due to over-filling before the netisr is scheduled, etc.
> And, it's probably interesting to look at stats on filling the socket
> buffers for the same reason: if bursts of packets come up the stack, the
> socket buffers could well be being over-filled before the user thread
> can run.

Yes, that would be very interesting and should point out the problem.  I
would do it if I had enough knowledge of the network pathways; alas, I am
very green in this area.  The receive side has no issues, though, so I
would focus on the transmit counters (with assistance).
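
For concreteness, here is a rough sketch of where the IFQ_MAXLEN knob
Robert mentions shows up.  The names xx_attach_ifq() and XX_NUM_TX_DESC
are placeholders, not the actual em(4) source, and the 256-entry ring
size is only assumed from the discussion above; the point is just that a
driver can size its software send queue against the hardware ring rather
than inherit the global default of 50.

/*
 * Rough sketch only: placeholder driver names, assumed ring size.
 */
#include <sys/param.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

#define XX_NUM_TX_DESC	256	/* assumed hardware transmit ring size */

static void
xx_attach_ifq(struct ifnet *ifp)
{
	/*
	 * The stock default is ifqmaxlen (IFQ_MAXLEN == 50).  Sizing the
	 * software send queue to roughly one ring's worth of packets gives
	 * a bursty sender somewhere to wait instead of being dropped at
	 * the if_snd handoff.
	 */
	ifp->if_snd.ifq_maxlen = XX_NUM_TX_DESC - 1;
}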
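
Likewise, a rough sketch of what "defrag each mbuf that comes into the
driver" looks like in a transmit path.  struct xx_softc, xx_encap() and
the descriptor accounting are placeholders rather than the real
re(4)/em(4) code; the one real KPI used is m_defrag(9), which copies a
long mbuf chain into as few clusters as it can so the chain fits the
descriptors that are still free.

/*
 * Rough sketch only: placeholder softc and encap routine.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/mbuf.h>

struct xx_softc {
	int	tx_desc_free;	/* transmit descriptors still available */
};

static int	xx_encap(struct xx_softc *, struct mbuf *);

/* Each fragment of the chain costs one DMA descriptor. */
static int
xx_count_frags(struct mbuf *m)
{
	int n;

	for (n = 0; m != NULL; m = m->m_next)
		n++;
	return (n);
}

static int
xx_tx_one(struct xx_softc *sc, struct mbuf *m_head)
{
	struct mbuf *m;

	/*
	 * If the chain needs more descriptors than are free, try to
	 * collapse it into a few clusters before giving up on it.
	 */
	if (xx_count_frags(m_head) > sc->tx_desc_free) {
		m = m_defrag(m_head, M_DONTWAIT);
		if (m == NULL) {
			m_freem(m_head);	/* out of clusters: drop */
			return (ENOBUFS);
		}
		m_head = m;
	}
	return (xx_encap(sc, m_head));	/* load the DMA map, fill descriptors */
}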
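
And for inspecting drop counters from userland, a small probe of the
inbound netisr queue, assuming the net.inet.ip.intr_queue_* sysctl names
are present on this branch; transmit-side ifq drops and socket-buffer
overflows should be visible with something like "netstat -id" and
"netstat -s".

/*
 * Rough sketch only: reads the inbound IP netisr queue limit and its
 * drop counter via sysctlbyname(3).
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdio.h>
#include <stdlib.h>

static int
read_int_sysctl(const char *name)
{
	int val;
	size_t len = sizeof(val);

	if (sysctlbyname(name, &val, &len, NULL, 0) == -1) {
		perror(name);
		exit(1);
	}
	return (val);
}

int
main(void)
{
	printf("ipintrq maxlen: %d\n",
	    read_int_sysctl("net.inet.ip.intr_queue_maxlen"));
	printf("ipintrq drops:  %d\n",
	    read_int_sysctl("net.inet.ip.intr_queue_drops"));
	return (0);
}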