Date: Tue, 26 Mar 2002 22:39:48 -0800
From: Luigi Rizzo <rizzo@icir.org>
To: Lars Eggert <larse@ISI.EDU>
Cc: Matthew Luckie <mjl@nlanr.net>, freebsd-net@FreeBSD.ORG
Subject: Re: ip_output and ENOBUFS
Message-ID: <20020326223947.B16450@iguana.icir.org>
In-Reply-To: <3CA0AB3D.5000300@isi.edu>
References: <Pine.BSF.4.21.0203260819020.91970-100000@mave.nlanr.net> <3CA0AB3D.5000300@isi.edu>

The ENOBUFS is very typical with UDP applications that try to
send as fast as possible (e.g. the various network test utilities
in ports), and as i said in a previous message, putting up a mechanism
to pass around queue full/queue not full events is expensive because
it might trigger on every single packet, and possibly have to wake up
multiple processes each time (with only one being able to succeed).

The tcp handling of ENOBUFS is much cheaper. TCP is not woken up by
the device, but by acks coming from the other side, or by timeouts.
So there is no per-packet overhead just to implement this mechanism.

As a matter of fact, i even implemented a similar thing in dummynet,
and if device drivers call if_tx_rdy() when they complete a
transmission, then the tx interrupt can be used to clock packets out
of the dummynet pipes. A patch for if_tun.c is below, and if_tx_rdy()
is in netinet/ip_dummynet.c.

You could replace the call to if_tx_rdy() with a wakeup() using some
appropriate argument to wake up threads waiting for devices to become
ready.

	cheers
	luigi

> lcvs diff -u if_tun.c
Index: if_tun.c
===================================================================
RCS file: /home/ncvs/src/sys/net/if_tun.c,v
retrieving revision 1.51.2.2
diff -u -r1.51.2.2 if_tun.c
--- if_tun.c	28 Jul 1999 15:08:06 -0000	1.51.2.2
+++ if_tun.c	19 Jun 2000 12:07:17 -0000
@@ -19,6 +19,7 @@
 
 #include "opt_devfs.h"
 #include "opt_inet.h"
+#include "opt_ipdn.h"
 
 #include <sys/param.h>
 #include <sys/proc.h>
@@ -162,6 +163,10 @@
 	ifp = &tp->tun_if;
 	tp->tun_flags |= TUN_OPEN;
 	TUNDEBUG("%s%d: open\n", ifp->if_name, ifp->if_unit);
+#ifdef DUMMYNET
+	if (ifp->if_snd.ifq_len == 0)	/* better be! */
+		if_tx_rdy(ifp);
+#endif
 	return (0);
 }
 
@@ -487,6 +492,10 @@
 			}
 		}
 	} while (m0 == 0);
+#ifdef DUMMYNET
+	if (ifp->if_snd.ifq_len == 0)
+		if_tx_rdy(ifp);
+#endif
 	splx(s);
 
 	while (m0 && uio->uio_resid > 0 && error == 0) {

On Tue, Mar 26, 2002 at 09:09:17AM -0800, Lars Eggert wrote:
> Matthew Luckie wrote:
> > hmm, we looked at how other protocols handled the ENOBUFS case from
> > ip_output.
> >
> > tcp_output calls tcp_quench on this error.
> >
> > while the interface may not be able to send any more packets than it
> > does currently, closing the congestion window back to 1 segment
> > seems a severe way to handle this error, knowing that the network
> > did not drop the packet due to congestion. Ideally, there might be
> > some form of blocking until such time as an mbuf becomes available.
> > This sounds as if it will be much easier come FreeBSD 5.0
>
> TCP will almost never encounter this scenario, since it's self-clocking.
> The NIC is very rarely the bottleneck resource for a given network
> connection. Have you looked at mean queue lengths for NICs? They are
> typically zero or one. The NIC will only be the bottleneck if you are
> sending at a higher rate than line speed and your burst time is too long
> to be absorbed by the queue.
>
> > I'm aware that if people are hitting this condition, they need to
> > increase the number of mbufs to get maximum performance.
>
> No. ENOBUFS in ip_output almost always means that your NIC queue is
> full, which isn't controlled through mbufs. You can make the queue
> longer, but that won't help if you're sending too fast.
>
> > This section of code has previously been discussed here:
> > http://docs.freebsd.org/cgi/getmsg.cgi?fetch=119188+0+archive/2000/freebsd-net/20000730.freebsd-net
> > and has been in use for many years (a
>
> This is a slightly different problem than you describe. What Archie saw
> was an ENOBUFS being handled like a loss inside the network, even though
> the sender has information locally that can allow it to make smarter
> retransmission decisions.
>
> Lars
> --
> Lars Eggert <larse@isi.edu>               Information Sciences Institute
> http://www.isi.edu/larse/              University of Southern California

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message
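
For the UDP case described in the message (an application sending as
fast as it can and getting ENOBUFS back), a common userland workaround
is to sleep briefly and retry instead of waiting for a queue-ready
event from the kernel. The following is only a minimal sketch of that
idea, not anything from the patch above: send_fn, send_retry_enobufs(),
struct flaky and flaky_send() are hypothetical names introduced for
illustration, and the doubling delay with a 100 ms cap is an arbitrary
assumption. In a real program the callback would wrap sendto() on a
UDP socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical callback type: wraps whatever actually sends (e.g. sendto()). */
typedef ssize_t (*send_fn)(void *ctx, const void *buf, size_t len);

/*
 * Call fn until it succeeds, fails with an error other than ENOBUFS,
 * or max_retries retries have been used up.  Sleeps between attempts,
 * doubling the delay each time (capped at 100 ms; the cap is an
 * assumption, not a tuned value).  Returns what fn returned last;
 * errno is left as set by that last call.
 */
static ssize_t
send_retry_enobufs(send_fn fn, void *ctx, const void *buf, size_t len,
    int max_retries, long delay_us)
{
	assert(fn != NULL);
	for (int attempt = 0; ; attempt++) {
		ssize_t n = fn(ctx, buf, len);
		if (n >= 0 || errno != ENOBUFS || attempt >= max_retries)
			return (n);

		/* Interface queue is full: back off and let it drain. */
		struct timespec ts;
		ts.tv_sec = delay_us / 1000000;
		ts.tv_nsec = (delay_us % 1000000) * 1000;
		nanosleep(&ts, NULL);
		if (delay_us < 100000)
			delay_us *= 2;
	}
}

/* Demo-only stand-in for a socket: fails with ENOBUFS a fixed number of times. */
struct flaky {
	int failures_left;
	int calls;
};

static ssize_t
flaky_send(void *ctx, const void *buf, size_t len)
{
	struct flaky *f = ctx;

	(void)buf;
	f->calls++;
	if (f->failures_left > 0) {
		f->failures_left--;
		errno = ENOBUFS;
		return (-1);
	}
	return ((ssize_t)len);
}
```

This polls rather than blocks, so it pays with added latency and some
wasted wakeups, but it avoids the per-packet kernel notification cost
that the message argues against.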