Date: Sun, 21 Mar 1999 23:40:20 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: tlambert@primenet.com, hasty@rah.star-gate.com, wes@softweyr.com, ckempf@enigami.com, wpaul@skynet.ctr.columbia.edu, freebsd-hackers@FreeBSD.ORG
Subject: Re: Gigabit ethernet -- what am I doing wrong?
Message-ID: <199903220740.XAA16463@apollo.backplane.com>
References: <199903220545.WAA10719@usr01.primenet.com>
:What we're talking about here is overloading the equipment, and then
:having it fail in such a way that everyone who is loading it takes
:the hit "fairly".
:
:...
:
:> head-of-queue blocking, a combination of software on the gigaswitch
:> and the way the gigabit switch's hardware queues packets for transfer
:> between cards.
:
:Sounds like they failed to implement QOS mechanisms and source quench
:properly. My general response to technology failures is that there is
:a responsible human, somewhere. I know that they had two gigaswitches
:at one point in time, and it's obvious from a technical point of view
:that two gigaswitches are worse than one gigaswitch.
The MAE-WEST gigaswitch failure has nothing to do with QOS or
source quench. There is no such thing as source quench on an
FDDI/T3 switch. Nobody in their right mind uses source quench
in a router or switch matrix.
:> The problem at MAE-WEST had absolutely nothing to do with this. The
:> problem occured *inside* a *single* switch. If one port overloaded on
:> that switch, all the ports started having problems due to head-of-line
:> blocking.
:
:Look. You can only shove as many bits down a pipe as the pipe will
:take. If it's one port that's killing you, then you start dropping
:packets to and from that port, and punish the port.
:
:While there were humans engaged in overcommit involved, I really have
:a hard time understanding a design that would allow humans doing what
:humans would obvious do, given the circumstances, to cause problems.
:
:If the thing can't handle N/2 ports running at some speed X on each
:port, then the ports shouldn't be run at speed X.
The overcommit problem is not trivially solved when the blockage runs
at the hardware level, because problems can occur without the router
actually overcommitting the destination card's buffer.
The scheduling problem relates heavily to avoiding DMA blockages
in switch matrices. DMA blockages occur even with full crossbars,
and they are not a problem that can be solved by bumping up switch
matrix performance. The problem occurs when several source cards
attempt to DMA packets to the same destination card. Even if the
destination card has sufficient buffer memory to hold the packets,
and even if no overcommit ( time wise ) would occur, most cards cannot
handle the bandwidth of several ( > 1 ) other cards sending to them
simultaneously. In fact, most switch matrices can only route one
source to any given destination at a time -- the parallelism occurs
because the switch matrix can route several sources to several different
destinations simultaneously, not route several sources to the same
destination simultaneously.
This creates a situation where a source card can block in DMA. It is
statistically possible for packets to be arranged such that switch
performance is seriously degraded, increasing latency significantly
( even beyond what might be considered acceptable ) *EVEN* when buffer
space is available.
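
To make the effect concrete, here is a minimal sketch of head-of-line
blocking. It is a toy model, not the gigaswitch's actual design: it assumes
one FIFO per source card, fixed time slots, and at most one packet accepted
per destination per slot. The function name and the queue layout are
hypothetical.

```python
from collections import deque

def run_fifo_switch(queues, max_slots=100):
    """Toy crossbar with one FIFO per source card (assumed model).

    queues: one list of destination ids per source card, in arrival order.
    Each slot a source may offer only its head-of-queue packet, and each
    destination accepts at most one packet per slot.
    Returns the number of slots needed to drain every queue.
    """
    fifos = [deque(q) for q in queues]
    slots = 0
    while any(fifos) and slots < max_slots:
        busy = set()  # destinations already receiving in this slot
        for fifo in fifos:
            if fifo and fifo[0] not in busy:
                busy.add(fifo.popleft())
            # otherwise the head packet blocks everything queued behind it
        slots += 1
    return slots
```

With queues [[0, 1], [0, 2]], both sources start with a packet for
destination 0. The second source sits idle in the first slot even though its
packet for destination 2 could have gone out, so draining takes three slots
instead of two. No buffer is ever full in this model.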
Switch scheduling is required to avoid this problem -- to prevent multiple
sources from trying to DMA packets to the same destination at the same
time in the first place, and instead to use those time slices to DMA
packets going to other destinations.
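
A rough sketch of that idea, under the same toy assumptions (fixed slots,
one packet per destination per slot): each source keeps its packets grouped
by destination, and a greedy scheduler gives each source the first
destination that is still free in the slot. The greedy, fixed-priority
matching is an illustrative assumption, not what any particular switch does;
a real scheduler would at least rotate priority so low-numbered sources do
not always win.

```python
def run_scheduled_switch(queues, max_slots=100):
    """Toy crossbar with per-destination queues on each source (assumed model).

    queues: one list of destination ids per source card.
    Each slot, sources are visited in order and each is matched to the
    lowest-numbered destination it has traffic for that is still free.
    Returns the number of slots needed to drain every queue.
    """
    # pending[s][d] = packets source s still holds for destination d
    pending = []
    for q in queues:
        counts = {}
        for dest in q:
            counts[dest] = counts.get(dest, 0) + 1
        pending.append(counts)

    slots = 0
    while any(pending) and slots < max_slots:
        busy = set()  # destinations already matched in this slot
        for counts in pending:
            for dest in sorted(counts):
                if dest not in busy:
                    busy.add(dest)
                    counts[dest] -= 1
                    if counts[dest] == 0:
                        del counts[dest]
                    break  # one packet per source per slot
        slots += 1
    return slots
```

For the queues [[0, 1], [0, 2]], the second source sends its packet for
destination 2 while destination 0 is busy, so everything drains in two
slots. When every packet really is bound for one destination, as in
[[0, 0], [0]], no scheduler can help and it still takes three slots.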
Buffer management requires scheduling too. Not only that, it requires
dynamic queue sizing on the source card, because it is on the source
card where dropping a packet ( prior to the packet traversing the switch
matrix and being enqueued in the destination buffer ) yields the best
fairness when a destination is overcommitted. Unfortunately, the best
place to drop a packet when the source and destination speeds are
mismatched is on the destination buffer queue.
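
As an illustration of why the source card is the fair place to drop, here is
a small sketch that splits an overcommitted destination's capacity among the
sources before anything crosses the matrix. It uses max-min fairness, which
is an assumption chosen for the example rather than anything stated above;
the function name and the packet-count units are hypothetical.

```python
def fair_share(demands, capacity):
    """Max-min fair split of a destination's capacity (assumed policy).

    demands: {source: packets it wants to send to this destination}
    capacity: packets the destination can accept in the interval
    Returns {source: packets it may send}; the rest are dropped at the source.
    """
    alloc = {src: 0 for src in demands}
    remaining = capacity
    active = {src for src, want in demands.items() if want > 0}
    while remaining > 0 and active:
        # Offer every unsatisfied source an equal slice of what is left.
        share = max(remaining // len(active), 1)
        for src in sorted(active):
            give = min(share, demands[src] - alloc[src], remaining)
            alloc[src] += give
            remaining -= give
        active = {src for src in active if alloc[src] < demands[src]}
    return alloc
```

With demands {"a": 10, "b": 2, "c": 2} and a capacity of 8, the light
senders "b" and "c" lose nothing and "a" is cut to 4 packets, so only the
source doing the overcommitting takes the drops.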
The scheduler must deal with these two clashing problems as well in order
to both speed-match the ports AND to properly drop packets from the correct
source(s) when the destination buffer is overcommitted. The scheduler in
a real router/switch must also deal with hardware DMA conflicts
( blockages ), and must stabilize buffer latency under a wide range of
load conditions and port<->port combinations.
:> The solution to this at MAE-WEST was to clamp down on the idiots who
:> were selling transit at MAE-WEST and overcommitting their ports, plus
:
:With respect, technology should operate in the absence of human
:imposition of policy. It should have been technically impossible
:for the idiots to successfully engage in the behaviour in the first
:place, and if it wasn't, then that's a design problem with the
:gigaswitches.
With respect, you are assuming that the problems can be solved trivially.
These are *NOT* trivial problems. Very not trivial problems. Not even
*CLOSE* to trivial problems. I can't repeat this enough times.
: Terry Lambert
: terry@lambert.org
-Matt
Matthew Dillon
<dillon@backplane.com>
