Date:      Fri, 11 Apr 2003 15:07:59 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        bj@dc.luth.se
Cc:        David Gilbert <dgilbert@velocet.ca>
Subject:   Re: tcp_output starving -- is due to mbuf get delay?
Message-ID:  <3E973CBF.FB552960@mindspring.com>
References:  <200304111709.h3BH9LKl087299@dc.luth.se>

Borje Josefsson wrote:
> On Fri, 11 Apr 2003 09:32:51 PDT Terry Lambert wrote:
> 
> > See other posting; I listed a bunch of OIDs to play with.
> >
> > One other, if you are running -CURRENT, would be:
> >
> >       net.isr.netisr_enable           -> 1
> 
> I'm running 4.8RC on the sender right now.

You might want to try 4.3 or 4.4, as well.

The netisr_enable=1 setting is a -CURRENT-only feature.  It'd be worthwhile
to test, since it deals with protocol processing latency; if the receiver
is getting livelocked, it will fix *some* instances of that.
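
Something along these lines (on the -CURRENT box; the OID name is as I
listed it, but check "sysctl net.isr" if it doesn't take):

        # sysctl -w net.isr.netisr_enable=1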


> I did a quick test with some combination of the OID:s You sent, except I
> didn't reboot between each test:

The reboots were intended to keep the statistics counters relatively
accurate between the FreeBSD and NetBSD sender runs.  By doing that,
you can tell if what's happening on the receiver is the same for
both sender machines.  If you don't reboot, then the statistics are
polluted with other traffic, and can't be compared.

You should also start clean on each sender, and collect the same stats
there.  I would add "vmstat -i", to look at interrupt overhead.
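
Per run, something like this (the file names are just examples):

        # vmstat -i          > pre.vmstat
        # netstat -s -p tcp  > pre.tcp
          ... run the transfer ...
        # vmstat -i          > post.vmstat
        # netstat -s -p tcp  > post.tcp

Do that on the receiver and on both senders, right after the reboot, and
the numbers become directly comparable.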

Note that FreeBSD jumbograms are in external mbufs allocated to the
cards on receive.  On transmit, they are scatter/gathered.  NetBSD
might not have this overhead.  The copy overhead there could account
for a lot of CPU time.

> Netstat -m (tcp and ip portion) when I started and after the trials:

Side-by-side/interleaved is more useful.  I will do it manually for
the ones that change; if we continue this discussion, you get to do
the work in the future (b=before, A=after).  Notice how much more
useful this is, and how much more useful still it would be if the
"before" values were all "0" because you had rebooted...:
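
If you want the deltas mechanically, something like this rough sketch
works (it assumes the pre/post files came from the same kernel, so the
lines pair up one-for-one):

        # paste -d'|' pre.tcp post.tcp | \
            awk -F'|' '($2+0) != ($1+0) { print ($2+0)-($1+0), $2 }'

That prints the delta followed by the "after" line for every counter
that moved.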


b> tcp:
b>         12103350 packets sent
A>         13331442 packets sent
	    1228092

b>                 2692690 data packets (1962127658 bytes)
A>                 3920701 data packets (2694929130 bytes)
		   1228011		  732801472

b>                 10203 data packets (14829100 bytes) retransmitted
A>                 10203 data packets (14829100 bytes) retransmitted

No retransmits; this is good.  Be nicer to see on the transmitters,
though...

b>                 0 resends initiated by MTU discovery
b>                 6446084 ack-only packets (199 delayed)
A>                 6446155 ack-only packets (207 delayed)
		        71                     8

This is odd.  You must be sending data in both directions.  Thus
the lower bandwidth could be the result of negotiated options; you
may want to try turning _on_ rfc1644.

The delayed ACKs are bad.  Can you either set "PUSH" on the socket,
or turn off delayed ACK entirely?
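
Both are sysctl-tunable on 4.x; something like (from memory, so double
check the names against "sysctl net.inet.tcp"):

        # sysctl -w net.inet.tcp.delayed_ack=0    (turn off delayed ACK)
        # sysctl -w net.inet.tcp.rfc1644=1        (turn on T/TCP)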

b>                 2955524 window update packets
A>                 2955524 window update packets

This is strange.  I would expect at least 1 update packet.

b>                 216 control packets
A>                 226 control packets
                    10

Be nice to know what these are, and whether NetBSD and FreeBSD end
up with the same number.

b>         16306846 packets received
A>         17131926 packets received
	     825080

...403012 more sent than received.

b>                 847040 acks (for 1962119767 bytes)
A>                 1526734 acks (for 2694921244 bytes)
		    679694	      732801477

...5 more bytes acknowledged than were sent.  Seems odd, as well.

b>                 14399 duplicate acks
A>                 14404 duplicate acks
		       5

...Until we see this.  The duplicate ACKs indicate either a
timeout, or an unexpected retransmission.  In either case, this is
a potential cause of a pipeline stall.

b>                 0 acks for unsent data
b>                 15281399 packets (2057886406 bytes) received in-sequence
A>                 15281806 packets (2057905910 bytes) received in-sequence
                        407               19504

...i.e.: most of the data was received out of sequence.  This may indicate
that most of the time was being spent in stream reassembly.

b>                 4425 completely duplicate packets (4551322 bytes)
A>                 4425 completely duplicate packets (4551322 bytes)

...gotta wonder how this is, with 5 duplicate ACKs...

b>                 0 old duplicate packets
b>                 65 packets with some dup. data (14957 bytes duped)
A>                 65 packets with some dup. data (14957 bytes duped)
b>                 38818 out-of-order packets (59514384 bytes)
A>                 38818 out-of-order packets (59514384 bytes)

This doesn't jibe with the in-sequence numbers, above.

b>                 0 packets (0 bytes) of data after window
b>                 0 window probes
b>                 120161 window update packets
A>                 265251 window update packets
		   145090

...That's a lot of window updates.  Notice that there are never any
transmit window updates, only receive window updates.  This is odd.

b>                 3 packets received after close
A>                 3 packets received after close
b>                 4 discarded for bad checksums
A>                 4 discarded for bad checksums
b>                 0 discarded for bad header offset fields
b>                 0 discarded because packet too short
b>         160 connection requests
A>         165 connection requests
	     5

You should account for these.

b>         49 connection accepts
A>         49 connection accepts
b>         0 bad connection attempts
b>         0 listen queue overflows
b>         63 connections established (including accepts)
A>         68 connections established (including accepts)
            5

""
b>         383 connections closed (including 5 drops)
A>         388 connections closed (including 5 drops)
             5
""

b>                 10 connections updated cached RTT on close
A>                 15 connections updated cached RTT on close
                    5
b>                 10 connections updated cached RTT variance on close
A>                 15 connections updated cached RTT variance on close
                    5
b>                 2 connections updated cached ssthresh on close
A>                 2 connections updated cached ssthresh on close
b>         146 embryonic connections dropped
A>         146 embryonic connections dropped
b>         846952 segments updated rtt (of 521492 attempts)
A>         1526646 segments updated rtt (of 793534 attempts)
            679694                          272042


b>         36 retransmit timeouts
A>         36 retransmit timeouts
b>                 2 connections dropped by rexmit timeout
A>                 2 connections dropped by rexmit timeout
b>         0 persist timeouts
b>                 0 connections dropped by persist timeout
b>         438 keepalive timeouts
A>         438 keepalive timeouts
b>                 438 keepalive probes sent
A>                 438 keepalive probes sent
b>                 0 connections dropped by keepalive
b>         26449 correct ACK header predictions
A>         79056 correct ACK header predictions
	   52607

Be nice to know why so many were incorrect, but it's not important
for what you are seeing...

b>         15280149 correct data packet header predictions
A>         15280431 correct data packet header predictions
                282
""

b>         49 syncache entries added
A>         49 syncache entries added
b>                 0 retransmitted
b>                 0 dupsyn
b>                 0 dropped
b>                 49 completed
A>                 49 completed
b>                 0 bucket overflow
b>                 0 cache overflow
b>                 0 reset
b>                 0 stale
b>                 0 aborted
b>                 0 badack
b>                 0 unreach
b>                 0 zone failures
b>         0 cookies sent
b>         0 cookies received
b> 
b> ip:
b>         16393480 total packets received
A>         17218567 total packets received
	     825087

Good cross-check on the TCP numbers.  Notice this delta is 7 larger
than the TCP packets-received delta.

b>         0 bad header checksums
b>         0 with size smaller than minimum
b>         0 with data size < data length
b>         0 with ip length > max ip packet size
b>         0 with header length < data size
b>         0 with data length < header length
b>         0 with bad options
b>         0 with incorrect version number
b>         1 fragment received
A>         1 fragment received
b>         0 fragments dropped (dup or out of space)
b>         1 fragment dropped after timeout
A>         1 fragment dropped after timeout
b>         0 packets reassembled ok
b>         16393029 packets for this host
A>         17218116 packets for this host
             825087
""

b>         450 packets for unknown/unsupported protocol
A>         450 packets for unknown/unsupported protocol
b>         0 packets forwarded (0 packets fast forwarded)
b>         0 packets not forwardable
b>         0 packets received for unknown multicast group
b>         0 redirects sent
b>         12201936 packets sent from this host
A>         13430039 packets sent from this host
            1228103

11 larger, this time.

b>         115 packets sent with fabricated ip header
A>         115 packets sent with fabricated ip header
b>         1367 output packets dropped due to no bufs, etc.
A>         1367 output packets dropped due to no bufs, etc.
b>         0 output packets discarded due to no route
b>         0 output datagrams fragmented
b>         0 fragments created
b>         0 datagrams that can't be fragmented
b>         0 tunneling packets that can't find gif
b>         0 datagrams with bad address in header


All in all, there's not a lot of weird stuff going on; now you need
to look at the NetBSD vs. the FreeBSD transmitters, in a similar
way, get the deltas for both, and then compare them to each other.

A really important thing to look at is the "vmstat -i" I asked for
earlier, in order to get interrupt counts on the transmitter.  Most
likely, there is a driver difference causing the "problem"; you
should be able to see this in a differential for the transmit
interrupt overhead being higher on the FreeBSD box.
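
Something along these lines (ti0 is just an example; substitute whatever
interface your card shows up as):

        # vmstat -i | egrep 'interrupt|ti0'

Take the per-run delta of the card's interrupt count, divide by the
packets-sent delta from netstat, and compare that ratio between the
FreeBSD and NetBSD senders; a noticeably higher interrupts-per-packet
number on the FreeBSD box would point straight at the driver.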

It would also be very interesting to compare the netstat numbers
between the transmitters, as suggested above; the numbers should
tell you about differences in implementation on the driver side.

-- Terry


