Date:        Fri, 11 Apr 2003 15:07:59 -0700
From:        Terry Lambert <tlambert2@mindspring.com>
To:          bj@dc.luth.se
Cc:          David Gilbert <dgilbert@velocet.ca>
Subject:     Re: tcp_output starving -- is due to mbuf get delay?
Message-ID:  <3E973CBF.FB552960@mindspring.com>
References:  <200304111709.h3BH9LKl087299@dc.luth.se>
Borje Josefsson wrote:
> On Fri, 11 Apr 2003 09:32:51 PDT Terry Lambert wrote:
>
> > See other posting; I listed a bunch of OIDs to play with.
> >
> > One other, if you are running -CURRENT, would be:
> >
> > net.isr.netisr_enable -> 1
>
> I'm running 4.8RC on the sender right now.

You might want to try 4.3 or 4.4, as well.  The netisr_enable=1 is
a -CURRENT only feature.  It'd be worthwhile to test, since it deals
with protocol processing latency; if you get receiver livelocked,
then it will fix *some* instances of that.

> I did a quick test with some combination of the OID:s You sent, except I
> didn't reboot between each test:

The reboots were intended to keep the statistics counters relatively
accurate between the FreeBSD and NetBSD sender runs.  By doing that,
you can tell if what's happening on the receiver is the same for
both sender machines.  If you don't reboot, then the statistics are
polluted with other traffic, and can't be compared.

You should also start clean on each sender, and get the same stats
on the sender.  I would add "vmstat -i", to look at interrupt
overhead.

Note that FreeBSD jumbograms are in external mbufs allocated to the
cards on receive.  On transmit, they are scatter/gathered.  NetBSD
might not have this overhead.  The copy overhead there could account
for a lot of CPU time.

> Netstat -m (tcp and ip portion) when I started and after the trials:

Side-by-side/interleaved is more useful.  I will do it manually for
the ones that change; if we continue this discussion, you get to do
the work in the future (b=before, A=after) -- notice how much more
useful this is, and how much more useful it would be, if the "before"
values were all "0" because you had rebooted...:

b> 12103350 packets sent
A> 13331442 packets sent				1228092

b> 2692690 data packets (1962127658 bytes)
A> 3920701 data packets (2694929130 bytes)		1228011 732801472

b> 10203 data packets (14829100 bytes) retransmitted
A> 10203 data packets (14829100 bytes) retransmitted

No retransmits; this is good.  Be nicer to see on the transmitters,
though...

b> 0 resends initiated by MTU discovery

b> 6446084 ack-only packets (199 delayed)
A> 6446155 ack-only packets (207 delayed)		71 8

This is odd.  You must be sending data in both directions.  Thus
the lower bandwidth could be the result of negotiated options; you
may want to try turning _on_ rfc1644.

The delayed ACKs are bad.  Can you either set "PUSH" on the socket,
or turn off delayed ACK entirely?  (There is a sysctl sketch for
that a little further down.)

b> 2955524 window update packets
A> 2955524 window update packets

This is strange.  I would expect at least 1 update packet.

b> 216 control packets
A> 226 control packets					10

Be nice to know what these are, and whether NetBSD and FreeBSD end
up with the same number.

b> 16306846 packets received
A> 17131926 packets received				825080

...403012 more sent than received.

b> 847040 acks (for 1962119767 bytes)
A> 1526734 acks (for 2694921244 bytes)			679694 732801477

...5 more bytes sent than received.  Seems odd, as well.

b> 14399 duplicate acks
A> 14404 duplicate acks					5

...Until we see this.  The duplicate ACKs indicate either a timeout,
or an unexpected retransmission.  In either case, this is a potential
cause of a pipeline stall.

b> 0 acks for unsent data

b> 15281399 packets (2057886406 bytes) received in-sequence
A> 15281806 packets (2057905910 bytes) received in-sequence	407 19504

...ie: most of the data was received out of sequence.  This may
indicate that most of the time was being spent in stream reassembly.
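(As an aside on the delayed ACK part above: a rough sketch only, on
the receiver, assuming the stock net.inet.tcp.delayed_ack OID --
double-check the name with "sysctl -a | grep delayed" on your version
before trusting it:

    # turn delayed ACK off entirely for one test run
    sysctl -w net.inet.tcp.delayed_ack=0

    # ... run the transfer and grab the counters ...

    # put it back to the default when you are done
    sysctl -w net.inet.tcp.delayed_ack=1

If the "delayed" counter stops moving and your throughput changes,
that tells you something.)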
b> 4425 completely duplicate packets (4551322 bytes)
A> 4425 completely duplicate packets (4551322 bytes)

...gotta wonder how this is, with 5 duplicate ACKs...

b> 0 old duplicate packets

b> 65 packets with some dup. data (14957 bytes duped)
A> 65 packets with some dup. data (14957 bytes duped)

b> 38818 out-of-order packets (59514384 bytes)
A> 38818 out-of-order packets (59514384 bytes)

This doesn't jibe with the in-sequence numbers, above.

b> 0 packets (0 bytes) of data after window
b> 0 window probes

b> 120161 window update packets
A> 265251 window update packets				145090

...That's a lot of window updates.  Notice that there are never any
transmit window updates, only receive window updates.  This is odd.

b> 3 packets received after close
A> 3 packets received after close

b> 4 discarded for bad checksums
A> 4 discarded for bad checksums

b> 0 discarded for bad header offset fields
b> 0 discarded because packet too short

b> 160 connection requests
A> 165 connection requests				5

You should account for these.

b> 49 connection accepts
A> 49 connection accepts

b> 0 bad connection attempts
b> 0 listen queue overflows

b> 63 connections established (including accepts)
A> 68 connections established (including accepts)	5

""

b> 383 connections closed (including 5 drops)
A> 388 connections closed (including 5 drops)		5

""

b> 10 connections updated cached RTT on close
A> 15 connections updated cached RTT on close		5

b> 10 connections updated cached RTT variance on close
A> 15 connections updated cached RTT variance on close	5

b> 2 connections updated cached ssthresh on close
A> 2 connections updated cached ssthresh on close

b> 146 embryonic connections dropped
A> 146 embryonic connections dropped

b> 846952 segments updated rtt (of 521492 attempts)
A> 1526646 segments updated rtt (of 793534 attempts)	679694 272042

b> 36 retransmit timeouts
A> 36 retransmit timeouts

b> 2 connections dropped by rexmit timeout
A> 2 connections dropped by rexmit timeout

b> 0 persist timeouts
b> 0 connections dropped by persist timeout

b> 438 keepalive timeouts
A> 438 keepalive timeouts

b> 438 keepalive probes sent
A> 438 keepalive probes sent

b> 0 connections dropped by keepalive

b> 26449 correct ACK header predictions
A> 79056 correct ACK header predictions			52607

Be nice to know why so many were incorrect, but it's not important
for what you are seeing...

b> 15280149 correct data packet header predictions
A> 15280431 correct data packet header predictions	282

""

b> 49 syncache entries added
A> 49 syncache entries added

b> 0 retransmitted
b> 0 dupsyn
b> 0 dropped

b> 49 completed
A> 49 completed

b> 0 bucket overflow
b> 0 cache overflow
b> 0 reset
b> 0 stale
b> 0 aborted
b> 0 badack
b> 0 unreach
b> 0 zone failures
b> 0 cookies sent
b> 0 cookies received
b>
b> ip:

b> 16393480 total packets received
A> 17218567 total packets received			825087

Good cross-check on the TCP numbers.  Notice this number: 7 larger.
b> 0 bad header checksums
b> 0 with size smaller than minimum
b> 0 with data size < data length
b> 0 with ip length > max ip packet size
b> 0 with header length < data size
b> 0 with data length < header length
b> 0 with bad options
b> 0 with incorrect version number

b> 1 fragment received
A> 1 fragment received

b> 0 fragments dropped (dup or out of space)

b> 1 fragment dropped after timeout
A> 1 fragment dropped after timeout

b> 0 packets reassembled ok

b> 16393029 packets for this host
A> 17218116 packets for this host			825087

""

b> 450 packets for unknown/unsupported protocol
A> 450 packets for unknown/unsupported protocol

b> 0 packets forwarded (0 packets fast forwarded)
b> 0 packets not forwardable
b> 0 packets received for unknown multicast group
b> 0 redirects sent

b> 12201936 packets sent from this host
A> 13430039 packets sent from this host			1228103

92 larger, this time.

b> 115 packets sent with fabricated ip header
A> 115 packets sent with fabricated ip header

b> 1367 output packets dropped due to no bufs, etc.
A> 1367 output packets dropped due to no bufs, etc.

b> 0 output packets discarded due to no route
b> 0 output datagrams fragmented
b> 0 fragments created
b> 0 datagrams that can't be fragmented
b> 0 tunneling packets that can't find gif
b> 0 datagrams with bad address in header

All in all, there's not a lot of weird stuff going on; now you need
to look at the NetBSD vs. the FreeBSD transmitters, in a similar
way, get the deltas for both, and then compare them to each other.

A really important thing to look at is the "vmstat -i" I asked for
earlier, in order to get interrupt counts on the transmitter.  Most
likely, there is a driver difference causing the "problem"; you
should be able to see this in a differential for the transmit
interrupt overhead being higher on the FreeBSD box.

It would also be very interesting to compare the netstat numbers
between the transmitters, as suggested above; the numbers should
tell you about differences in implementation on the driver side.

-- Terry
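P.S.: a rough sketch of what I mean by "start clean and get the
deltas" on each sender -- just a sketch, assuming plain sh, and
assuming you pull the protocol counters from "netstat -s" and the
interrupt counts from "vmstat -i" (the file names are arbitrary):

    # right after the reboot, before the run:
    netstat -s > /var/tmp/netstat.before
    vmstat -i  > /var/tmp/vmstat.before

    # ... run the single bulk transfer ...

    # immediately after the run:
    netstat -s > /var/tmp/netstat.after
    vmstat -i  > /var/tmp/vmstat.after

    # the per-sender deltas are then easy to eyeball:
    diff /var/tmp/netstat.before /var/tmp/netstat.after
    diff /var/tmp/vmstat.before  /var/tmp/vmstat.after

Do that on both the FreeBSD and the NetBSD sender, and then compare
the two sets of deltas against each other.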