Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 9 Nov 2007 23:23:31 -0800
From:      "Matt Reimer" <mattjreimer@gmail.com>
To:        "Mike Silbersack" <silby@silby.com>
Cc:        net@freebsd.org
Subject:   Re: Should syncache.count ever be negative?
Message-ID:  <f383264b0711092323p5148300fu3c0883135f8fb01b@mail.gmail.com>
In-Reply-To: <20071109213846.O46803@odysseus.silby.com>
References:  <f383264b0711091609n81875b6v444055960ab0fd96@mail.gmail.com> <20071109213846.O46803@odysseus.silby.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Nov 9, 2007 7:46 PM, Mike Silbersack <silby@silby.com> wrote:
>
> On Fri, 9 Nov 2007, Matt Reimer wrote:
>
> > On a eight core machine running RELENG_7 I'm seeing TCP stalls,
> > sometimes lasting up to 60 seconds or so. While trying to track this
> > down I noticed that net.inet.tcp.syncache.count is negative. Should it
> > be possible for the count to go negative? Perhaps it indicates a race,
> > or the counter is wrongly being decremented twice?
>
> I just took a look at the code, and you are correct that the count is not
> locked; it looks like you're hitting the race.  However, it doesn't look
> like anything is checking the count, so that should not be the cause of
> your TCP stalls.

Yeah, that's the conclusion I came to also.

> Can you install netperf and run both the TCP_STREAM and UDP_STREAM tests
> just to make sure that your network card is working properly?  We've
> recently found that the fast interrupt handlers we use in some network
> drivers act strangely when sharing interrupts.  So, that's a first thing
> to test before we poke at the upper layers.

Ok, I've run netperf in both directions. The box I've been targeting
is 66.230.193.105 aka wordpress1.

[root@wordpress2 ~]# netperf -p 5000 -t TCP_STREAM -H 66.230.193.105
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
66.230.193.105 (66.230.193.105) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 65536  32768  32768    10.03      93.00

[root@wordpress2 ~]# netperf -p 5000 -t UDP_STREAM -H 66.230.193.105
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
66.230.193.105 (66.230.193.105) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

  9216    9216   10.01       13145 2588457      96.81
 41600           10.01       13145             96.81

[root@wordpress1 ~]# netperf -H 66.230.193.106 -p 5000 -t TCP_STREAM
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
66.230.193.106 (66.230.193.106) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 65536  32768  32768    10.02      94.03

[root@wordpress1 ~]# netperf -H 66.230.193.106 -p 5000 -t UDP_STREAM
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
66.230.193.106 (66.230.193.106) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

  9216    9216   10.00       13151 2532692      96.95
 41600           10.00       13135             96.83

> If that doesn't help, can you post more details about how you are
> stressing the system?

Yes, I'll get you whatever information you want.

The machine is a Dell 1950 with 8 x 1.6GHz Xeon 5310s, 8G RAM, and this NIC:

bce0@pci0:9:0:0:        class=0x020000 card=0x01b31028 chip=0x164c14e4 rev=0x12
hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = '5708C Broadcom NetXtreme II Gigabit Ethernet Adapter'
    class      = network
    subclass   = ethernet

[root@wordpress1 ~]# ifconfig bce0
bce0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=1bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4>
        ether 00:19:b9:ea:99:a4
        inet 66.230.193.105 netmask 0xffffff00 broadcast 66.230.193.255
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active

I first noticed this problem running ab; then to simplify I used
netrate/http[d]. What's strange is that it seems fine over the local
network (~15800 requests/sec), but it slowed down dramatically (~150
req/sec) when tested from another network 20 ms away. Running systat
-tcp and nload I saw that there was an almost complete stall with only
a handful of packets being sent (probably my ssh packets) for a few
seconds or sometimes even up to 60 seconds or so.

Next I turned on net.inet.tcp.log_debug and ran tcpdump while running
ab against netrate/http. Then I looked at the tcpdump capture and
looked for a gap in the timestamps when the traffic seemed stalled. I
saw that other packets were coming in and going out (e.g. multicast
and ARP traffic) but little TCP traffic. From there I looked backwards
for the TCP traffic leading up to the stall and saw a strange exchange
on the packets with source port 64851 (and others, but I'll stick with
this one for illustration):

        SYN
        SYN-ACK
        ACK
        RST
        RST

Looking at the logs I saw:

Nov  9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
[66.230.193.105]:80; syncache_socket: Socket create failed due to
limits or memory shortage
Nov  9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
[66.230.193.105]:80 tcpflags 0x10<ACK>; tcp_input: Listen socket:
Socket allocation failed due to limits or memory shortage, sending RST
Nov  9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
[66.230.193.105]:80; syncache_socket: Socket create failed due to
limits or memory shortage
Nov  9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
[66.230.193.105]:80 tcpflags 0x18<PUSH,ACK>; tcp_input: Listen socket:
Socket allocation failed due to limits or memory shortage, sending RST

But it doesn't seem to be a memory shortage; vmstat -z during one of
those stalls shows:

socket:                   696,    12330,       14,     3861, 14786330,        0
unpcb:                    248,    12330,        6,     1149,  6052662,        0
ipq:                       56,      819,        0,        0,        0,        0
udpcb:                    280,    12334,        2,      222,     1531,        0
inpcb:                    280,    12334,     2271,     2671,  8732131,        0
tcpcb:                    688,    12330,        6,     2304,  8732131,        0
tcptw:                     88,     2478,     2265,      213,   201553,  6438776
syncache:                 112,    15378,        2,      823,  7197735,        0
hostcache:                136,    15372,        0,      112,       19,        0
tcpreass:                  40,     1680,        0,      672,       50,        0

Corresponding to these error messages there are two RST packets around
14.528 seconds into the tcpdump. Looking at the code it seems that
what must be happening is that syncache_expand() fails somehow,
leaving 'so' NULL. Wireshark shows that:

        SYN
        SYN-ACK
        ACK
        kernel tries to set up the socket; syncache_expand() fails, so
it sends...
        RST
        but the sending host already saw a SYN-ACK so it sends...
        "GET /short.html..."
        kernel replies with another RST since the socket allocation failed

So I looked at the logs again and saw that the same thing happens
again, 97 seconds later, and then again 103 seconds after that. The
sequence of SYNs etc. is identical:

Nov  9 19:04:11 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
[66.230.193.105]:80; syncache_socket: Socket create failed due to
limits or memory shortage
Nov  9 19:04:11 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
[66.230.193.105]:80 tcpflags 0x10<ACK>; tcp_input: Listen socket:
Socket allocation failed due to limits or memory shortage, sending RST
Nov  9 19:04:11 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
[66.230.193.105]:80; syncache_socket: Socket create failed due to
limits or memory shortage
Nov  9 19:04:11 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
[66.230.193.105]:80 tcpflags 0x18<PUSH,ACK>; tcp_input: Listen socket:
Socket allocation failed due to limits or memory shortage, sending RST

And:

Nov  9 19:05:55 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
[66.230.193.105]:80; syncache_socket: Socket create failed due to
limits or memory shortage
Nov  9 19:05:55 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
[66.230.193.105]:80 tcpflags 0x10<ACK>; tcp_input: Listen socket:
Socket allocation failed due to limits or memory shortage, sending RST
Nov  9 19:05:55 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
[66.230.193.105]:80; syncache_socket: Socket create failed due to
limits or memory shortage
Nov  9 19:05:55 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
[66.230.193.105]:80 tcpflags 0x18<PUSH,ACK>; tcp_input: Listen socket:
Socket allocation failed due to limits or memory shortage, sending RST

It seems very fishy that a socket allocation for this port 64851
should fail three times in less than four minutes, when other
allocations succeed. Why are these socket creations failing, and
consistently failing for the same port numbers?

I don't know if it's relevant, but accf_http is loaded on wordpress1.

We have seen similar behavior (TCP slowdowns) on a different machines
(4 x Xeon 5160) with a different NIC (em0) running RELENG_7, though I
haven't diagnosed it to this level of detail. All our RELENG_6 and
RELENG_4 machines seem fine.

An excerpt from the full tcpdump is available at
http://bilbo.vpop.net/~mreimer/tcpdump.bad; I'll make the full 14M
dump available upon request. I'll run whatever tests are needed, add
debugging printf's, etc.

Thanks for your help.

Matt



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?f383264b0711092323p5148300fu3c0883135f8fb01b>