Date: Fri, 9 Nov 2007 23:23:31 -0800 From: "Matt Reimer" <mattjreimer@gmail.com> To: "Mike Silbersack" <silby@silby.com> Cc: net@freebsd.org Subject: Re: Should syncache.count ever be negative? Message-ID: <f383264b0711092323p5148300fu3c0883135f8fb01b@mail.gmail.com> In-Reply-To: <20071109213846.O46803@odysseus.silby.com> References: <f383264b0711091609n81875b6v444055960ab0fd96@mail.gmail.com> <20071109213846.O46803@odysseus.silby.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Nov 9, 2007 7:46 PM, Mike Silbersack <silby@silby.com> wrote: > > On Fri, 9 Nov 2007, Matt Reimer wrote: > > > On a eight core machine running RELENG_7 I'm seeing TCP stalls, > > sometimes lasting up to 60 seconds or so. While trying to track this > > down I noticed that net.inet.tcp.syncache.count is negative. Should it > > be possible for the count to go negative? Perhaps it indicates a race, > > or the counter is wrongly being decremented twice? > > I just took a look at the code, and you are correct that the count is not > locked; it looks like you're hitting the race. However, it doesn't look > like anything is checking the count, so that should not be the cause of > your TCP stalls. Yeah, that's the conclusion I came to also. > Can you install netperf and run both the TCP_STREAM and UDP_STREAM tests > just to make sure that your network card is working properly? We've > recently found that the fast interrupt handlers we use in some network > drivers act strangely when sharing interrupts. So, that's a first thing > to test before we poke at the upper layers. Ok, I've run netperf in both directions. The box I've been targeting is 66.230.193.105 aka wordpress1. [root@wordpress2 ~]# netperf -p 5000 -t TCP_STREAM -H 66.230.193.105 TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 66.230.193.105 (66.230.193.105) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 65536 32768 32768 10.03 93.00 [root@wordpress2 ~]# netperf -p 5000 -t UDP_STREAM -H 66.230.193.105 UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 66.230.193.105 (66.230.193.105) port 0 AF_INET Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 9216 9216 10.01 13145 2588457 96.81 41600 10.01 13145 96.81 [root@wordpress1 ~]# netperf -H 66.230.193.106 -p 5000 -t TCP_STREAM TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 66.230.193.106 (66.230.193.106) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 65536 32768 32768 10.02 94.03 [root@wordpress1 ~]# netperf -H 66.230.193.106 -p 5000 -t UDP_STREAM UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 66.230.193.106 (66.230.193.106) port 0 AF_INET Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 9216 9216 10.00 13151 2532692 96.95 41600 10.00 13135 96.83 > If that doesn't help, can you post more details about how you are > stressing the system? Yes, I'll get you whatever information you want. The machine is a Dell 1950 with 8 x 1.6GHz Xeon 5310s, 8G RAM, and this NIC: bce0@pci0:9:0:0: class=0x020000 card=0x01b31028 chip=0x164c14e4 rev=0x12 hdr=0x00 vendor = 'Broadcom Corporation' device = '5708C Broadcom NetXtreme II Gigabit Ethernet Adapter' class = network subclass = ethernet [root@wordpress1 ~]# ifconfig bce0 bce0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=1bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4> ether 00:19:b9:ea:99:a4 inet 66.230.193.105 netmask 0xffffff00 broadcast 66.230.193.255 media: Ethernet autoselect (100baseTX <full-duplex>) status: active I first noticed this problem running ab; then to simplify I used netrate/http[d]. What's strange is that it seems fine over the local network (~15800 requests/sec), but it slowed down dramatically (~150 req/sec) when tested from another network 20 ms away. Running systat -tcp and nload I saw that there was an almost complete stall with only a handful of packets being sent (probably my ssh packets) for a few seconds or sometimes even up to 60 seconds or so. Next I turned on net.inet.tcp.log_debug and ran tcpdump while running ab against netrate/http. Then I looked at the tcpdump capture and looked for a gap in the timestamps when the traffic seemed stalled. I saw that other packets were coming in and going out (e.g. multicast and ARP traffic) but little TCP traffic. From there I looked backwards for the TCP traffic leading up to the stall and saw a strange exchange on the packets with source port 64851 (and others, but I'll stick with this one for illustration): SYN SYN-ACK ACK RST RST Looking at the logs I saw: Nov 9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to [66.230.193.105]:80; syncache_socket: Socket create failed due to limits or memory shortage Nov 9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to [66.230.193.105]:80 tcpflags 0x10<ACK>; tcp_input: Listen socket: Socket allocation failed due to limits or memory shortage, sending RST Nov 9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to [66.230.193.105]:80; syncache_socket: Socket create failed due to limits or memory shortage Nov 9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to [66.230.193.105]:80 tcpflags 0x18<PUSH,ACK>; tcp_input: Listen socket: Socket allocation failed due to limits or memory shortage, sending RST But it doesn't seem to be a memory shortage; vmstat -z during one of those stalls shows: socket: 696, 12330, 14, 3861, 14786330, 0 unpcb: 248, 12330, 6, 1149, 6052662, 0 ipq: 56, 819, 0, 0, 0, 0 udpcb: 280, 12334, 2, 222, 1531, 0 inpcb: 280, 12334, 2271, 2671, 8732131, 0 tcpcb: 688, 12330, 6, 2304, 8732131, 0 tcptw: 88, 2478, 2265, 213, 201553, 6438776 syncache: 112, 15378, 2, 823, 7197735, 0 hostcache: 136, 15372, 0, 112, 19, 0 tcpreass: 40, 1680, 0, 672, 50, 0 Corresponding to these error messages there are two RST packets around 14.528 seconds into the tcpdump. Looking at the code it seems that what must be happening is that syncache_expand() fails somehow, leaving 'so' NULL. Wireshark shows that: SYN SYN-ACK ACK kernel tries to set up the socket; syncache_expand() fails, so it sends... RST but the sending host already saw a SYN-ACK so it sends... "GET /short.html..." kernel replies with another RST since the socket allocation failed So I looked at the logs again and saw that the same thing happens again, 97 seconds later, and then again 103 seconds after that. The sequence of SYNs etc. is identical: Nov 9 19:04:11 wordpress1 kernel: TCP: [207.210.67.2]:64851 to [66.230.193.105]:80; syncache_socket: Socket create failed due to limits or memory shortage Nov 9 19:04:11 wordpress1 kernel: TCP: [207.210.67.2]:64851 to [66.230.193.105]:80 tcpflags 0x10<ACK>; tcp_input: Listen socket: Socket allocation failed due to limits or memory shortage, sending RST Nov 9 19:04:11 wordpress1 kernel: TCP: [207.210.67.2]:64851 to [66.230.193.105]:80; syncache_socket: Socket create failed due to limits or memory shortage Nov 9 19:04:11 wordpress1 kernel: TCP: [207.210.67.2]:64851 to [66.230.193.105]:80 tcpflags 0x18<PUSH,ACK>; tcp_input: Listen socket: Socket allocation failed due to limits or memory shortage, sending RST And: Nov 9 19:05:55 wordpress1 kernel: TCP: [207.210.67.2]:64851 to [66.230.193.105]:80; syncache_socket: Socket create failed due to limits or memory shortage Nov 9 19:05:55 wordpress1 kernel: TCP: [207.210.67.2]:64851 to [66.230.193.105]:80 tcpflags 0x10<ACK>; tcp_input: Listen socket: Socket allocation failed due to limits or memory shortage, sending RST Nov 9 19:05:55 wordpress1 kernel: TCP: [207.210.67.2]:64851 to [66.230.193.105]:80; syncache_socket: Socket create failed due to limits or memory shortage Nov 9 19:05:55 wordpress1 kernel: TCP: [207.210.67.2]:64851 to [66.230.193.105]:80 tcpflags 0x18<PUSH,ACK>; tcp_input: Listen socket: Socket allocation failed due to limits or memory shortage, sending RST It seems very fishy that a socket allocation for this port 64851 should fail three times in less than four minutes, when other allocations succeed. Why are these socket creations failing, and consistently failing for the same port numbers? I don't know if it's relevant, but accf_http is loaded on wordpress1. We have seen similar behavior (TCP slowdowns) on a different machines (4 x Xeon 5160) with a different NIC (em0) running RELENG_7, though I haven't diagnosed it to this level of detail. All our RELENG_6 and RELENG_4 machines seem fine. An excerpt from the full tcpdump is available at http://bilbo.vpop.net/~mreimer/tcpdump.bad; I'll make the full 14M dump available upon request. I'll run whatever tests are needed, add debugging printf's, etc. Thanks for your help. Matt
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?f383264b0711092323p5148300fu3c0883135f8fb01b>