From owner-freebsd-net@FreeBSD.ORG Fri Aug 6 16:06:50 2010 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4F5201065680 for ; Fri, 6 Aug 2010 16:06:50 +0000 (UTC) (envelope-from sethj@greatbaysoftware.com) Received: from portcityhosting.com (edge.tidalhosting.net [64.140.243.92]) by mx1.freebsd.org (Postfix) with ESMTP id 4BB4C8FC25 for ; Fri, 6 Aug 2010 16:06:48 +0000 (UTC) Received: from sjeacopello ([173.14.128.81]) by portcityhosting.com with MailEnable ESMTP; Fri, 6 Aug 2010 11:26:47 -0400 X-WatchGuard-Mail-Exception: Allow From: "Seth Jeacopello" To: Date: Fri, 6 Aug 2010 11:26:46 -0400 Message-ID: MIME-Version: 1.0 X-WatchGuard-AntiVirus: part scanned. clean action=allow X-ME-Bayesian: 0.000000 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Server sporadically sending unexpected RST to Client X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Aug 2010 16:06:50 -0000 Hello, We have recently upgraded some productions systems to 8.0-p2 from 7.1-p5 and are having some sporadic issues with TCP connections that were not occurring previously. The problem was first noticed while running slapd with clients coming in at various times to make LDAP queries; at various points during the day we would receive the follow message in the slapd log (along with the client complaining because the connection was reset): daemon: accept(7) failed errno=53 (Software caused connection abort) We began digging into slapd, during which we were running tcpdump to catch the traffic between the client and server. After a bit tracing through the dumps, the following pattern emerged during the failure times: Client -> Server [SYN] Server -> Client [SYN, ACK] Client -> Server [ACK] Client -> Server [Request] Server -> Client [RST] Server -> Client [Response] Client -> Server [RST] Between the trace above and the accept error, it prompted the question as to why slapd would be throwing an accept error and still be trying to respond to data? It felt like a double trigger, but slapd doesn't appear to be reusing anything, so our next steps lead us to either something in pf or in the kernel as potentially causing the error. The first step was to remove pf as a variable, so we disabled pf (pfctl -d) and found that the issue continued. Eventually, we turned on net.inet.flowtable.debug and net.inet.tcp.log_debug in order to capture some more data and that lead us to the following error: kernel: TCP: [10.174.80.233]:20949 to [10.174.80.242]:389 tcpflags 0x10; tcp_input: Listen socket: Socket allocation failed due to limits or memory shortage, sending RST The first thought was we were overflowing the listen queue but netstat -sp tcp showed the listen queue overflows count was not increasing and the system itself didn't have that many connections coming in at that time (though there was quite a bit of activity from various clients). We also verified that the CPU and memory usage was within a reasonable range (nothing was being stressed and there was plenty of memory to go around. For the sake of being complete, the server in question is a dual CPU (eight cores total) Intel server with 12GB of memory). All of this prompted us to build a debug version of the kernel with some additional logging and we eventually found that we were ultimately failing in in_pcb.c, in_pcbconnect_setup we got EADDRINUSE returned after the in_pcblookup_hash call. We've also noticed that net.tcp.syncache.count will go to -1 when this failure occurs. Per a similar issue (http://lists.freebsd.org/pipermail/freebsd-net/2010-May/025307.html) we were lead to trying to turn off syncookies by setting net.inet.tcp.syncookies to zero which has lead to the error not reappearing (and syncache.count never goes back to -1), however, the connection failures would still occur (shown from client logs and backed up by tcpdumps that show the same pattern as above). This would seem to indicate that there is a problem with syncache. Additionally, we've also found the following pattern while debugging the kernel for the failed connection (note: syncookies is still disabled at this point): kernel: Entering syncache_add kernel: Entering syncache_expand kernel: TCP: [10.174.50.35]:43667 to [10.174.80.242]:389 tcpflags 0x18; syncache_expand: Just before syncache_socket() kernel: Entering syncache_socket kernel: Entering syncache_expand kernel: TCP: [10.174.50.35]:43667 to [10.174.80.242]:389 tcpflags 0x10; syncache_expand: Spurious ACK, segment rejected (syncookies disabled) The above pattern appears to indicate that the PSH is coming in before the final ACK in the handshake (somehow out of order); however, the tcpdumps clearly show everything is coming across in order. Finally, we've also attempted to test with syncookies=1 and syncookies_only=1, which results in the connection failures (as well as the reappearance of the slapd accept error messages). Does anyone have any thoughts on this issue? # uname -a FreeBSD HA1 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #8: Mon Apr 19 16:31:20 UTC 2010 sethj@newcastle.greatbaysoftware.com:/usr/obj/usr/src/sys/KERNEL i386 # sysctl net.inet.tcp.syncache net.inet.tcp.syncache.rst_on_sock_fail: 1 net.inet.tcp.syncache.rexmtlimit: 3 net.inet.tcp.syncache.hashsize: 512 net.inet.tcp.syncache.count: -1 net.inet.tcp.syncache.cachelimit: 15360 net.inet.tcp.syncache.bucketlimit: 30 # netstat -Lan | grep 389 tcp4 0/0/128 *.389 # netstat -sp tcp tcp: 5596836 packets sent 2431990 data packets (3645983288 bytes) 33 data packets (31121 bytes) retransmitted 0 data packets unnecessarily retransmitted 0 resends initiated by MTU discovery 2498873 ack-only packets (48245 delayed) 0 URG only packets 0 window probe packets 588193 window update packets 77883 control packets 10770625 packets received 4611316 acks (for 3643955099 bytes) 54176 duplicate acks 0 acks for unsent data 6172078 packets (2931863382 bytes) received in-sequence 72 completely duplicate packets (10151 bytes) 46 old duplicate packets 0 packets with some dup. data (0 bytes duped) 4136 out-of-order packets (3650806 bytes) 0 packets (0 bytes) of data after window 0 window probes 486096 window update packets 153 packets received after close 0 discarded for bad checksums 0 discarded for bad header offset fields 0 discarded because packet too short 0 discarded due to memory problems 18353 connection requests 62559 connection accepts 0 bad connection attempts 0 listen queue overflows 1 ignored RSTs in the window 80905 connections established (including accepts) 80811 connections closed (including 21351 drops) 58462 connections updated cached RTT on close 58462 connections updated cached RTT variance on close 0 connections updated cached ssthresh on close 9 embryonic connections dropped 4523602 segments updated rtt (of 1677015 attempts) 11 retransmit timeouts 0 connections dropped by rexmit timeout 0 persist timeouts 0 connections dropped by persist timeout 0 Connections (fin_wait_2) dropped because of timeout 0 keepalive timeouts 0 keepalive probes sent 0 connections dropped by keepalive 207248 correct ACK header predictions 5223266 correct data packet header predictions 62537 syncache entries added 0 retransmitted 0 dupsyn 0 dropped 62559 completed 0 bucket overflow 0 cache overflow 0 reset 0 stale 2 aborted 0 badack 0 unreach 0 zone failures 17491 cookies sent 24 cookies received 6 SACK recovery episodes 21 segment rexmits in SACK recovery episodes 25132 byte rexmits in SACK recovery episodes 64 SACK options (SACK blocks) received 2627 SACK options (SACK blocks) sent 0 SACK scoreboard overflow 0 packets with ECN CE bit set 0 packets with ECN ECT(0) bit set 0 packets with ECN ECT(1) bit set 0 successful ECN handshakes 0 times ECN reduced the congestion window KERNCONF: # Inherit config (most stuff) -- this is slightly customized version... # inherits from GENERIC-cust (tweaked to disable the default scheduler) include PAE-cust # Name of this kernel ident KERNEL makeoptions NO_MODULES=yes #makeoptions MODULES_OVERRIDE=geom/geom_journal # Scheduler -- From /usr/src/sys/conf/NOTES: # # SCHED_ULE provides significant performance advantages over 4BSD on many # workloads on SMP machines. It supports cpu-affinity, per-cpu runqueues # and scheduler locks. It also has a stronger notion of interactivity # which leads to better responsiveness even on uniprocessor machines. This # will eventually become the default scheduler. # options SCHED_ULE # Note: we're compiling modules in statically since with PAE we don't want to # load KLDs. See comments in pae(4) and PAE kernel conf file. # Hardware Monitoring / Management device ipmi # Storage options GEOM_JOURNAL # Firewall device pf #PF OpenBSD packet-filter firewall device pflog #logging support interface for PF # device pfsync #synchronization interface for PF options ALTQ options ALTQ_CBQ options ALTQ_RED options ALTQ_RIO options ALTQ_HFSC options ALTQ_CDNR options ALTQ_PRIQ options ALTQ_NOPCC # Required for SMP builds # Linux Emulation options COMPAT_LINUX options LINPROCFS options LINSYSFS # Screen saver # device green_saver ### Network perf tuning options ACCEPT_FILTER_HTTP # options ZERO_COPY_SOCKETS # See polling(4)) options DEVICE_POLLING options HZ=1000 # Avoid scrambled kernel messages during boot -- Trac #1458 options PRINTF_BUFR_SIZE=256 # End config