From: Jeremiah Lott
To: "freebsd-net@freebsd.org"
Date: Wed, 6 Aug 2014 17:25:38 -0400
Subject: zero window and persist timer not set

Hello,

We've been seeing a problem where a TCP connection is stuck in a zero-window condition: even though the client has opened more window space, our FreeBSD box never sends any more data. After some analysis, it appears that the FreeBSD box is not sending zero-window probes because the persist timer was never set (we can see in kgdb that the tcpcb shows a 0 window and there is data in the socket buffer, but the persist timer is not active).

After looking over the code for a while, I think I see the problem. When tcp_output chooses to send a packet, it never arms the persist timer. This causes a problem in the following scenario:

1. A --> B: packet containing enough data to fill the window
2. B --> A: ACK for #1 + new data (0 window advertisement)
3. A --> B: ACK for #2, 0 len packet

In this case, A will not activate the persist timer, because it chose to send a packet. Unless tcp_output is called for some other reason (delayed ack timer, another input packet from B, socket syscall), A will not send zero-window probes.

I was finally able to recreate this condition by setting a very small window and running programs that send very specific sequences of packets without calling recv (purposefully forcing a zero-window condition).
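
The three-step scenario above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: struct conn_model, model_output() and the field layout are hypothetical stand-ins for the relevant tcpcb and socket-buffer state, and the logic is reduced to the one property described above (the persist timer is only considered on the path where no segment is sent).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the tcpcb/socket state involved. */
struct conn_model {
	size_t	snd_wnd;	/* window last advertised by the peer */
	size_t	sb_cc;		/* bytes waiting in the send socket buffer */
	bool	rexmt_active;	/* retransmit timer armed? */
	bool	persist_active;	/* persist timer armed? */
};

/*
 * Model of one output pass as described in the email.
 * Returns true if a segment (data or a pure ACK) was sent.
 */
static bool
model_output(struct conn_model *c, bool ack_now)
{
	size_t sendable = c->sb_cc < c->snd_wnd ? c->sb_cc : c->snd_wnd;

	if (sendable > 0) {
		/* Data goes out, so the retransmit timer covers it. */
		c->rexmt_active = true;
		return (true);
	}
	if (ack_now) {
		/* Pure ACK goes out; the persist timer is never looked at. */
		return (true);
	}
	/* Nothing sent: arm persist if data is queued and no timer runs. */
	if (c->sb_cc > 0 && !c->rexmt_active && !c->persist_active)
		c->persist_active = true;
	return (false);
}
```

With snd_wnd == 0, queued data, and ack_now set (step 3 of the scenario), the model sends the ACK and returns with neither timer armed, which is the stuck state. The same state without ack_now arms the persist timer.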

Here is a packet capture that shows the sequence:

A == 10.2.15.69 == FreeBSD 9.2
B == 10.2.14.61 == FreeBSD 8.2

16:19:49.664790 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [S], seq 2362665163, win 4300, options [mss 1460,nop,wscale 6,sackOK,TS val 88804503 ecr 0], length 0
16:19:49.664821 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [S.], seq 3306387947, ack 2362665164, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 1605043666 ecr 88804503], length 0
16:19:49.664859 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [.], ack 1, win 67, options [nop,nop,TS val 88804503 ecr 1605043666], length 0
16:19:49.664921 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 1:101, ack 1, win 67, options [nop,nop,TS val 88804503 ecr 1605043666], length 100
16:19:49.665137 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [P.], seq 1:3001, ack 101, win 2046, options [nop,nop,TS val 1605043666 ecr 88804503], length 3000
16:19:49.665208 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 101:1321, ack 1449, win 45, options [nop,nop,TS val 88804503 ecr 1605043666], length 1220
16:19:49.666195 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [.], seq 1321:2769, ack 3001, win 21, options [nop,nop,TS val 88804504 ecr 1605043666], length 1448
16:19:49.666205 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [.], ack 2769, win 2004, options [nop,nop,TS val 1605043667 ecr 88804503], length 0
16:19:49.666207 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 2769:2771, ack 3001, win 21, options [nop,nop,TS val 88804504 ecr 1605043666], length 2
16:19:49.667183 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [.], seq 2771:4219, ack 3001, win 21, options [nop,nop,TS val 88804505 ecr 1605043667], length 1448
16:19:49.667190 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [.], seq 3001:4345, ack 4219, win 1982, options [nop,nop,TS val 1605043668 ecr 88804504], length 1344
16:19:49.667193 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 4219:4221, ack 3001, win 21, options [nop,nop,TS val 88804505 ecr 1605043667], length 2
16:19:49.766487 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 4221:4321, ack 4345, win 0, options [nop,nop,TS val 88804605 ecr 1605043668], length 100
16:19:49.766499 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [.], ack 4321, win 1980, options [nop,nop,TS val 1605043768 ecr 88804505], length 0

The important packets are the last four:

1. A --> B: length 1344, fills the remaining window
2. B --> A: length 2, does not ack additional data, delayed ack timer is set
3. B --> A: length 100, acks #1, immediate ack (delayed ack timer cancelled, tcp_output called with ACKNOW)
4. A --> B: length 0, acks #2 and #3; because a packet is sent, tcp_output does not activate the persist timer

I would normally expect A to begin sending zero-window probes, but (since it didn't activate the persist timer) it does not. Using kgdb, I can see that the persist timer is not set; only the keep timer is set. This is kgdb on "A":

(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_nxt
$5 = 3306392292
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_max
$6 = 3306392292
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_una
$7 = 3306392292
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_wnd
$8 = 0
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_cwnd
$9 = 4380
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_rexmt->c_flags
$11 = 16
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_persist->c_flags
$12 = 16
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_keep->c_flags
$13 = 22
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_2msl->c_flags
$14 = 16
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_delack->c_flags
$15 = 16
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_inpcb->inp_socket.so_snd.sb_cc
$16 = 1656

There is zero window, data in the socket buffer, and the persist timer is not set.

My proposed fix follows.
If a 0-length packet is sent while there is data in the socket buffer, and neither the rexmt nor the persist timer is already set, then activate the persist timer.

--- sys/netinet/tcp_output.c	(revision 269644)
+++ sys/netinet/tcp_output.c	(working copy)
@@ -1290,7 +1290,12 @@
 				tp->t_rxtshift = 0;
 			}
 			tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur);
-		}
+		} else if (len == 0 && so->so_snd.sb_cc &&
+		    !tcp_timer_active(tp, TT_REXMT) &&
+		    !tcp_timer_active(tp, TT_PERSIST)) {
+			tp->t_rxtshift = 0;
+			tcp_setpersist(tp);
+		}
 	} else {
 		/*
 		 * Persist case, update snd_max but since we are in

Let me know if you have any comments.

Thanks,
Jeremiah Lott
Avere Systems
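
For readers who want to check the new condition in isolation, here is a minimal user-space sketch of just the added "else if" branch. It is an illustration under stated assumptions rather than the kernel code: struct timer_model and after_send_patched() are hypothetical names, the boolean flags stand in for tcp_timer_active(), and setting persist_active stands in for tcp_setpersist(tp).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the new branch inspects. */
struct timer_model {
	size_t	sb_cc;		/* bytes queued in the send socket buffer */
	bool	rexmt_active;	/* models tcp_timer_active(tp, TT_REXMT) */
	bool	persist_active;	/* models tcp_timer_active(tp, TT_PERSIST) */
	int	rxtshift;	/* models tp->t_rxtshift */
};

/*
 * Models the condition added by the patch, evaluated after a segment
 * carrying `len` payload bytes has been sent.
 */
static void
after_send_patched(struct timer_model *t, size_t len)
{
	if (len == 0 && t->sb_cc > 0 &&
	    !t->rexmt_active && !t->persist_active) {
		t->rxtshift = 0;
		t->persist_active = true;	/* stands in for tcp_setpersist(tp) */
	}
}
```

Applied to the state seen in kgdb (sb_cc of 1656, no rexmt or persist timer) with len == 0, the model arms the persist timer. It leaves the timers alone when the segment carried data, when the send buffer is empty, or when the retransmit timer is already running.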