From: Adrian Chadd
To: hiren panchasara
Cc: "freebsd-net@freebsd.org", John Baldwin, Jeremiah Lott
Date: Mon, 11 Aug 2014 17:35:52 -0700
Subject: Re: zero window and persist timer not set

Sweet, I can trigger this at home when doing high connection rate TCP
tests. Lemme give this a go tonight/tomorrow and see if it changes the
behaviour.

Thanks! And yes, please do file a PR!

-a

On 11 August 2014 17:05, hiren panchasara wrote:
> On Mon, Aug 11, 2014 at 2:20 PM, John Baldwin wrote:
>> On Wednesday, August 06, 2014 5:25:38 pm Jeremiah Lott wrote:
>>> Hello,
>>>
>>> We've been seeing a problem where a TCP connection is stuck in a zero
>>> window condition and even though the client has opened more window space,
>>> our FreeBSD box never sends any more. After some analysis it appears that
>>> the FreeBSD box is not sending zero window probes, because the persist
>>> timer did not get set (we can see in kgdb that the tcpcb shows 0 window,
>>> there is data in the socket buffer, but the persist timer is not active).
>>>
>>> After looking over the code for a while, I think I see the problem. When
>>> tcp_output chooses to send a packet, it never arms the persist timer. This
>>> causes a problem in the following scenario:
>>>
>>> 1. A --> B: packet containing enough data to fill the window
>>> 2. B --> A: ACK for #1 + new data (0 window advertisement)
>>> 3. A --> B: ACK for #2, 0 len packet
>>>
>>> In this case, A will not activate the persist timer, because it chose to
>>> send a packet. Unless tcp_output is called for some other reason (delayed
>>> ack timer, another input packet from B, socket syscall), A will not send
>>> zero window probes. I was finally able to recreate this condition by
>>> setting a very small window and running programs that send very specific
>>> sequences of packets without calling recv (purposefully forcing a zero
>>> window condition).
>>> Here is a packet capture that shows the sequence:
>>>
>>> A == 10.2.15.69 == FreeBSD 9.2
>>> B == 10.2.14.61 == FreeBSD 8.2
>>>
>>> 16:19:49.664790 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [S], seq 2362665163, win 4300, options [mss 1460,nop,wscale 6,sackOK,TS val 88804503 ecr 0], length 0
>>> 16:19:49.664821 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [S.], seq 3306387947, ack 2362665164, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 1605043666 ecr 88804503], length 0
>>> 16:19:49.664859 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [.], ack 1, win 67, options [nop,nop,TS val 88804503 ecr 1605043666], length 0
>>> 16:19:49.664921 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 1:101, ack 1, win 67, options [nop,nop,TS val 88804503 ecr 1605043666], length 100
>>> 16:19:49.665137 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [P.], seq 1:3001, ack 101, win 2046, options [nop,nop,TS val 1605043666 ecr 88804503], length 3000
>>> 16:19:49.665208 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 101:1321, ack 1449, win 45, options [nop,nop,TS val 88804503 ecr 1605043666], length 1220
>>> 16:19:49.666195 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [.], seq 1321:2769, ack 3001, win 21, options [nop,nop,TS val 88804504 ecr 1605043666], length 1448
>>> 16:19:49.666205 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [.], ack 2769, win 2004, options [nop,nop,TS val 1605043667 ecr 88804503], length 0
>>> 16:19:49.666207 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 2769:2771, ack 3001, win 21, options [nop,nop,TS val 88804504 ecr 1605043666], length 2
>>> 16:19:49.667183 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [.], seq 2771:4219, ack 3001, win 21, options [nop,nop,TS val 88804505 ecr 1605043667], length 1448
>>> 16:19:49.667190 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [.], seq 3001:4345, ack 4219, win 1982, options [nop,nop,TS val 1605043668 ecr 88804504], length 1344
>>> 16:19:49.667193 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 4219:4221, ack 3001, win 21, options [nop,nop,TS val 88804505 ecr 1605043667], length 2
>>> 16:19:49.766487 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 4221:4321, ack 4345, win 0, options [nop,nop,TS val 88804605 ecr 1605043668], length 100
>>> 16:19:49.766499 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [.], ack 4321, win 1980, options [nop,nop,TS val 1605043768 ecr 88804505], length 0
>>>
>>> The important packets are the last four:
>>>
>>> 1. A --> B: length 1344, fills the remaining window
>>> 2. B --> A: length 2, does not ack additional data, delayed ack timer is set
>>> 3. B --> A: length 100, acks #1, immediate ack (delayed ack timer
>>>    cancelled, tcp_output called with ACKNOW)
>>> 4. A --> B: length 0, acks #2 and #3; because a packet is sent, tcp_output
>>>    does not activate the persist timer.
>>>
>>> I would normally expect A to begin sending zero-window probes, but (since
>>> it didn't activate the persist timer) it does not. Using kgdb, I can see
>>> that the persist timer is not set, only the keep timer is set.
>>> This is kgdb on "A":
>>>
>>> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_nxt
>>> $5 = 3306392292
>>> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_max
>>> $6 = 3306392292
>>> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_una
>>> $7 = 3306392292
>>> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_wnd
>>> $8 = 0
>>> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_cwnd
>>> $9 = 4380
>>> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_rexmt->c_flags
>>> $11 = 16
>>> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_persist->c_flags
>>> $12 = 16
>>> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_keep->c_flags
>>> $13 = 22
>>> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_2msl->c_flags
>>> $14 = 16
>>> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_delack->c_flags
>>> $15 = 16
>>> (kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_inpcb->inp_socket.so_snd.sb_cc
>>> $16 = 1656
>>>
>>> There is zero window, data in the socket buffer, and the persist timer is
>>> not set.
>>>
>>> My proposed fix follows. If you send a 0-length packet, but there is data
>>> in the socket buffer, and neither the rexmt nor the persist timer is
>>> already set, then activate the persist timer.
>>>
>>> --- sys/netinet/tcp_output.c (revision 269644)
>>> +++ sys/netinet/tcp_output.c (working copy)
>>> @@ -1290,7 +1290,12 @@
>>>  					tp->t_rxtshift = 0;
>>>  				}
>>>  				tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur);
>>> -			}
>>> +			} else if (len == 0 && so->so_snd.sb_cc &&
>>> +			    !tcp_timer_active(tp, TT_REXMT) &&
>>> +			    !tcp_timer_active(tp, TT_PERSIST)) {
>>> +				tp->t_rxtshift = 0;
>>> +				tcp_setpersist(tp);
>>> +			}
>>>
>>>  		} else {
>>>  			/*
>>>  			 * Persist case, update snd_max but since we are in
>>>
>>> Let me know any comments.
>>> Thanks,
>>
>> I think your patch is correct, but please file this as a bug report so we can
>> hopefully wrangle another person to review this.
>
> Looks okay to me also from the looks of it.
>
> cheers,
> Hiren
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"