From: Andre Oppermann
Date: Sun, 04 Nov 2012 14:57:39 +0100
To: Kim Culhan
Cc: Dimitry Andric, freebsd-current@freebsd.org, Adrian Chadd
Subject: Re: weird network problems on current since 10/28/2012
Message-ID: <50967453.5090503@freebsd.org>
References: <201211031740.qA3HeqVX001622@pozo.com> <201211040113.qA41DfLn001577@pozo.com> <50964FBB.4010600@andric.com>
List-Id: Discussions about the use of FreeBSD-current

On 04.11.2012 13:11, Kim Culhan wrote:
> On Sun, November 4, 2012 6:21 am, Dimitry Andric wrote:
>> On 2012-11-04 02:13, Manfred Antar wrote:
>>> At 03:29 PM 11/3/2012, Adrian Chadd wrote:
>>>> On 3 November 2012 10:40, Manfred Antar wrote:
>>>>> i have problem connecting to freebsd box on local network since last sunday.
>>>>> the last kernel that works:
>>>>> FreeBSD 10.0-CURRENT #0: Sun Oct 28 12:14:38 PDT 2012
>>>>> anything after that, sometimes i can connect, other times just hangs.
>>>>> any network connection hangs ===== pop httpd ssh etc etc.
>>>>> anyone have any ideas ?
>>>>> i can checkout different sources and see if i can locate the changes that cause
>>>>> this.
>>>>
>>>> Please do!
>> ...
>>> Here is what I found doing :
>>> setenv CVSROOT /usr/home/ncvs
>>>
>>> cvs co -D"October 28, 2012 12:14:38 PDT" sys
>>>
>>> A kernel from that time works fine.
>>>
>>> doing:
>>>
>>> cvs up -D"October 28, 2012 13:14:38 PDT" sys    1 hour later
>>> the following files were changed:
>>> sys/netinet/tcp_input.c
>>> sys/netinet/tcp_timer.c
>>> sys/netinet/tcp_var.h
>>>
>>> Building a kernel from these new files is when the problem starts.
>>
>> So, your problems seem to have been introduced by this commit by Andre:
>>
>> http://svn.freebsd.org/changeset/base/242266
>>
>> Increase the initial CWND to 10 segments as defined in IETF TCPM
>> draft-ietf-tcpm-initcwnd-05. It explains why the increased initial
>> window improves the overall performance of many web services without
>> risking congestion collapse.
>>
>> As long as it remains a draft it is placed under a sysctl marking it
>> as experimental:
>>   net.inet.tcp.experimental.initcwnd10 = 1
>> When it becomes an official RFC soon the sysctl will be changed to
>> the RFC number and moved to net.inet.tcp.
>>
>> This implementation differs from the RFC draft in that it is a bit
>> more conservative in the case of packet loss on SYN or SYN|ACK because
>> we haven't reduced the default RTO to 1 second yet. Also the restart
>> window isn't yet increased as allowed. Both will be adjusted with
>> upcoming changes.
>>
>> It is enabled by default. In Linux it is enabled since kernel 3.0.
>>
>> After the commit, there was a small discussion thread on svn-src-head@
>> about the possible problems with the approach.
>> Maybe you are experiencing those?
>>
>> As the commit message says, you should be able to turn the feature off
>> using:
>>
>>   sysctl net.inet.tcp.experimental.initcwnd10=0
>>
>> Can you please try that, and see if the problems go away?
>
> FWIW this did not make the problem go away on 2 machines.

Yes, this very much looks like the same problem as in PR/173309.

Please try the attached patch. It fixes the connection hang issue.

There may be a second issue, which I am currently debugging based on
the feedback from Fabian Keil.

-- 
Andre

Index: tcp_input.c
===================================================================
--- tcp_input.c	(revision 242494)
+++ tcp_input.c	(working copy)
@@ -2650,10 +2652,12 @@
 		SOCKBUF_LOCK(&so->so_snd);
 		if (acked > so->so_snd.sb_cc) {
+			tp->snd_wnd -= so->so_snd.sb_cc;
 			sbdrop_locked(&so->so_snd, (int)so->so_snd.sb_cc);
 			ourfinisacked = 1;
 		} else {
 			sbdrop_locked(&so->so_snd, acked);
+			tp->snd_wnd -= acked;
 			ourfinisacked = 0;
 		}
 		/* NB: sowwakeup_locked() does an implicit unlock. */
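
For anyone reading the patch without the kernel source at hand, here is a
minimal userland sketch of the bookkeeping the two added lines perform: the
send window is reduced by exactly the number of bytes dropped from the send
buffer. The names model_sockbuf, model_tcpcb and model_ack_drop are
hypothetical stand-ins invented for this illustration; they are not the
kernel's struct sockbuf, struct tcpcb or any function in tcp_input.c, and
the sketch leaves out locking and everything else the real code does.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_sockbuf {
	unsigned long sb_cc;	/* bytes currently queued in the send buffer */
};

struct model_tcpcb {
	unsigned long snd_wnd;	/* send window as tracked for the connection */
};

/*
 * Model of the patched branch: drop the acknowledged bytes from the send
 * buffer and shrink snd_wnd by the same amount.  Returns 1 when the ACK
 * covers more than the buffered data (the "ourfinisacked" case), else 0.
 *
 * Assumption of this sketch: snd_wnd is at least as large as the number
 * of bytes dropped, so the unsigned subtraction does not wrap.
 */
static int
model_ack_drop(struct model_tcpcb *tp, struct model_sockbuf *sb,
    unsigned long acked)
{
	unsigned long dropped;
	int ourfinisacked;

	if (acked > sb->sb_cc) {
		/* ACK covers all buffered data plus something more. */
		dropped = sb->sb_cc;
		ourfinisacked = 1;
	} else {
		dropped = acked;
		ourfinisacked = 0;
	}
	tp->snd_wnd -= dropped;	/* the adjustment the patch adds */
	sb->sb_cc -= dropped;	/* stands in for sbdrop_locked() */
	return (ourfinisacked);
}
```

Calling model_ack_drop() with a window of 1000, 300 buffered bytes and an
ACK for 100 bytes leaves a window of 900 and 200 buffered bytes; a
following ACK for 201 bytes drops the remaining 200, leaves a window of
700, and reports the "ourfinisacked" case.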