From owner-freebsd-net@FreeBSD.ORG Tue Mar 16 18:23:59 2010
From: Pyun YongHyeon
Date: Tue, 16 Mar 2010 11:23:22 -0700
To: Steven Noonan
Message-ID: <20100316182322.GF2001@michelle.cdnetworks.com>
References:
<201003121754.o2CHsH7V065932@freefall.freebsd.org>
Cc: freebsd-net@freebsd.org, bug-followup@FreeBSD.org, yongari@freebsd.org
Subject: Re: kern/144689: [re] TCP transfer corruption using if_re
Reply-To: pyunyh@gmail.com
List-Id: Networking and TCP/IP with FreeBSD

On Sat, Mar 13, 2010 at 04:18:30AM -0800, Steven Noonan wrote:
> On Fri, Mar 12, 2010 at 9:57 PM, Steven Noonan wrote:
> > On Fri, Mar 12, 2010 at 4:24 PM, Steven Noonan wrote:
> >> On Fri, Mar 12, 2010 at 4:19 PM, Steven Noonan wrote:
> >>> On Fri, Mar 12, 2010 at 9:54 AM, ?? wrote:
> >>>> Synopsis: [re] TCP transfer corruption using if_re
> >>>>
> >>>> State-Changed-From-To: open->feedback
> >>>> State-Changed-By: yongari
> >>>> State-Changed-When: Fri Mar 12 17:53:37 UTC 2010
> >>>> State-Changed-Why:
> >>>> This looks like Rx checksum offloading issue. Would you try
> >>>> disabling Rx checksum offloading and test it again?
> >>>> #ifconfig re0 -rxcsum
> >>>> Also show me dmesg output(re(4) related part).
> >>>>
> >>>>
> >>>> Responsible-Changed-From-To: freebsd-net->yongari
> >>>> Responsible-Changed-By: yongari
> >>>> Responsible-Changed-When: Fri Mar 12 17:53:37 UTC 2010
> >>>> Responsible-Changed-Why:
> >>>> Mine.
> >>>>
> >>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=144689
> >>>>
> >>>
> >>> Hmm. Disabling Rx checksum offloading helped for one clone process,
> >>> but then this showed up in dmesg during my second test (it seems to be
> >>> doing this regularly for some reason):
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>>
> >>> And no, the cable isn't loose or something. It just decides to take
> >>> the interface down and put it back up.
> >>>
> >>> Here's the rest of 'dmesg | grep re0':
> >>>
> >>> firewire0: on fwohci0
> >>> dcons_crom0: on firewire0
> >>> fwe0: on firewire0
> >>> fwip0: on firewire0
> >>> firewire0: 1 nodes, maxhop <= 0 cable IRM irm(0) ??(me)
> >>> firewire0: bus manager 0
> >>> re0:
> >>> port 0x1200-0x12ff mem 0x88000000-0x880001ff irq 18 at device 0.0 on
> >>> cardbus0
> >>> re0: Chip rev. 0x10000000
> >>> re0: MAC rev. 0x00000000
> >>> miibus1: on re0
> >>> re0: Ethernet address: 00:18:4d:6e:c0:29
> >>> re0: [FILTER]
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: detached
> >>> re0:
> >>> port 0x1200-0x12ff mem 0x88000000-0x880001ff irq 18 at device 0.0 on
> >>> cardbus0
> >>> re0: Chip rev. 0x10000000
> >>> re0: MAC rev. 0x00000000
> >>> miibus1: on re0
> >>> re0: Ethernet address: 00:18:4d:6e:c0:29
> >>> re0: [FILTER]
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>>
> >>> - Steven
> >>>
> >>
> >> I should note that the connection was _lost_ during the second test above.
> >>
> >> I also tested again, and it looks like it added another "re0: PHY read
> >> failed" before silently dropping the connection.
> >>
> >> - Steven
> >>
> >
> > I did a couple captures with Wireshark on the client end. One is with
> > rxcsum enabled on the machine running git-daemon, one is without
> > rxcsum.
> >
> > http://www.uplinklabs.net/~tycho/files/git-cap-norxcsum.bz2
> > http://www.uplinklabs.net/~tycho/files/git-cap.bz2
> >
> > Obviously, you can look at the data yourself and make more sense of
> > it, but here are things I noticed in the captures:
> >
> > With rxcsum:
> > - There are some silent problems that occur in the middle of the
> > capture. Client-to-server: 'TCP ACKed lost segment' a few times, then
> > 'TCP previous segment lost'. This happens multiple times during the
> > capture (before 'git-upload-pack' starts sending data).
> > - Occasional 'TCP window update's. These are highlighted in black for
> > reasons unknown to me. It seems like this would be normal.
> > - The server calls 'git-upload-pack' and we start seeing a large
> > number of client-to-server TCP RST flags being sent and then the
> > connection gets closed due to some detected data corruption in the
> > transfer.
> >
> > Without rxcsum:
> > - About the same amount of client-to-server 'TCP ACKed lost segment's.
> > - 'git-upload-pack' kicks in and things get _really_ hairy. 'TCP Dup
> > ACK' detected by the client many many times.
> > - Finally, a series of 'TCP retransmission's from server to client
> > happen (which is where the connection hangs).
> > - I closed the connection which triggered the final two 'TCP RST's.
> >
> > Also, I forgot to note in my original report that I checked if there
> > was packet loss by using a ping flood, and one packet in the 1.5
> > million packets sent was lost. But I'm not sure whether it's
> > checksumming the data of these packets, so they could be coming back
> > with perfectly valid ICMP headers but corrupted data.
> >
>
> Also, hilariously horrible hack:
>
> - On the server machine, start git-daemon listening on 127.0.0.1:9418.
> - On the server machine, run 'ssh -L :9418:127.0.0.1:9418
> user@localhost'.
>
> Then remote git clones work as expected. Very strange. It will have to
> do until I get a less insane solution.
>

The real issue looks like a PHY read failure, which can result in
unexpected behavior. I don't see any rgephy(4)-related messages here;
would you show me the output of "devinfo -rv | grep phy"?
By any chance, are you using a PCMCIA Ethernet controller?

> I don't understand why it makes a difference. Is git-daemon using TCP
> socket options that causes this network interface driver to
> malfunction?
>

No, I don't think so. It would be a bug in the driver.

> - Steven
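
On the ping flood question quoted above: the ICMP checksum is computed
over the whole ICMP message, header and data together (RFC 792), so an
echo reply whose data was altered after the checksum was computed
should fail verification instead of counting as a good reply. The
sketch below only illustrates that property with the RFC 1071
algorithm. The function names and the identifier, sequence, and payload
values are made up for illustration, and it says nothing about what the
re(4) hardware or driver actually does with these packets.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit words, as in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    # The checksum field is zero while the checksum is being computed.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def checksum_ok(message: bytes) -> bool:
    """A message with a correct checksum field sums to zero."""
    return internet_checksum(message) == 0


if __name__ == "__main__":
    packet = build_echo_request(0x1234, 1, b"x" * 56)
    print("intact packet verifies:", checksum_ok(packet))

    # Flip one bit in the data portion, leaving the 8-byte header alone.
    damaged = bytearray(packet)
    damaged[20] ^= 0x01
    print("damaged packet verifies:", checksum_ok(bytes(damaged)))
```

A 16-bit one's-complement sum is weak, and it cannot catch data that
was already wrong when the checksum was computed, so a clean ping flood
is suggestive but not proof that the data path is healthy.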