From owner-freebsd-net@FreeBSD.ORG Tue Mar 16 19:32:36 2010
Return-Path:
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 7266D106564A;
    Tue, 16 Mar 2010 19:32:36 +0000 (UTC)
    (envelope-from steven@uplinklabs.net)
Received: from mail-pz0-f196.google.com (mail-pz0-f196.google.com [209.85.222.196])
    by mx1.freebsd.org (Postfix) with ESMTP id 3D4008FC1F;
    Tue, 16 Mar 2010 19:32:35 +0000 (UTC)
Received: by pzk34 with SMTP id 34so239950pzk.3
    for ; Tue, 16 Mar 2010 12:32:35 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.141.1.6 with SMTP id d6mr14137rvi.175.1268767883152;
    Tue, 16 Mar 2010 12:31:23 -0700 (PDT)
In-Reply-To: <20100316182322.GF2001@michelle.cdnetworks.com>
References: <201003121754.o2CHsH7V065932@freefall.freebsd.org>
    <20100316182322.GF2001@michelle.cdnetworks.com>
Date: Tue, 16 Mar 2010 12:31:22 -0700
Message-ID:
From: Steven Noonan
To: pyunyh@gmail.com
Content-Type: text/plain; charset=UTF-8
Cc: freebsd-net@freebsd.org, bug-followup@freebsd.org, yongari@freebsd.org
Subject: Re: kern/144689: [re] TCP transfer corruption using if_re
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 16 Mar 2010 19:32:36 -0000

On Tue, Mar 16, 2010 at 11:23 AM, Pyun YongHyeon wrote:
> On Sat, Mar 13, 2010 at 04:18:30AM -0800, Steven Noonan wrote:
>> On Fri, Mar 12, 2010 at 9:57 PM, Steven Noonan wrote:
>> > On Fri, Mar 12, 2010 at 4:24 PM, Steven Noonan wrote:
>> >> On Fri, Mar 12, 2010 at 4:19 PM, Steven Noonan wrote:
>> >>> On Fri, Mar 12, 2010 at 9:54 AM, ?? wrote:
>> >>>> Synopsis: [re] TCP transfer corruption using if_re
>> >>>>
>> >>>> State-Changed-From-To: open->feedback
>> >>>> State-Changed-By: yongari
>> >>>> State-Changed-When: Fri Mar 12 17:53:37 UTC 2010
>> >>>> State-Changed-Why:
>> >>>> This looks like Rx checksum offloading issue. Would you try
>> >>>> disabling Rx checksum offloading and test it again?
>> >>>> #ifconfig re0 -rxcsum
>> >>>> Also show me dmesg output(re(4) related part).
>> >>>>
>> >>>>
>> >>>> Responsible-Changed-From-To: freebsd-net->yongari
>> >>>> Responsible-Changed-By: yongari
>> >>>> Responsible-Changed-When: Fri Mar 12 17:53:37 UTC 2010
>> >>>> Responsible-Changed-Why:
>> >>>> Mine.
>> >>>>
>> >>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=144689
>> >>>>
>> >>>
>> >>> Hmm. Disabling Rx checksum offloading helped for one clone process,
>> >>> but then this showed up in dmesg during my second test (it seems to be
>> >>> doing this regularly for some reason):
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>>
>> >>> And no, the cable isn't loose or something. It just decides to take
>> >>> the interface down and put it back up.
>> >>>
>> >>> Here's the rest of 'dmesg | grep re0':
>> >>>
>> >>> firewire0: on fwohci0
>> >>> dcons_crom0: on firewire0
>> >>> fwe0: on firewire0
>> >>> fwip0: on firewire0
>> >>> firewire0: 1 nodes, maxhop <= 0 cable IRM irm(0) ??(me)
>> >>> firewire0: bus manager 0
>> >>> re0:
>> >>> port 0x1200-0x12ff mem 0x88000000-0x880001ff irq 18 at device 0.0 on
>> >>> cardbus0
>> >>> re0: Chip rev. 0x10000000
>> >>> re0: MAC rev. 0x00000000
>> >>> miibus1: on re0
>> >>> re0: Ethernet address: 00:18:4d:6e:c0:29
>> >>> re0: [FILTER]
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: detached
>> >>> re0:
>> >>> port 0x1200-0x12ff mem 0x88000000-0x880001ff irq 18 at device 0.0 on
>> >>> cardbus0
>> >>> re0: Chip rev. 0x10000000
>> >>> re0: MAC rev. 0x00000000
>> >>> miibus1: on re0
>> >>> re0: Ethernet address: 00:18:4d:6e:c0:29
>> >>> re0: [FILTER]
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>>
>> >>> - Steven
>> >>>
>> >>
>> >> I should note that the connection was _lost_ during the second test above.
>> >>
>> >> I also tested again, and it looks like it added another "re0: PHY read
>> >> failed" before silently dropping the connection.
>> >>
>> >> - Steven
>> >>
>> >
>> > I did a couple captures with Wireshark on the client end. One is with
>> > rxcsum enabled on the machine running git-daemon, one is without
>> > rxcsum.
>> >
>> > http://www.uplinklabs.net/~tycho/files/git-cap-norxcsum.bz2
>> > http://www.uplinklabs.net/~tycho/files/git-cap.bz2
>> >
>> > Obviously, you can look at the data yourself and make more sense of
>> > it, but here are things I noticed in the captures:
>> >
>> > With rxcsum:
>> > - There are some silent problems that occur in the middle of the
>> > capture. Client-to-server: 'TCP ACKed lost segment' a few times, then
>> > 'TCP previous segment lost'. This happens multiple times during the
>> > capture (before 'git-upload-pack' starts sending data).
>> > - Occasional 'TCP window update's. These are highlighted in black for
>> > reasons unknown to me. It seems like this would be normal.
>> > - The server calls 'git-upload-pack' and we start seeing a large
>> > number of client-to-server TCP RST flags being sent and then the
>> > connection gets closed due to some detected data corruption in the
>> > transfer.
>> >
>> > Without rxcsum:
>> > - About the same amount of client-to-server 'TCP ACKed lost segment's.
>> > - 'git-upload-pack' kicks in and things get _really_ hairy. 'TCP Dup
>> > ACK' detected by the client many many times.
>> > - Finally, a series of 'TCP retransmission's from server to client
>> > happen (which is where the connection hangs).
>> > - I closed the connection which triggered the final two 'TCP RST's.
>> >
>> > Also, I forgot to note in my original report that I checked if there
>> > was packet loss by using a ping flood, and one packet in the 1.5
>> > million packets sent was lost. But I'm not sure whether it's
>> > checksumming the data of these packets, so they could be coming back
>> > with perfectly valid ICMP headers but corrupted data.
>> >
>>
>> Also, hilariously horrible hack:
>>
>> - On the server machine, start git-daemon listening on 127.0.0.1:9418.
>> - On the server machine, run 'ssh -L :9418:127.0.0.1:9418
>> user@localhost'.
>>
>> Then remote git clones work as expected. Very strange. It will have to
>> do until I get a less insane solution.
>>
>
> The real issue looks like PHY read failure which can result in
> unexpected behavior. I don't see rgephy(4) related message here,
> would you show me the output of "devinfo -rv | grep phy"?
> By chance are you using PCMCIA ethernet controller?

I am. It's a Netgear GA511. I think I said in my original post that it
was connected via cardbus.

xerxes ~ # devinfo -rv | grep phy
    rgephy0 pnpinfo oui=0x732 model=0x11 rev=0x3 at phyno=1
    inphy0 pnpinfo oui=0xaa00 model=0x33 rev=0x0 at phyno=1

>
>> I don't understand why it makes a difference. Is git-daemon using TCP
>> socket options that causes this network interface driver to
>> malfunction?
>>
>
> No, I don't think so. It would be a bug in driver.
>
>> - Steven
>
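
On the ping-flood question quoted above (whether ICMP echo traffic checksums its data): ICMP messages carry the 16-bit Internet checksum described in RFC 1071, computed over the ICMP header and the data together, so a flipped payload bit should fail verification wherever that checksum is actually checked. The following is only a rough sketch of that arithmetic; the function names are made up for illustration, and it says nothing about what the re(4) hardware or driver does with offloading.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Hypothetical helper: ICMP echo request (type 8, code 0) with checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def checksum_ok(message: bytes) -> bool:
    """A message whose checksum field is correct sums to zero."""
    return internet_checksum(message) == 0


packet = build_echo_request(0x1234, 1, b"hello world")
# Flip one bit in the last payload byte to simulate corruption in transit.
corrupted = packet[:-1] + bytes([packet[-1] ^ 0x01])
print(checksum_ok(packet), checksum_ok(corrupted))
```

A single flipped bit is always caught by this checksum; some multi-bit corruptions (for example, two 16-bit words swapped) are not, which is one reason an end-to-end comparison of the received data is still worth doing.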