Date: Tue, 16 Mar 2010 12:31:22 -0700
From: Steven Noonan <steven@uplinklabs.net>
To: pyunyh@gmail.com
Cc: freebsd-net@freebsd.org, bug-followup@freebsd.org, yongari@freebsd.org
Subject: Re: kern/144689: [re] TCP transfer corruption using if_re
Message-ID: <f488382f1003161231s2fbd7d39yf615941d028c18e8@mail.gmail.com>
In-Reply-To: <20100316182322.GF2001@michelle.cdnetworks.com>
References: <201003121754.o2CHsH7V065932@freefall.freebsd.org>
 <f488382f1003121619y17780ed9x52765b9a9133fb2@mail.gmail.com>
 <f488382f1003121624j34a8aee8kc127e82c08c3fe37@mail.gmail.com>
 <f488382f1003122157i12968043h31c8020007f7e8a1@mail.gmail.com>
 <f488382f1003130418s116e9c1frfd210db4127b4a9@mail.gmail.com>
 <20100316182322.GF2001@michelle.cdnetworks.com>

On Tue, Mar 16, 2010 at 11:23 AM, Pyun YongHyeon <pyunyh@gmail.com> wrote:
> On Sat, Mar 13, 2010 at 04:18:30AM -0800, Steven Noonan wrote:
>> On Fri, Mar 12, 2010 at 9:57 PM, Steven Noonan <steven@uplinklabs.net> wrote:
>> > On Fri, Mar 12, 2010 at 4:24 PM, Steven Noonan <steven@uplinklabs.net> wrote:
>> >> On Fri, Mar 12, 2010 at 4:19 PM, Steven Noonan <steven@uplinklabs.net> wrote:
>> >>> On Fri, Mar 12, 2010 at 9:54 AM, ??<yongari@freebsd.org> wrote:
>> >>>> Synopsis: [re] TCP transfer corruption using if_re
>> >>>>
>> >>>> State-Changed-From-To: open->feedback
>> >>>> State-Changed-By: yongari
>> >>>> State-Changed-When: Fri Mar 12 17:53:37 UTC 2010
>> >>>> State-Changed-Why:
>> >>>> This looks like Rx checksum offloading issue. Would you try
>> >>>> disabling Rx checksum offloading and test it again?
>> >>>> #ifconfig re0 -rxcsum
>> >>>> Also show me dmesg output(re(4) related part).
>> >>>>
>> >>>>
>> >>>> Responsible-Changed-From-To: freebsd-net->yongari
>> >>>> Responsible-Changed-By: yongari
>> >>>> Responsible-Changed-When: Fri Mar 12 17:53:37 UTC 2010
>> >>>> Responsible-Changed-Why:
>> >>>> Mine.
>> >>>>
>> >>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=144689
>> >>>>
>> >>>
>> >>> Hmm. Disabling Rx checksum offloading helped for one clone process,
>> >>> but then this showed up in dmesg during my second test (it seems to be
>> >>> doing this regularly for some reason):
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>>
>> >>> And no, the cable isn't loose or something. It just decides to take
>> >>> the interface down and put it back up.
>> >>>
>> >>> Here's the rest of 'dmesg | grep re0':
>> >>>
>> >>> firewire0: <IEEE1394(FireWire) bus> on fwohci0
>> >>> dcons_crom0: <dcons configuration ROM> on firewire0
>> >>> fwe0: <Ethernet over FireWire> on firewire0
>> >>> fwip0: <IP over FireWire> on firewire0
>> >>> firewire0: 1 nodes, maxhop <= 0 cable IRM irm(0) ??(me)
>> >>> firewire0: bus manager 0
>> >>> re0: <RealTek 8169/8169S/8169SB(L)/8110S/8110SB(L) Gigabit Ethernet>
>> >>> port 0x1200-0x12ff mem 0x88000000-0x880001ff irq 18 at device 0.0 on
>> >>> cardbus0
>> >>> re0: Chip rev. 0x10000000
>> >>> re0: MAC rev. 0x00000000
>> >>> miibus1: <MII bus> on re0
>> >>> re0: Ethernet address: 00:18:4d:6e:c0:29
>> >>> re0: [FILTER]
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: detached
>> >>> re0: <RealTek 8169/8169S/8169SB(L)/8110S/8110SB(L) Gigabit Ethernet>
>> >>> port 0x1200-0x12ff mem 0x88000000-0x880001ff irq 18 at device 0.0 on
>> >>> cardbus0
>> >>> re0: Chip rev. 0x10000000
>> >>> re0: MAC rev. 0x00000000
>> >>> miibus1: <MII bus> on re0
>> >>> re0: Ethernet address: 00:18:4d:6e:c0:29
>> >>> re0: [FILTER]
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: PHY read failed
>> >>> re0: link state changed to DOWN
>> >>> re0: link state changed to UP
>> >>> re0: PHY read failed
>> >>>
>> >>> - Steven
>> >>>
>> >>
>> >> I should note that the connection was _lost_ during the second test above.
>> >>
>> >> I also tested again, and it looks like it added another "re0: PHY read
>> >> failed" before silently dropping the connection.
>> >>
>> >> - Steven
>> >>
>> >
>> > I did a couple captures with Wireshark on the client end. One is with
>> > rxcsum enabled on the machine running git-daemon, one is without
>> > rxcsum.
>> >
>> > http://www.uplinklabs.net/~tycho/files/git-cap-norxcsum.bz2
>> > http://www.uplinklabs.net/~tycho/files/git-cap.bz2
>> >
>> > Obviously, you can look at the data yourself and make more sense of
>> > it, but here are things I noticed in the captures:
>> >
>> > With rxcsum:
>> > - There are some silent problems that occur in the middle of the
>> > capture. Client-to-server: 'TCP ACKed lost segment' a few times, then
>> > 'TCP previous segment lost'. This happens multiple times during the
>> > capture (before 'git-upload-pack' starts sending data).
>> > - Occasional 'TCP window update's. These are highlighted in black for
>> > reasons unknown to me. It seems like this would be normal.
>> > - The server calls 'git-upload-pack' and we start seeing a large
>> > number of client-to-server TCP RST flags being sent and then the
>> > connection gets closed due to some detected data corruption in the
>> > transfer.
>> >
>> > Without rxcsum:
>> > - About the same amount of client-to-server 'TCP ACKed lost segment's.
>> > - 'git-upload-pack' kicks in and things get _really_ hairy. 'TCP Dup
>> > ACK' detected by the client many many times.
>> > - Finally, a series of 'TCP retransmission's from server to client
>> > happen (which is where the connection hangs).
>> > - I closed the connection which triggered the final two 'TCP RST's.
>> >
>> > Also, I forgot to note in my original report that I checked if there
>> > was packet loss by using a ping flood, and one packet in the 1.5
>> > million packets sent was lost. But I'm not sure whether it's
>> > checksumming the data of these packets, so they could be coming back
>> > with perfectly valid ICMP headers but corrupted data.
>> >
>>
>> Also, hilariously horrible hack:
>>
>> - On the server machine, start git-daemon listening on 127.0.0.1:9418.
>> - On the server machine, run 'ssh -L <public IP>:9418:127.0.0.1:9418
>> user@localhost'.
>>
>> Then remote git clones work as expected. Very strange. It will have to
>> do until I get a less insane solution.
>>
>
> The real issue looks like PHY read failure which can result in
> unexpected behavior. I don't see rgephy(4) related message here,
> would you show me the output of "devinfo -rv | grep phy"?
> By chance are you using PCMCIA ethernet controller?

I am. It's a Netgear GA511. I think I said in my original post that it
was connected via cardbus.

xerxes ~ # devinfo -rv | grep phy
rgephy0 pnpinfo oui=0x732 model=0x11 rev=0x3 at phyno=1
inphy0 pnpinfo oui=0xaa00 model=0x33 rev=0x0 at phyno=1

>
>> I don't understand why it makes a difference. Is git-daemon using TCP
>> socket options that causes this network interface driver to
>> malfunction?
>>
>
> No, I don't think so. It would be a bug in driver.
>
>> - Steven
>
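
For anyone comparing test runs, the link flaps and "PHY read failed" messages quoted above can be tallied rather than eyeballed. What follows is only a rough sketch in Python, assuming the log lines look exactly like the ones in this message; the function name summarize_events and the sample text are made up for illustration and are not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches the two kinds of driver messages quoted in this thread.
# The pattern is an assumption based on the log lines shown above.
EVENT_RE = re.compile(
    r"(re\d+): (link state changed to (?:UP|DOWN)|PHY read failed)"
)

def summarize_events(dmesg_text, ifname="re0"):
    """Count link-state changes and PHY read failures for one interface."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        # search() rather than match(), so mail quote markers such as
        # ">> >>> " in front of the message are ignored.
        m = EVENT_RE.search(line)
        if m and m.group(1) == ifname:
            counts[m.group(2)] += 1
    return counts

# Hypothetical sample in the same shape as the dmesg output above.
sample = """\
re0: [FILTER]
re0: link state changed to UP
re0: link state changed to DOWN
>> >>> re0: PHY read failed
re0: PHY read failed
re0: link state changed to UP
"""

if __name__ == "__main__":
    for event, n in sorted(summarize_events(sample).items()):
        print(f"{n:4d}  {event}")
```

Feeding it the saved output of 'dmesg | grep re0' from a run with rxcsum and a run without would give comparable counts for the two cases.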