Date: Sat, 13 Mar 2010 04:18:30 -0800
From: Steven Noonan <steven@uplinklabs.net>
To: yongari@freebsd.org
Cc: freebsd-net@freebsd.org
Subject: Re: kern/144689: [re] TCP transfer corruption using if_re
Message-ID: <f488382f1003130418s116e9c1frfd210db4127b4a9@mail.gmail.com>
In-Reply-To: <f488382f1003122157i12968043h31c8020007f7e8a1@mail.gmail.com>
References: <201003121754.o2CHsH7V065932@freefall.freebsd.org> <f488382f1003121619y17780ed9x52765b9a9133fb2@mail.gmail.com> <f488382f1003121624j34a8aee8kc127e82c08c3fe37@mail.gmail.com> <f488382f1003122157i12968043h31c8020007f7e8a1@mail.gmail.com>

On Fri, Mar 12, 2010 at 9:57 PM, Steven Noonan <steven@uplinklabs.net> wrote:
> On Fri, Mar 12, 2010 at 4:24 PM, Steven Noonan <steven@uplinklabs.net> wrote:
>> On Fri, Mar 12, 2010 at 4:19 PM, Steven Noonan <steven@uplinklabs.net> wrote:
>>> On Fri, Mar 12, 2010 at 9:54 AM, <yongari@freebsd.org> wrote:
>>>> Synopsis: [re] TCP transfer corruption using if_re
>>>>
>>>> State-Changed-From-To: open->feedback
>>>> State-Changed-By: yongari
>>>> State-Changed-When: Fri Mar 12 17:53:37 UTC 2010
>>>> State-Changed-Why:
>>>> This looks like Rx checksum offloading issue. Would you try
>>>> disabling Rx checksum offloading and test it again?
>>>> #ifconfig re0 -rxcsum
>>>> Also show me dmesg output(re(4) related part).
>>>>
>>>>
>>>> Responsible-Changed-From-To: freebsd-net->yongari
>>>> Responsible-Changed-By: yongari
>>>> Responsible-Changed-When: Fri Mar 12 17:53:37 UTC 2010
>>>> Responsible-Changed-Why:
>>>> Mine.
>>>>
>>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=144689
>>>>
>>>
>>> Hmm. Disabling Rx checksum offloading helped for one clone process,
>>> but then this showed up in dmesg during my second test (it seems to be
>>> doing this regularly for some reason):
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>>
>>> And no, the cable isn't loose or something. It just decides to take
>>> the interface down and put it back up.
>>>
>>> Here's the rest of 'dmesg | grep re0':
>>>
>>> firewire0: <IEEE1394(FireWire) bus> on fwohci0
>>> dcons_crom0: <dcons configuration ROM> on firewire0
>>> fwe0: <Ethernet over FireWire> on firewire0
>>> fwip0: <IP over FireWire> on firewire0
>>> firewire0: 1 nodes, maxhop <= 0 cable IRM irm(0)  (me)
>>> firewire0: bus manager 0
>>> re0: <RealTek 8169/8169S/8169SB(L)/8110S/8110SB(L) Gigabit Ethernet>
>>> port 0x1200-0x12ff mem 0x88000000-0x880001ff irq 18 at device 0.0 on
>>> cardbus0
>>> re0: Chip rev. 0x10000000
>>> re0: MAC rev. 0x00000000
>>> miibus1: <MII bus> on re0
>>> re0: Ethernet address: 00:18:4d:6e:c0:29
>>> re0: [FILTER]
>>> re0: link state changed to UP
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: PHY read failed
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: detached
>>> re0: <RealTek 8169/8169S/8169SB(L)/8110S/8110SB(L) Gigabit Ethernet>
>>> port 0x1200-0x12ff mem 0x88000000-0x880001ff irq 18 at device 0.0 on
>>> cardbus0
>>> re0: Chip rev. 0x10000000
>>> re0: MAC rev. 0x00000000
>>> miibus1: <MII bus> on re0
>>> re0: Ethernet address: 00:18:4d:6e:c0:29
>>> re0: [FILTER]
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: PHY read failed
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: PHY read failed
>>> re0: link state changed to DOWN
>>> re0: link state changed to UP
>>> re0: PHY read failed
>>>
>>> - Steven
>>>
>>
>> I should note that the connection was _lost_ during the second test above.
>>
>> I also tested again, and it looks like it added another "re0: PHY read
>> failed" before silently dropping the connection.
>>
>> - Steven
>>
>
> I did a couple captures with Wireshark on the client end. One is with
> rxcsum enabled on the machine running git-daemon, one is without
> rxcsum.
>
> http://www.uplinklabs.net/~tycho/files/git-cap-norxcsum.bz2
> http://www.uplinklabs.net/~tycho/files/git-cap.bz2
>
> Obviously, you can look at the data yourself and make more sense of
> it, but here are things I noticed in the captures:
>
> With rxcsum:
> - There are some silent problems that occur in the middle of the
> capture. Client-to-server: 'TCP ACKed lost segment' a few times, then
> 'TCP previous segment lost'. This happens multiple times during the
> capture (before 'git-upload-pack' starts sending data).
> - Occasional 'TCP window update's. These are highlighted in black for
> reasons unknown to me. It seems like this would be normal.
> - The server calls 'git-upload-pack' and we start seeing a large
> number of client-to-server TCP RST flags being sent and then the
> connection gets closed due to some detected data corruption in the
> transfer.
>
> Without rxcsum:
> - About the same amount of client-to-server 'TCP ACKed lost segment's.
> - 'git-upload-pack' kicks in and things get _really_ hairy. 'TCP Dup
> ACK' detected by the client many many times.
> - Finally, a series of 'TCP retransmission's from server to client
> happen (which is where the connection hangs).
> - I closed the connection which triggered the final two 'TCP RST's.
>
> Also, I forgot to note in my original report that I checked if there
> was packet loss by using a ping flood, and one packet in the 1.5
> million packets sent was lost. But I'm not sure whether it's
> checksumming the data of these packets, so they could be coming back
> with perfectly valid ICMP headers but corrupted data.
>

Also, hilariously horrible hack:
- On the server machine, start git-daemon listening on 127.0.0.1:9418.
- On the server machine, run 'ssh -L <public
IP>:9418:127.0.0.1:9418 user@localhost'.

Then remote git clones work as expected. Very strange. It will have to
do until I get a less insane solution. I don't understand why it makes
a difference. Is git-daemon using TCP socket options that cause this
network interface driver to malfunction?

- Steven
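
On the ping-flood question above: per RFC 792 the ICMP checksum is
computed over the whole ICMP message (header plus data), so a flipped
bit in the echo payload should fail verification somewhere along the
path. The checksum is weak, though, and some corruption patterns pass
it. Below is a minimal sketch of that arithmetic, not what ping(8) or
re(4) actually do; the function names and the sample payload are made
up for illustration.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input to a 16-bit boundary
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with its checksum filled in."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum field zeroed
    csum = internet_checksum(header + payload)  # covers header AND payload
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def checksum_ok(message: bytes) -> bool:
    """A message with a correct checksum field sums to zero."""
    return internet_checksum(message) == 0


def flip_bit(message: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Simulate single-bit corruption (hypothetical test helper)."""
    corrupted = bytearray(message)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)


def swap_words(message: bytes, i: int, j: int) -> bytes:
    """Swap two aligned 16-bit words at byte offsets i and j (hypothetical helper)."""
    corrupted = bytearray(message)
    corrupted[i:i + 2], corrupted[j:j + 2] = message[j:j + 2], message[i:i + 2]
    return bytes(corrupted)


if __name__ == "__main__":
    msg = build_echo_request(0x1234, 1, b"abcdefgh")
    print("intact message verifies:   ", checksum_ok(msg))
    # Offset 10 is inside the payload (the ICMP header is 8 bytes).
    print("payload bit flip verifies: ", checksum_ok(flip_bit(msg, 10)))
    # Reordered words leave the sum unchanged, so this corruption is missed.
    print("swapped payload verifies:  ", checksum_ok(swap_words(msg, 8, 10)))
```

So a clean ping flood argues against random single-bit damage in the
payload, but it cannot rule out corruption that happens to preserve
the ones'-complement sum, such as reordered 16-bit words.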