Date:      Tue, 16 Mar 2010 11:23:22 -0700
From:      Pyun YongHyeon <pyunyh@gmail.com>
To:        Steven Noonan <steven@uplinklabs.net>
Cc:        freebsd-net@freebsd.org, bug-followup@FreeBSD.org, yongari@freebsd.org
Subject:   Re: kern/144689: [re] TCP transfer corruption using if_re
Message-ID:  <20100316182322.GF2001@michelle.cdnetworks.com>
In-Reply-To: <f488382f1003130418s116e9c1frfd210db4127b4a9@mail.gmail.com>
References:  <201003121754.o2CHsH7V065932@freefall.freebsd.org> <f488382f1003121619y17780ed9x52765b9a9133fb2@mail.gmail.com> <f488382f1003121624j34a8aee8kc127e82c08c3fe37@mail.gmail.com> <f488382f1003122157i12968043h31c8020007f7e8a1@mail.gmail.com> <f488382f1003130418s116e9c1frfd210db4127b4a9@mail.gmail.com>

On Sat, Mar 13, 2010 at 04:18:30AM -0800, Steven Noonan wrote:
> On Fri, Mar 12, 2010 at 9:57 PM, Steven Noonan <steven@uplinklabs.net> wrote:
> > On Fri, Mar 12, 2010 at 4:24 PM, Steven Noonan <steven@uplinklabs.net> wrote:
> >> On Fri, Mar 12, 2010 at 4:19 PM, Steven Noonan <steven@uplinklabs.net> wrote:
> >>> On Fri, Mar 12, 2010 at 9:54 AM, ??<yongari@freebsd.org> wrote:
> >>>> Synopsis: [re] TCP transfer corruption using if_re
> >>>>
> >>>> State-Changed-From-To: open->feedback
> >>>> State-Changed-By: yongari
> >>>> State-Changed-When: Fri Mar 12 17:53:37 UTC 2010
> >>>> State-Changed-Why:
> >>>> This looks like an Rx checksum offloading issue. Would you try
> >>>> disabling Rx checksum offloading and testing it again?
> >>>> # ifconfig re0 -rxcsum
> >>>> Also show me the dmesg output (the re(4) related part).
> >>>>
> >>>>
> >>>> Responsible-Changed-From-To: freebsd-net->yongari
> >>>> Responsible-Changed-By: yongari
> >>>> Responsible-Changed-When: Fri Mar 12 17:53:37 UTC 2010
> >>>> Responsible-Changed-Why:
> >>>> Mine.
> >>>>
> >>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=144689
> >>>>
> >>>
> >>> Hmm. Disabling Rx checksum offloading helped for one clone process,
> >>> but then this showed up in dmesg during my second test (it seems to be
> >>> doing this regularly for some reason):
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>>
> >>> And no, the cable isn't loose or something. It just decides to take
> >>> the interface down and put it back up.
> >>>
> >>> Here's the rest of 'dmesg | grep re0':
> >>>
> >>> firewire0: <IEEE1394(FireWire) bus> on fwohci0
> >>> dcons_crom0: <dcons configuration ROM> on firewire0
> >>> fwe0: <Ethernet over FireWire> on firewire0
> >>> fwip0: <IP over FireWire> on firewire0
> >>> firewire0: 1 nodes, maxhop <= 0 cable IRM irm(0) ??(me)
> >>> firewire0: bus manager 0
> >>> re0: <RealTek 8169/8169S/8169SB(L)/8110S/8110SB(L) Gigabit Ethernet>
> >>> port 0x1200-0x12ff mem 0x88000000-0x880001ff irq 18 at device 0.0 on
> >>> cardbus0
> >>> re0: Chip rev. 0x10000000
> >>> re0: MAC rev. 0x00000000
> >>> miibus1: <MII bus> on re0
> >>> re0: Ethernet address: 00:18:4d:6e:c0:29
> >>> re0: [FILTER]
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: detached
> >>> re0: <RealTek 8169/8169S/8169SB(L)/8110S/8110SB(L) Gigabit Ethernet>
> >>> port 0x1200-0x12ff mem 0x88000000-0x880001ff irq 18 at device 0.0 on
> >>> cardbus0
> >>> re0: Chip rev. 0x10000000
> >>> re0: MAC rev. 0x00000000
> >>> miibus1: <MII bus> on re0
> >>> re0: Ethernet address: 00:18:4d:6e:c0:29
> >>> re0: [FILTER]
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: PHY read failed
> >>> re0: link state changed to DOWN
> >>> re0: link state changed to UP
> >>> re0: PHY read failed
> >>>
> >>> - Steven
> >>>
> >>
> >> I should note that the connection was _lost_ during the second test above.
> >>
> >> I also tested again, and it looks like it added another "re0: PHY read
> >> failed" before silently dropping the connection.
> >>
> >> - Steven
> >>
> >
> > I did a couple captures with Wireshark on the client end. One is with
> > rxcsum enabled on the machine running git-daemon, one is without
> > rxcsum.
> >
> > http://www.uplinklabs.net/~tycho/files/git-cap-norxcsum.bz2
> > http://www.uplinklabs.net/~tycho/files/git-cap.bz2
> >
> > Obviously, you can look at the data yourself and make more sense of
> > it, but here are things I noticed in the captures:
> >
> > With rxcsum:
> > - There are some silent problems that occur in the middle of the
> > capture. Client-to-server: 'TCP ACKed lost segment' a few times, then
> > 'TCP previous segment lost'. This happens multiple times during the
> > capture (before 'git-upload-pack' starts sending data).
> > - Occasional 'TCP window update's. These are highlighted in black for
> > reasons unknown to me. It seems like this would be normal.
> > - The server calls 'git-upload-pack' and we start seeing a large
> > number of client-to-server TCP RST flags being sent and then the
> > connection gets closed due to some detected data corruption in the
> > transfer.
> >
> > Without rxcsum:
> > - About the same amount of client-to-server 'TCP ACKed lost segment's.
> > - 'git-upload-pack' kicks in and things get _really_ hairy. 'TCP Dup
> > ACK' detected by the client many many times.
> > - Finally, a series of 'TCP retransmission's from server to client
> > happen (which is where the connection hangs).
> > - I closed the connection which triggered the final two 'TCP RST's.
> >
> > Also, I forgot to note in my original report that I checked if there
> > was packet loss by using a ping flood, and one packet in the 1.5
> > million packets sent was lost. But I'm not sure whether it's
> > checksumming the data of these packets, so they could be coming back
> > with perfectly valid ICMP headers but corrupted data.
> >
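
On the ping question above: per RFC 792 the ICMP checksum covers the
whole ICMP message, header and data, so a corrupted echo payload
should fail verification rather than pass with a valid-looking
header. Below is a minimal sketch of that check, not the kernel's
actual code. The function names (internet_checksum,
build_echo_request, is_valid) and the sample payload are made up for
illustration.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum over 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with its checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def is_valid(packet: bytes) -> bool:
    """A message with a correct checksum field sums to zero."""
    return internet_checksum(packet) == 0


# Hypothetical payload, only for demonstration.
packet = build_echo_request(ident=0x1234, seq=1, payload=b"ping payload data")

# Flip a single bit in the data portion (the header is 8 bytes).
corrupted = bytearray(packet)
corrupted[12] ^= 0x01
corrupted = bytes(corrupted)
```

Here is_valid(packet) holds and is_valid(corrupted) does not, since a
single flipped bit always changes the one's-complement sum. The
checksum is weak, though: some multi-bit errors (two 16-bit words
swapped, for example) go undetected, so a clean ping flood does not
prove the data path is clean.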
> 
> Also, hilariously horrible hack:
> 
> - On the server machine, start git-daemon listening on 127.0.0.1:9418.
> - On the server machine, run 'ssh -L <public IP>:9418:127.0.0.1:9418
> user@localhost'.
> 
> Then remote git clones work as expected. Very strange. It will have to
> do until I get a less insane solution.
> 

The real issue looks like a PHY read failure, which can result in
unexpected behavior. I don't see any rgephy(4) related message here;
would you show me the output of "devinfo -rv | grep phy"?
By chance, are you using a PCMCIA Ethernet controller?
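
To see how often the PHY read failures line up with the link flaps, a
small script can tally both message types from saved dmesg output.
This is a rough sketch, assuming the messages have exactly the form
shown in the quoted log. The function name summarize and the sample
text are made up for illustration.

```python
import re
from collections import Counter

# Patterns match the message forms seen in the quoted dmesg output.
LINK_RE = re.compile(r"^(?P<dev>\w+): link state changed to (?P<state>UP|DOWN)$")
PHY_RE = re.compile(r"^(?P<dev>\w+): PHY read failed$")


def summarize(dmesg_text: str, dev: str = "re0") -> Counter:
    """Count link state changes and PHY read failures for one device."""
    counts = Counter()
    for raw in dmesg_text.splitlines():
        line = raw.strip()
        m = LINK_RE.match(line)
        if m and m.group("dev") == dev:
            counts["link_" + m.group("state").lower()] += 1
            continue
        m = PHY_RE.match(line)
        if m and m.group("dev") == dev:
            counts["phy_read_failed"] += 1
    return counts


# Short sample in the same form as the log above (not the full log).
sample = """\
re0: link state changed to DOWN
re0: link state changed to UP
re0: PHY read failed
re0: PHY read failed
re0: link state changed to DOWN
re0: link state changed to UP
"""

result = summarize(sample)
```

Feeding it the text of a full dmesg capture instead of the sample
gives the totals for the whole session.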

> I don't understand why it makes a difference. Is git-daemon using TCP
> socket options that cause this network interface driver to
> malfunction?
> 

No, I don't think so. It would be a bug in the driver.

> - Steven
