Date: Fri, 12 Mar 2010 21:57:45 -0800
From: Steven Noonan <steven@uplinklabs.net>
To: yongari@freebsd.org
Cc: freebsd-net@freebsd.org
Subject: Re: kern/144689: [re] TCP transfer corruption using if_re
Message-ID: <f488382f1003122157i12968043h31c8020007f7e8a1@mail.gmail.com>
In-Reply-To: <f488382f1003121624j34a8aee8kc127e82c08c3fe37@mail.gmail.com>
References: <201003121754.o2CHsH7V065932@freefall.freebsd.org>
 <f488382f1003121619y17780ed9x52765b9a9133fb2@mail.gmail.com>
 <f488382f1003121624j34a8aee8kc127e82c08c3fe37@mail.gmail.com>
On Fri, Mar 12, 2010 at 4:24 PM, Steven Noonan <steven@uplinklabs.net> wrote:
> On Fri, Mar 12, 2010 at 4:19 PM, Steven Noonan <steven@uplinklabs.net> wrote:
>> On Fri, Mar 12, 2010 at 9:54 AM, <yongari@freebsd.org> wrote:
>>> Synopsis: [re] TCP transfer corruption using if_re
>>>
>>> State-Changed-From-To: open->feedback
>>> State-Changed-By: yongari
>>> State-Changed-When: Fri Mar 12 17:53:37 UTC 2010
>>> State-Changed-Why:
>>> This looks like Rx checksum offloading issue. Would you try
>>> disabling Rx checksum offloading and test it again?
>>> #ifconfig re0 -rxcsum
>>> Also show me dmesg output(re(4) related part).
>>>
>>>
>>> Responsible-Changed-From-To: freebsd-net->yongari
>>> Responsible-Changed-By: yongari
>>> Responsible-Changed-When: Fri Mar 12 17:53:37 UTC 2010
>>> Responsible-Changed-Why:
>>> Mine.
>>>
>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=144689
>>>
>>
>> Hmm. Disabling Rx checksum offloading helped for one clone process,
>> but then this showed up in dmesg during my second test (it seems to be
>> doing this regularly for some reason):
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>>
>> And no, the cable isn't loose or something. It just decides to take
>> the interface down and put it back up.
>>
>> Here's the rest of 'dmesg | grep re0':
>>
>> firewire0: <IEEE1394(FireWire) bus> on fwohci0
>> dcons_crom0: <dcons configuration ROM> on firewire0
>> fwe0: <Ethernet over FireWire> on firewire0
>> fwip0: <IP over FireWire> on firewire0
>> firewire0: 1 nodes, maxhop <= 0 cable IRM irm(0) (me)
>> firewire0: bus manager 0
>> re0: <RealTek 8169/8169S/8169SB(L)/8110S/8110SB(L) Gigabit Ethernet>
>> port 0x1200-0x12ff mem 0x88000000-0x880001ff irq 18 at device 0.0 on
>> cardbus0
>> re0: Chip rev. 0x10000000
>> re0: MAC rev. 0x00000000
>> miibus1: <MII bus> on re0
>> re0: Ethernet address: 00:18:4d:6e:c0:29
>> re0: [FILTER]
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: detached
>> re0: <RealTek 8169/8169S/8169SB(L)/8110S/8110SB(L) Gigabit Ethernet>
>> port 0x1200-0x12ff mem 0x88000000-0x880001ff irq 18 at device 0.0 on
>> cardbus0
>> re0: Chip rev. 0x10000000
>> re0: MAC rev. 0x00000000
>> miibus1: <MII bus> on re0
>> re0: Ethernet address: 00:18:4d:6e:c0:29
>> re0: [FILTER]
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>>
>> - Steven
>>
>
> I should note that the connection was _lost_ during the second test above.
>
> I also tested again, and it looks like it added another "re0: PHY read
> failed" before silently dropping the connection.
>
> - Steven
>

I did a couple captures with Wireshark on the client end. One is with
rxcsum enabled on the machine running git-daemon, one is without
rxcsum.

http://www.uplinklabs.net/~tycho/files/git-cap-norxcsum.bz2
http://www.uplinklabs.net/~tycho/files/git-cap.bz2

Obviously, you can look at the data yourself and make more sense of
it, but here are things I noticed in the captures:

With rxcsum:
- There are some silent problems that occur in the middle of the
  capture. Client-to-server: 'TCP ACKed lost segment' a few times,
  then 'TCP previous segment lost'. This happens multiple times during
  the capture (before 'git-upload-pack' starts sending data).
- Occasional 'TCP window update's. These are highlighted in black for
  reasons unknown to me. It seems like this would be normal.
- The server calls 'git-upload-pack' and we start seeing a large
  number of client-to-server TCP RST flags being sent and then the
  connection gets closed due to some detected data corruption in the
  transfer.

Without rxcsum:
- About the same amount of client-to-server 'TCP ACKed lost segment's.
- 'git-upload-pack' kicks in and things get _really_ hairy. 'TCP Dup
  ACK' detected by the client many many times.
- Finally, a series of 'TCP retransmission's from server to client
  happen (which is where the connection hangs).
- I closed the connection which triggered the final two 'TCP RST's.

Also, I forgot to note in my original report that I checked if there
was packet loss by using a ping flood, and one packet in the 1.5
million packets sent was lost. But I'm not sure whether it's
checksumming the data of these packets, so they could be coming back
with perfectly valid ICMP headers but corrupted data.

- Steven
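
On the ping question above: per RFC 792, the ICMP checksum is computed
over the whole ICMP message, starting at the type field and including
the echo data, so a reply whose payload was corrupted in transit should
normally fail that checksum at the receiver. A minimal Python sketch of
this, not taken from the thread; the helper names (internet_checksum,
build_echo_request, is_valid) and the sample identifier, sequence
number, and payload are made up for illustration:

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    # The checksum covers header AND payload, with the field zeroed first.
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def is_valid(message: bytes) -> bool:
    """A message with a correct checksum field sums to zero."""
    return internet_checksum(message) == 0


# Hypothetical sample values, chosen only for the demonstration.
packet = build_echo_request(0x1234, 1, b"abcdefgh")

# Flip one bit in the last payload byte, leaving the header untouched.
corrupted = packet[:-1] + bytes([packet[-1] ^ 0x01])

print(is_valid(packet), is_valid(corrupted))
```

This only shows what the checksum covers. It says nothing about where
in the re(4) receive path the corruption happens, and the 16-bit
one's-complement sum cannot catch every error (for example, two 16-bit
words swapped in place leave it unchanged).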