Date:      Mon, 17 Mar 2008 14:12:03 +0900
From:      Pyun YongHyeon <pyunyh@gmail.com>
To:        Ian FREISLICH <ianf@clue.co.za>
Cc:        FreeBSD Current <freebsd-current@freebsd.org>, Robert Backhaus <robbak@robbak.com>
Subject:   Re: Packet corruption in re0
Message-ID:  <20080317051203.GC2503@cdnetworks.co.kr>
In-Reply-To: <E1JSTV4-0000l2-EE@clue.co.za>
References:  <20080222042700.GB30497@cdnetworks.co.kr> <E1JSTV4-0000l2-EE@clue.co.za>

On Fri, Feb 22, 2008 at 10:43:22AM +0200, Ian FREISLICH wrote:
 > Pyun YongHyeon wrote:
 > > On Thu, Feb 21, 2008 at 01:18:18PM +0200, Ian FREISLICH wrote:
 > >  > Pyun YongHyeon wrote:
 > >  > > On Thu, Feb 21, 2008 at 02:47:43PM +1000, Robert Backhaus wrote:
 > >  > >  > On Thu, Feb 21, 2008 at 1:50 PM, Pyun YongHyeon <pyunyh@gmail.com> wrote:
 > >  > >  > > On Thu, Feb 21, 2008 at 11:03:02AM +1000, Robert Backhaus wrote:
 > >  > >  > >   > I am experiencing roughly 15% packet corruption on the re interface on
 > >  > >  > >   > my freebsd 7/amd64 box.
 > >  > >  > >   >
 > >  > >  > >   > FreeBSD gw.flexi.robbak.com 7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #8:
 > >  > >  > >   > Tue Feb  5 09:49:55 EST 2008
 > >  > >  > >   > root@gw.flexi.robbak.com:/usr/obj/usr/src/sys/GW  amd64
 > >  > >  > >   >
 > >  > >  > >   > Just to make troubleshooting difficult, this problem only shows up
 > >  > >  > >   > after the system has been up for roughly 36 hours, depending on the
 > >  > >  > >   > amount of traffic.
 > >  > >  > >   >
 > >  > >  > >
 > >  > >  > >  I didn't look at the attached tcpdump files, but I guess the
 > >  > >  > >  instability issue was fixed in HEAD. It's not yet MFCed, but
 > >  > >  > >  I'll handle it within a week.
 > >  > >  > >
 > >  > >  > >  Would you try re(4) in HEAD?
 > >  > >  > >
 > >  > >  > 
 > >  > >  > OK, I'll do that. What is the best way to do that? csupping to "." seems a
 > >  > >  > bit drastic, and I don't do much with cvs proper. I take it that I should
 > >  > >  > use anon-cvs to grab the directory, but I don't quite know how.
 > >  > >  > 
 > >  > > 
 > >  > > Copy sys/dev/re/if_re.c and sys/pci/if_rlreg.h from HEAD to your box.
 > >  > > Due to the lack of m_defrag(9) in 7-PRERELEASE/RC, you also have to add
 > >  > > that function to if_re.c (copy m_defrag() from sys/kern/uipc_mbuf.c in
 > >  > > HEAD/RELENG_7 into if_re.c). That should make it build on your box.
 > >  > 
 > >  > This doesn't solve the problem that I'm seeing on re(4) interfaces.
 > >  > It basically shows up as quagga establishing OSPF neighbours as
 > >  > "Exchange/DR" when VLAN hardware tagging is enabled.  I'm running
 > >  > OSPF over 802.1Q vlans.  Neighbours are correctly negotiated once
 > >  > VLAN hardware tagging is disabled on the interface.
 > >  > 
 > >  > I'll do more debugging.
 > >  > 
 > > 
 > > Hmm. That sounds like a different issue to me. I don't think I changed
 > > any semantics of VLAN H/W tagging. Do you still see the same VLAN H/W
 > > tagging related issues on RELENG_7?
 > > 
 > > To narrow down the issue, it would be even better to know which part
 > > of the H/W assistance is broken. For example,
 > >  - Disable checksum offload for VLAN interface first and check
 > >    whether quagga works.
 > 
 > You can only disable offload on the parent interface.
 > 
 > >  - Disable checksum offload for parent interface and check again.
 > > If you can post tcpdump output for the broken connection, it may help a
 > > lot in diagnosing the issue.
 > 
 > The only flag affecting this behaviour is vlanhwtag.  Various
 > permutations of the interface flags make no difference to this
 > behaviour as long as hardware tagging is enabled.
 > 
 > It seems like it's corrupting large packets on transmit when vlanhwtag
 > is enabled.  From the tcpdump output it looks like a padding or
 > packet length issue.
 > 
 > Here's what tcpdump on the re(4) device thinks it's transmitting:
 > 
 > 00:08:a1:3c:32:9c > 00:90:fb:0c:89:7d, ethertype 802.1Q (0x8100), length 1510: vlan 1000, p 0, ethertype IPv4, 196.22.138.92 > 196.22.138.89: OSPFv2, Database Description, length: 1472
 > 
 > Here's what was actually received by the em(4) device on the
 > neighbour.  Note the absence of the 802.1Q header:
 > 
 > 00:08:a1:3c:32:9c > 00:90:fb:0c:89:7d, ethertype IPv4 (0x0800), length 1506: 196.22.138.92 > 196.22.138.89: OSPFv2, Database Description, length: 1472
 > 
 > When vlanhwtagging is disabled, the re(4) device transmits:
 > 
 > 00:90:fb:0c:89:7d > 00:08:a1:3c:32:9c, ethertype 802.1Q (0x8100), length 1510: vlan 1000, p 0, ethertype IPv4, 196.22.138.89 > 196.22.138.92: OSPFv2, Database Description, length: 1472
 > 
 > and the em(4) device receives:
 > 
 > 00:08:a1:3c:32:9c > 00:90:fb:0c:89:7d, ethertype 802.1Q (0x8100), length 1510: vlan 1000, p 0, ethertype IPv4, 196.22.138.92 > 196.22.138.89: OSPFv2, Database Description, length: 1472
 > 
 > Let me know if you need more detailed tcpdump output than I've provided.
 > 
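
The frame lengths in the traces above differ by exactly the size of one 802.1Q tag. Here is a small sketch of the arithmetic. It assumes a 20-byte IPv4 header with no options. It also assumes that the frame length tcpdump reports excludes the 4-byte FCS. The function name frame_lengths() is hypothetical and exists only for illustration.

```python
# Header sizes in bytes; assumes tcpdump's frame "length" excludes the 4-byte FCS.
ETHER_HDR_LEN = 14   # destination MAC + source MAC + ethertype
VLAN_TAG_LEN = 4     # 802.1Q TPID (0x8100) + TCI
IPV4_HDR_LEN = 20    # assumption: IPv4 header without options


def frame_lengths(ospf_len: int) -> tuple[int, int]:
    """Return (untagged, tagged) Ethernet frame lengths for an OSPF packet length."""
    ip_len = IPV4_HDR_LEN + ospf_len        # IPv4 total length
    untagged = ETHER_HDR_LEN + ip_len       # frame without an 802.1Q header
    return untagged, untagged + VLAN_TAG_LEN


# "OSPFv2, Database Description, length: 1472" from the traces above
untagged, tagged = frame_lengths(1472)
print(untagged, tagged, tagged - untagged)  # prints: 1506 1510 4
```

The 1510-byte frame on the re(4) side and the 1506-byte frame on the em(4) side differ by exactly these 4 bytes. This fits a tag that is omitted on transmit while the IP payload arrives intact.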

I guess I've found a VLAN hardware tagging bug in re(4).
Please try the updated files below and let me know the result.
http://people.freebsd.org/~yongari/re/if_re.c
http://people.freebsd.org/~yongari/re/if_rlreg.h
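
For context, the vlanhwtag flag switches between two transmit paths. When it is disabled, the vlan(4) layer writes the 802.1Q header into the frame in software before the driver sees the frame. When it is enabled, the tag travels alongside the packet as metadata. The driver must then pass the tag to the NIC, typically via the transmit descriptor, and the hardware inserts it on the wire. If that hand-off goes wrong, the frame leaves untagged, which is what the em(4) trace shows.

The sketch below models only the software path. The helpers insert_vlan_tag() and vlan_id() are hypothetical and written for illustration; this is not re(4) or vlan(4) code. The 1492-byte payload is a dummy placeholder.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # 802.1Q Tag Protocol Identifier


def insert_vlan_tag(frame: bytes, vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MAC addresses."""
    if not 0 <= vid <= 0xFFF or not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("VID, PCP or DEI out of range")
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    tci = (pcp << 13) | (dei << 12) | vid       # Tag Control Information
    tag = struct.pack("!HH", TPID_8021Q, tci)    # network byte order
    # Bytes 0-11 are the MAC addresses; the original ethertype follows the tag.
    return frame[:12] + tag + frame[12:]


def vlan_id(frame: bytes) -> Optional[int]:
    """Return the VLAN ID of a tagged frame, or None if it is untagged."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != TPID_8021Q:
        return None
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF


# MAC addresses from the traces; the 1492-byte IPv4 datagram is a zero-filled dummy.
dst = bytes.fromhex("0090fb0c897d")
src = bytes.fromhex("0008a13c329c")
untagged = dst + src + struct.pack("!H", 0x0800) + bytes(1492)
tagged = insert_vlan_tag(untagged, vid=1000)
print(len(untagged), len(tagged), vlan_id(tagged), vlan_id(untagged))
# prints: 1506 1510 1000 None
```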

 > Ian
 > 
 > --
 > Ian Freislich
 > 

-- 
Regards,
Pyun YongHyeon


