Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 26 Oct 2012 17:53:54 +0400
From:      Gleb Smirnoff <glebius@FreeBSD.org>
To:        Sebastian Kuzminsky <seb@lineratesystems.com>
Cc:        freebsd-net@FreeBSD.org
Subject:   Re: fragmentation problem in FreeBSD 7
Message-ID:  <20121026135354.GD70741@FreeBSD.org>
In-Reply-To: <CAN=597Rb-ToBQuJ%2BYet9e25Hbt-QmLJPKUXGf1pFEbVsRvFONg@mail.gmail.com>
References:  <CAN=597Rb-ToBQuJ%2BYet9e25Hbt-QmLJPKUXGf1pFEbVsRvFONg@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--jCrbxBqMcLqd4mOl
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline

  Sebastian,

On Tue, Oct 23, 2012 at 12:25:59PM -0600, Sebastian Kuzminsky wrote:
S> The root of the problem is two-fold:
S> 
S> 1.  ip_output.c:ip_fragment() does not clear the CSUM_IP flag in the mbuf
S> when it does software IP checksum computation, so the mbuf still looks like
S> it needs IP checksumming.
S> 
S> 2. The em driver does not advertise IP checksum offloading, but still
S> checks the CSUM_IP flag in the mbuf and modifies the packet when that flag
S> is set (this is in em_transmit_checksum_setup(), called by em_xmit()).
S>  Unfortunately the em driver gets the checksum wrong in this case, i guess
S> that's why it doesn't advertise this capability in its if_hwassist!
S> 
S> So the fragments that ip_fastfwd.c:ip_fastforward() gets from
S> ip_output.c:ip_fragment() have ip->ip_sum set correctly, but the
S> mbuf->m_pkthdr.csum_flags incorrectly has CSUM_IP still set, and this
S> causes the em driver to emit incorrect packets.
S> 
S> There are some other callers of ip_fragment(), notably ip_output().
S>  ip_output() clears CSUM_IP in the mbuf csum_flags itself if it's not in
S> if_hwassist, so avoids this problem.
S> 
S> So, the fix is simple: clear the mbuf's CSUM_IP when computing ip->ip_sum
S> in ip_fragment().  The first attached patch (against
S> gitorious/svn_stable_7) does this.
S> 
S> In looking at this issue, I noticed that ip_output()'s use of sw_csum is
S> inconsistent.  ip_output() splits the mbuf's csum_flags into two parts: the
S> stuff that hardware will assist with (these flags get left in the mbuf) and
S> the stuff that software needs to do (these get moved to sw_csum).  But
S> later ip_output() calls functions that don't get sw_csum, or that don't
S> know to look in it and look in the mbuf instead.  My second patch fixes
S> these kinds of issues and (IMO) simplifies the code by leaving all the
S> packet's checksumming needs in the mbuf, getting rid of sw_csum entirely.

Thanks for submission!

I'm about to commit the attached patch to head. Can you please review it?
Haven't I missed anything important?

I have tested it in a 3-box setup similar to yours, except the middle box
doesn't have em(4) NIC, it has igb(4). I've tested it with rxcsum/txcsum on
and off on both NICs.

I have also moved from CSUM_DELAY_IP to CSUM_IP. AFAIU, the alias CSUM_DELAY_IP
was made to match CSUM_DELAY_DATA. But to my point of view it makes it more
difficult to understand code, because a person reading code sees different
constants in the stack and in drivers. Since your change touches every line
in the stack, that utilizes CSUM_DELAY_IP, I decided to consistently use
CSUM_IP constant.

-- 
Totus tuus, Glebius.

--jCrbxBqMcLqd4mOl
Content-Type: text/x-diff; charset=koi8-r
Content-Disposition: attachment; filename="csum.diff"

Index: netinet/ip_output.c
===================================================================
--- netinet/ip_output.c	(revision 242127)
+++ netinet/ip_output.c	(working copy)
@@ -125,7 +125,7 @@
 	struct sockaddr_in *dst;
 	struct in_ifaddr *ia;
 	int isbroadcast;
-	uint16_t ip_len, ip_off, sw_csum;
+	uint16_t ip_len, ip_off;
 	struct route iproute;
 	struct rtentry *rte;	/* cache for ro->ro_rt */
 	struct in_addr odst;
@@ -583,18 +583,16 @@
 	}
 
 	m->m_pkthdr.csum_flags |= CSUM_IP;
-	sw_csum = m->m_pkthdr.csum_flags & ~ifp->if_hwassist;
-	if (sw_csum & CSUM_DELAY_DATA) {
+	if (m->m_pkthdr.csum_flags & CSUM_DELAY_DATA & ~ifp->if_hwassist) {
 		in_delayed_cksum(m);
-		sw_csum &= ~CSUM_DELAY_DATA;
+		m->m_pkthdr.csum_flags &= ~CSUM_DELAY_DATA;
 	}
 #ifdef SCTP
-	if (sw_csum & CSUM_SCTP) {
+	if (m->m_pkthdr.csum_flags & CSUM_SCTP & ~ifp->if_hwassist) {
 		sctp_delayed_cksum(m, (uint32_t)(ip->ip_hl << 2));
-		sw_csum &= ~CSUM_SCTP;
+		m->m_pkthdr.csum_flags &= ~CSUM_SCTP;
 	}
 #endif
-	m->m_pkthdr.csum_flags &= ifp->if_hwassist;
 
 	/*
 	 * If small enough for interface, or the interface will take
@@ -604,8 +602,10 @@
 	    (m->m_pkthdr.csum_flags & ifp->if_hwassist & CSUM_TSO) != 0 ||
 	    ((ip_off & IP_DF) == 0 && (ifp->if_hwassist & CSUM_FRAGMENT))) {
 		ip->ip_sum = 0;
-		if (sw_csum & CSUM_DELAY_IP)
+		if (m->m_pkthdr.csum_flags & CSUM_IP & ~ifp->if_hwassist) {
 			ip->ip_sum = in_cksum(m, hlen);
+			m->m_pkthdr.csum_flags &= ~CSUM_IP;
+		}
 
 		/*
 		 * Record statistics for this interface address.
@@ -646,7 +646,7 @@
 	 * Too large for interface; fragment if possible. If successful,
 	 * on return, m will point to a list of packets to be sent.
 	 */
-	error = ip_fragment(ip, &m, mtu, ifp->if_hwassist, sw_csum);
+	error = ip_fragment(ip, &m, mtu, ifp->if_hwassist);
 	if (error)
 		goto bad;
 	for (; m; m = m0) {
@@ -691,11 +691,10 @@
  * chain of fragments that should be freed by the caller.
  *
  * if_hwassist_flags is the hw offload capabilities (see if_data.ifi_hwassist)
- * sw_csum contains the delayed checksums flags (e.g., CSUM_DELAY_IP).
  */
 int
 ip_fragment(struct ip *ip, struct mbuf **m_frag, int mtu,
-    u_long if_hwassist_flags, int sw_csum)
+    u_long if_hwassist_flags)
 {
 	int error = 0;
 	int hlen = ip->ip_hl << 2;
@@ -833,8 +832,10 @@
 		m->m_pkthdr.csum_flags = m0->m_pkthdr.csum_flags;
 		mhip->ip_off = htons(mhip->ip_off);
 		mhip->ip_sum = 0;
-		if (sw_csum & CSUM_DELAY_IP)
+		if (m->m_pkthdr.csum_flags & CSUM_IP & ~if_hwassist_flags) {
 			mhip->ip_sum = in_cksum(m, mhlen);
+			m->m_pkthdr.csum_flags &= ~CSUM_IP;
+		}
 		*mnext = m;
 		mnext = &m->m_nextpkt;
 	}
@@ -853,8 +854,10 @@
 	ip->ip_len = htons((u_short)m0->m_pkthdr.len);
 	ip->ip_off = htons(ip_off | IP_MF);
 	ip->ip_sum = 0;
-	if (sw_csum & CSUM_DELAY_IP)
+	if (m0->m_pkthdr.csum_flags & CSUM_IP & ~if_hwassist_flags) {
 		ip->ip_sum = in_cksum(m0, hlen);
+		m0->m_pkthdr.csum_flags &= ~CSUM_IP;
+	}
 
 done:
 	*m_frag = m0;
Index: netinet/ip_mroute.c
===================================================================
--- netinet/ip_mroute.c	(revision 242127)
+++ netinet/ip_mroute.c	(working copy)
@@ -2404,7 +2404,8 @@
 	ip->ip_sum = in_cksum(mb_copy, ip->ip_hl << 2);
     } else {
 	/* Fragment the packet */
-	if (ip_fragment(ip, &mb_copy, mtu, 0, CSUM_DELAY_IP) != 0) {
+	mb_copy->m_pkthdr.csum_flags |= CSUM_IP;
+	if (ip_fragment(ip, &mb_copy, mtu, 0) != 0) {
 	    m_freem(mb_copy);
 	    return NULL;
 	}
Index: netinet/ip_var.h
===================================================================
--- netinet/ip_var.h	(revision 242127)
+++ netinet/ip_var.h	(working copy)
@@ -210,7 +210,7 @@
 int	ip_ctloutput(struct socket *, struct sockopt *sopt);
 void	ip_drain(void);
 int	ip_fragment(struct ip *ip, struct mbuf **m_frag, int mtu,
-	    u_long if_hwassist_flags, int sw_csum);
+	    u_long if_hwassist_flags);
 void	ip_forward(struct mbuf *m, int srcrt);
 void	ip_init(void);
 #ifdef VIMAGE
Index: netinet/ip_fastfwd.c
===================================================================
--- netinet/ip_fastfwd.c	(revision 242127)
+++ netinet/ip_fastfwd.c	(working copy)
@@ -542,10 +542,8 @@
 			 * We have to fragment the packet
 			 */
 			m->m_pkthdr.csum_flags |= CSUM_IP;
-			if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
-					(~ifp->if_hwassist & CSUM_DELAY_IP))) {
+			if (ip_fragment(ip, &m, mtu, ifp->if_hwassist))
 				goto drop;
-			}
 			KASSERT(m != NULL, ("null mbuf and no error"));
 			/*
 			 * Send off the fragments via outgoing interface
Index: netpfil/pf/pf.c
===================================================================
--- netpfil/pf/pf.c	(revision 242127)
+++ netpfil/pf/pf.c	(working copy)
@@ -5140,7 +5140,7 @@
 	struct pf_addr		 naddr;
 	struct pf_src_node	*sn = NULL;
 	int			 error = 0;
-	uint16_t		 ip_len, ip_off, sw_csum;
+	uint16_t		 ip_len, ip_off;
 
 	KASSERT(m && *m && r && oifp, ("%s: invalid parameters", __func__));
 	KASSERT(dir == PF_IN || dir == PF_OUT, ("%s: invalid direction",
@@ -5240,18 +5240,16 @@
 
 	/* Copied from FreeBSD 10.0-CURRENT ip_output. */
 	m0->m_pkthdr.csum_flags |= CSUM_IP;
-	sw_csum = m0->m_pkthdr.csum_flags & ~ifp->if_hwassist;
-	if (sw_csum & CSUM_DELAY_DATA) {
+	if (m0->m_pkthdr.csum_flags & CSUM_DELAY_DATA & ~ifp->if_hwassist) {
 		in_delayed_cksum(m0);
-		sw_csum &= ~CSUM_DELAY_DATA;
+		m0->m_pkthdr.csum_flags &= ~CSUM_DELAY_DATA;
 	}
 #ifdef SCTP
-	if (sw_csum & CSUM_SCTP) {
+	if (m0->m_pkthdr.csum_flags & CSUM_SCTP & ~ifp->if_hwassist) {
 		sctp_delayed_cksum(m, (uint32_t)(ip->ip_hl << 2));
-		sw_csum &= ~CSUM_SCTP;
+		m0->m_pkthdr.csum_flags &= ~CSUM_SCTP;
 	}
 #endif
-	m0->m_pkthdr.csum_flags &= ifp->if_hwassist;
 
 	/*
 	 * If small enough for interface, or the interface will take
@@ -5261,8 +5259,10 @@
 	    (m0->m_pkthdr.csum_flags & ifp->if_hwassist & CSUM_TSO) != 0 ||
 	    ((ip_off & IP_DF) == 0 && (ifp->if_hwassist & CSUM_FRAGMENT))) {
 		ip->ip_sum = 0;
-		if (sw_csum & CSUM_DELAY_IP)
+		if (m0->m_pkthdr.csum_flags & CSUM_IP & ~ifp->if_hwassist) {
 			ip->ip_sum = in_cksum(m0, ip->ip_hl << 2);
+			m0->m_pkthdr.csum_flags &= ~CSUM_IP;
+		}
 		m0->m_flags &= ~(M_PROTOFLAGS);
 		error = (*ifp->if_output)(ifp, m0, sintosa(&dst), NULL);
 		goto done;
@@ -5280,7 +5280,7 @@
 			goto bad;
 	}
 
-	error = ip_fragment(ip, &m0, ifp->if_mtu, ifp->if_hwassist, sw_csum);
+	error = ip_fragment(ip, &m0, ifp->if_mtu, ifp->if_hwassist);
 	if (error)
 		goto bad;
 
Index: net/if_bridge.c
===================================================================
--- net/if_bridge.c	(revision 242127)
+++ net/if_bridge.c	(working copy)
@@ -3379,8 +3379,8 @@
 		goto out;
 	ip = mtod(m, struct ip *);
 
-	error = ip_fragment(ip, &m, ifp->if_mtu, ifp->if_hwassist,
-		    CSUM_DELAY_IP);
+	m->m_pkthdr.csum_flags |= CSUM_IP;
+	error = ip_fragment(ip, &m, ifp->if_mtu, ifp->if_hwassist);
 	if (error)
 		goto out;
 

--jCrbxBqMcLqd4mOl--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20121026135354.GD70741>