From owner-freebsd-net@freebsd.org Wed Jan 13 20:23:04 2016 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2272BA81759; Wed, 13 Jan 2016 20:23:04 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-ig0-x231.google.com (mail-ig0-x231.google.com [IPv6:2607:f8b0:4001:c05::231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E4BD21619; Wed, 13 Jan 2016 20:23:03 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: by mail-ig0-x231.google.com with SMTP id z14so160032336igp.1; Wed, 13 Jan 2016 12:23:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=vTf4yu8hLzQst0drsSbRfIo2LRx2XhELZkmJYlvufPs=; b=ueHgo7AM8FiO53G/64yHRRv8b5m9Mj4XG5YrdO3A1xSeTY2tR3rjE0+sOnkOtLlJt1 dVkg4mVscjKN4LCWaMyyrRSpt/CVNs52LCbyIpbBTk62NpBljGZOyOehqM3jSUADMNgu CcxqzAL0x4UyjHx4MCUkhr4DtKhWyQR51waYIW2tanSGT4MdqFDhvxCJ7fqQI9II/+rD d8wlstBXclFRUoHWKGnc3MgwNNyF8i3gon12bpPM4V3p2pFZ+C+ARxSosaw/P3T0BZE2 t1j25+S5hGl1CiIXQ9okx231b5Wdnai47QOdtb2Fvcda+kxWEQam8UGueG5HH8gmXRWD j1og== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=vTf4yu8hLzQst0drsSbRfIo2LRx2XhELZkmJYlvufPs=; b=P2HRa4abUz7SAcOBJYqgBUc3WoBTUxh6HEJF+7ma+9RVYEbAdR6OWsrzhHk3y2N3SF OLlfWXcYaheX4bGR8MTBJClmNKLJ4HDb9uObLprSGN8BtjE/Pg86J97ns7RIHJ22mgSI 0G5HIWP8rKzBaqL0fDGKUwgsYH2wNUAbwuNXQmNG1OjI0fuvNDpza9hWA47VBIMz2riY z/DRLQZBzhB8HCpf/E6QuJn8WphbiAxditi0yNYjn4o201K7ivMFc0oJILkpo4hjAt6X qeFsK6Metd5OsuKhogvbiAErMYtqTk5Nb0ILBaeHzKKN/o6jmfu+6R1yj5TeghukJj3p dRww== X-Gm-Message-State: ALoCoQmVQl/sFGU5iiCR9JZTPhZqmcDBbziAMuK+0Kd7t+gQsylsu28ULluWeIyN4SFLL5QTPt+cHmU5k0dAQLaEONXqKMi0Vw== MIME-Version: 1.0 X-Received: by 10.50.122.100 with SMTP id lr4mr14271562igb.37.1452716583368; Wed, 13 Jan 2016 12:23:03 -0800 (PST) Received: by 10.36.121.202 with HTTP; Wed, 13 Jan 2016 12:23:03 -0800 (PST) In-Reply-To: <5696ABBE.4050709@gmail.com> References: <5696ABBE.4050709@gmail.com> Date: Wed, 13 Jan 2016 12:23:03 -0800 Message-ID: Subject: Re: ipfw NAT, igb and hardware checksums From: Adrian Chadd To: Karim Fodil-Lemelin Cc: freebsd-ipfw@freebsd.org, freebsd-net Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Jan 2016 20:23:04 -0000 This looks mostly sensible. hm! -a On 13 January 2016 at 11:55, Karim Fodil-Lemelin wrote: > Hi, > > I've hit a very interesting problem with ipfw-nat and local TCP traffic that > has enough TCP options to hit a special case in m_megapullup(). Here is the > story: > > I am using the following NIC: > > igb0@pci0:4:0:0: class=0x020000 card=0x00008086 chip=0x150e8086 > rev=0x01 hdr=0x00 > > And when I do ipfw nat to locally emitted packets I see packets not being > processed in the igb driver for HW checksum. Now a quick search for m_pullup > in the igb driver code will show that our igb driver expects a contiguous > ethernet + ip header in igb_tx_ctx_setup(). Now the friendly m_megapullup() > in alias.c doesn't reserve any space before the ip header for the ethernet > header after its call to m_getcl like tcp_output.c (see m->m_data += > max_linkhdr in tcp_output.c). > > So the call to M_PREPEND() in ether_output() is forced to prepend a new mbuf > for the ethernet header, leading to a non contiguous ether + ip. This in > turn leads to a failure to properly read the IP protocol in the igb driver > and apply the proper HW checksum function. Particularly this call in > igb_tcp_ctx_setup(): ip = (struct ip *)(mp->m_data + ehdrlen); > > To reproduce the issue I simply create a NAT rule for an igb interface and > initiate a TCP connection locally going out through that interface (it > should go through NAT obviously) something like: > > ipfw nat 1 config igb0 reset > ipfw add 10 nat 1 via igb0 > > Although you need to make sure you fill enough of the SYN packet to trigger > the allocation of new memory in m_megapullup. You can do this by using > enough TCP options so its filling up almost all of the 256 mbuf or make > RESERVE something like 300 bytes in alias.c. > > The fix I propose is very simple and faster for all drivers, including the > ones that do perform a check for ether + ip to be contiguous upon accessing > the IP header. If the leading space is available it doesn't allocate any > extra space (as it should for most cases) but if for some reason the mbuf > used doesn't have 100 bytes (RESERVE in megapullup) of free space it will > reserve some at the front too. If the leading space isn't necessary then it > won't cause any harm. > > > -Subproject commit cfe39807fe9b1a23c13f73aabde302046736fa1c > +Subproject commit cfe39807fe9b1a23c13f73aabde302046736fa1c-dirty > diff --git a/freebsd/sys/netinet/libalias/alias.c > b/freebsd/sys/netinet/libalias/alias.c > index 876e958..dc424a6 100644 > --- a/freebsd/sys/netinet/libalias/alias.c > +++ b/freebsd/sys/netinet/libalias/alias.c > @@ -1757,7 +1757,8 @@ m_megapullup(struct mbuf *m, int len) { > * writable and has some extra space for expansion. > * XXX: Constant 100bytes is completely empirical. */ > #define RESERVE 100 > - if (m->m_next == NULL && M_WRITABLE(m) && M_TRAILINGSPACE(m) >= RESERVE) > + if (m->m_next == NULL && M_WRITABLE(m) && > + M_TRAILINGSPACE(m) >= RESERVE && M_LEADINGSPACE(m) >= > max_linkhdr) > return (m); > > if (len <= MCLBYTES - RESERVE) { > @@ -1779,6 +1780,7 @@ m_megapullup(struct mbuf *m, int len) { > goto bad; > > m_move_pkthdr(mcl, m); > + mcl->m_data += max_linkhdr; > m_copydata(m, 0, len, mtod(mcl, caddr_t)); > mcl->m_len = mcl->m_pkthdr.len = len; > m_freem(m); > > It would be nice if some FBSD comitter could review and hopefully add this > patch to FBSD. > > Thank you, > > Karim. > _______________________________________________ > freebsd-net@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"