From owner-freebsd-net@freebsd.org Wed Jan 13 19:55:48 2016 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3C7E9A81C23; Wed, 13 Jan 2016 19:55:48 +0000 (UTC) (envelope-from fodillemlinkarim@gmail.com) Received: from mail-qg0-x235.google.com (mail-qg0-x235.google.com [IPv6:2607:f8b0:400d:c04::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id EEFD71822; Wed, 13 Jan 2016 19:55:47 +0000 (UTC) (envelope-from fodillemlinkarim@gmail.com) Received: by mail-qg0-x235.google.com with SMTP id b35so333313488qge.0; Wed, 13 Jan 2016 11:55:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:subject:to:message-id:date:user-agent:mime-version :content-type:content-transfer-encoding; bh=qPvXWG8esyUaJCoz31MnAjocoz76NoB0VSRaEsTG01s=; b=nZqkraSXxFoyxXuXs1z3roVSRqyLVQAE+LN3e7k6xXaw1CuDYeW2UrMo5/CfY8ERvy K5jB1C7uzLFSYey7WsAI4QtsI9c8m5++5pnwufNtopHr7CtjYUIZgMx1XNp9A0lrUv6b KRbJg1v59PQuxe2WV4PvhLzD9/pgetEgAlh2Q59dVmXo2tsQE/S84/MI73X2RH7z6lTH s7DPue4VsiyMo5Z8iBzuVlpGPOutOCMAO+HNiZO5J97ETwx0GnJPdCgWang2Nukepng1 8plirT3wg5EpPFF4NnpKliu9F49KrLGbTXE9dkw1jwEj2mCENbmpfKVZ7R34iV80hYxy tONQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:subject:to:message-id:date:user-agent :mime-version:content-type:content-transfer-encoding; bh=qPvXWG8esyUaJCoz31MnAjocoz76NoB0VSRaEsTG01s=; b=BBmWSEz7WVSlDcGE4aVQBr+zoI2U5F0oJW60ogI1yYWKUcKEVS1sItz8myFpS2MU8i +C6vBh3DP9CLbLLXwTIMnw3cZu/ATfQDcaEy3vzyQHDFtZVvn6zQ09baqh7YM+ethcNS p/6k913jXWv54jrcdHLCw0KQSPOFTRFc5fcDQo0/eXbAlsuov2GKwRA0msNffN7ECOTN BeM/mXJtq8Fu7lCYWBr0veXfoufMMGGeC10rlEpZp7B+BmrXnjkdnZnY9of1Xqz9vgHO oAQplnxERQJKYv03WqNhpS7vxj9qgxFuw/IVtSej76F9Ocyga0Bftmhoik0hZiTisv7W KnWw== X-Gm-Message-State: ALoCoQmOawuewJVKDxhXAaKcQ4zS3l4iSVQwOudqnqZatKPsQPWIL3sQoJpZhx2NsjySqEK1oizalDl0rAX9ut1uW66eCzv68A== X-Received: by 10.140.172.84 with SMTP id s81mr88848qhs.40.1452714947101; Wed, 13 Jan 2016 11:55:47 -0800 (PST) Received: from [10.10.1.47] ([192.252.130.194]) by smtp.googlemail.com with ESMTPSA id v187sm1092968qhb.27.2016.01.13.11.55.46 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 13 Jan 2016 11:55:46 -0800 (PST) From: Karim Fodil-Lemelin Subject: ipfw NAT, igb and hardware checksums To: freebsd-ipfw@freebsd.org, freebsd-net Message-ID: <5696ABBE.4050709@gmail.com> Date: Wed, 13 Jan 2016 14:55:42 -0500 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Jan 2016 19:55:48 -0000 Hi, I've hit a very interesting problem with ipfw-nat and local TCP traffic that has enough TCP options to hit a special case in m_megapullup(). Here is the story: I am using the following NIC: igb0@pci0:4:0:0: class=0x020000 card=0x00008086 chip=0x150e8086 rev=0x01 hdr=0x00 And when I do ipfw nat to locally emitted packets I see packets not being processed in the igb driver for HW checksum. Now a quick search for m_pullup in the igb driver code will show that our igb driver expects a contiguous ethernet + ip header in igb_tx_ctx_setup(). Now the friendly m_megapullup() in alias.c doesn't reserve any space before the ip header for the ethernet header after its call to m_getcl like tcp_output.c (see m->m_data += max_linkhdr in tcp_output.c). So the call to M_PREPEND() in ether_output() is forced to prepend a new mbuf for the ethernet header, leading to a non contiguous ether + ip. This in turn leads to a failure to properly read the IP protocol in the igb driver and apply the proper HW checksum function. Particularly this call in igb_tcp_ctx_setup(): ip = (struct ip *)(mp->m_data + ehdrlen); To reproduce the issue I simply create a NAT rule for an igb interface and initiate a TCP connection locally going out through that interface (it should go through NAT obviously) something like: ipfw nat 1 config igb0 reset ipfw add 10 nat 1 via igb0 Although you need to make sure you fill enough of the SYN packet to trigger the allocation of new memory in m_megapullup. You can do this by using enough TCP options so its filling up almost all of the 256 mbuf or make RESERVE something like 300 bytes in alias.c. The fix I propose is very simple and faster for all drivers, including the ones that do perform a check for ether + ip to be contiguous upon accessing the IP header. If the leading space is available it doesn't allocate any extra space (as it should for most cases) but if for some reason the mbuf used doesn't have 100 bytes (RESERVE in megapullup) of free space it will reserve some at the front too. If the leading space isn't necessary then it won't cause any harm. -Subproject commit cfe39807fe9b1a23c13f73aabde302046736fa1c +Subproject commit cfe39807fe9b1a23c13f73aabde302046736fa1c-dirty diff --git a/freebsd/sys/netinet/libalias/alias.c b/freebsd/sys/netinet/libalias/alias.c index 876e958..dc424a6 100644 --- a/freebsd/sys/netinet/libalias/alias.c +++ b/freebsd/sys/netinet/libalias/alias.c @@ -1757,7 +1757,8 @@ m_megapullup(struct mbuf *m, int len) { * writable and has some extra space for expansion. * XXX: Constant 100bytes is completely empirical. */ #define RESERVE 100 - if (m->m_next == NULL && M_WRITABLE(m) && M_TRAILINGSPACE(m) >= RESERVE) + if (m->m_next == NULL && M_WRITABLE(m) && + M_TRAILINGSPACE(m) >= RESERVE && M_LEADINGSPACE(m) >= max_linkhdr) return (m); if (len <= MCLBYTES - RESERVE) { @@ -1779,6 +1780,7 @@ m_megapullup(struct mbuf *m, int len) { goto bad; m_move_pkthdr(mcl, m); + mcl->m_data += max_linkhdr; m_copydata(m, 0, len, mtod(mcl, caddr_t)); mcl->m_len = mcl->m_pkthdr.len = len; m_freem(m); It would be nice if some FBSD comitter could review and hopefully add this patch to FBSD. Thank you, Karim.