From owner-svn-src-stable-10@freebsd.org Fri Oct 13 02:26:41 2017 Return-Path: Delivered-To: svn-src-stable-10@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ACC87E3AB69; Fri, 13 Oct 2017 02:26:41 +0000 (UTC) (envelope-from sephe@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 679F7704DA; Fri, 13 Oct 2017 02:26:41 +0000 (UTC) (envelope-from sephe@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id v9D2QejK028441; Fri, 13 Oct 2017 02:26:40 GMT (envelope-from sephe@FreeBSD.org) Received: (from sephe@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id v9D2QeEH028436; Fri, 13 Oct 2017 02:26:40 GMT (envelope-from sephe@FreeBSD.org) Message-Id: <201710130226.v9D2QeEH028436@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: sephe set sender to sephe@FreeBSD.org using -f From: Sepherosa Ziehau Date: Fri, 13 Oct 2017 02:26:40 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org Subject: svn commit: r324574 - stable/10/sys/dev/hyperv/netvsc X-SVN-Group: stable-10 X-SVN-Commit-Author: sephe X-SVN-Commit-Paths: stable/10/sys/dev/hyperv/netvsc X-SVN-Commit-Revision: 324574 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-stable-10@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SVN commit messages for only the 10-stable src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Oct 2017 02:26:41 -0000 Author: sephe Date: Fri Oct 13 02:26:39 2017 New Revision: 324574 URL: https://svnweb.freebsd.org/changeset/base/324574 Log: MFC 324489,324516 324489 hyperv/hn: Workaround erroneous hash type observed on WS2016. Background: - UDP 4-tuple hash type is unconditionally enabled in Hyper-V on WS2016, which is _not_ affected by NDIS_OBJTYPE_RSS_PARAMS. - Non-fragment UDP/IPv4 datagrams' hash type is delivered to VM as TCP_IPV4. Currently this erroneous behavior only applies to WS2016/Windows10. Force l3/l4 protocol check, if the RXed packet's hash type is TCP_IPV4, and the Hyper-V is running on WS2016/Windows10. If the RXed packet is UDP datagram, adjust mbuf hash type to UDP_IPV4. Sponsored by: Microsoft 324516 hyperv/hn: Workaround erroneous hash type observed on WS2016 for VF. The background was described in r324489. Sponsored by: Microsoft Modified: stable/10/sys/dev/hyperv/netvsc/hn_nvs.c stable/10/sys/dev/hyperv/netvsc/hn_rndis.c stable/10/sys/dev/hyperv/netvsc/if_hn.c stable/10/sys/dev/hyperv/netvsc/if_hnvar.h stable/10/sys/dev/hyperv/netvsc/ndis.h Directory Properties: stable/10/ (props changed) Modified: stable/10/sys/dev/hyperv/netvsc/hn_nvs.c ============================================================================== --- stable/10/sys/dev/hyperv/netvsc/hn_nvs.c Fri Oct 13 02:16:35 2017 (r324573) +++ stable/10/sys/dev/hyperv/netvsc/hn_nvs.c Fri Oct 13 02:26:39 2017 (r324574) @@ -601,6 +601,11 @@ hn_nvs_attach(struct hn_softc *sc, int mtu) { int error; + if (hyperv_ver_major >= 10) { + /* UDP 4-tuple hash is enforced. */ + sc->hn_caps |= HN_CAP_UDPHASH; + } + /* * Initialize NVS. */ Modified: stable/10/sys/dev/hyperv/netvsc/hn_rndis.c ============================================================================== --- stable/10/sys/dev/hyperv/netvsc/hn_rndis.c Fri Oct 13 02:16:35 2017 (r324573) +++ stable/10/sys/dev/hyperv/netvsc/hn_rndis.c Fri Oct 13 02:26:39 2017 (r324574) @@ -522,6 +522,10 @@ hn_rndis_query_rsscaps(struct hn_softc *sc, int *rxr_c /* Commit! */ sc->hn_rss_ind_size = indsz; sc->hn_rss_hcap = hash_func | hash_types; + if (sc->hn_caps & HN_CAP_UDPHASH) { + /* UDP 4-tuple hash is unconditionally enabled. */ + sc->hn_rss_hcap |= NDIS_HASH_UDP_IPV4_X; + } *rxr_cnt0 = rxr_cnt; return (0); } @@ -761,8 +765,10 @@ hn_rndis_conf_rss(struct hn_softc *sc, uint16_t flags) ("NDIS 6.20+ is required, NDIS version 0x%08x", sc->hn_ndis_ver)); /* XXX only one can be specified through, popcnt? */ - KASSERT((sc->hn_rss_hash & NDIS_HASH_FUNCTION_MASK), ("no hash func")); - KASSERT((sc->hn_rss_hash & NDIS_HASH_TYPE_MASK), ("no hash types")); + KASSERT((sc->hn_rss_hash & NDIS_HASH_FUNCTION_MASK), + ("no hash func %08x", sc->hn_rss_hash)); + KASSERT((sc->hn_rss_hash & NDIS_HASH_STD), + ("no standard hash types %08x", sc->hn_rss_hash)); KASSERT(sc->hn_rss_ind_size > 0, ("no indirect table size")); if (bootverbose) { @@ -781,7 +787,8 @@ hn_rndis_conf_rss(struct hn_softc *sc, uint16_t flags) prm->ndis_hdr.ndis_rev = NDIS_RSS_PARAMS_REV_2; prm->ndis_hdr.ndis_size = rss_size; prm->ndis_flags = flags; - prm->ndis_hash = sc->hn_rss_hash; + prm->ndis_hash = sc->hn_rss_hash & + (NDIS_HASH_FUNCTION_MASK | NDIS_HASH_STD); prm->ndis_indsize = sizeof(rss->rss_ind[0]) * sc->hn_rss_ind_size; prm->ndis_indoffset = __offsetof(struct ndis_rssprm_toeplitz, rss_ind[0]); Modified: stable/10/sys/dev/hyperv/netvsc/if_hn.c ============================================================================== --- stable/10/sys/dev/hyperv/netvsc/if_hn.c Fri Oct 13 02:16:35 2017 (r324573) +++ stable/10/sys/dev/hyperv/netvsc/if_hn.c Fri Oct 13 02:26:39 2017 (r324574) @@ -119,6 +119,11 @@ __FBSDID("$FreeBSD$"); #define HN_IFSTART_SUPPORT +/* NOTE: M_HASHTYPE_RSS_UDP_IPV4 is not available on stable/10. */ +#ifndef M_HASHTYPE_RSS_UDP_IPV4 +#define M_HASHTYPE_RSS_UDP_IPV4 M_HASHTYPE_OPAQUE +#endif + #define HN_RING_CNT_DEF_MAX 8 #define HN_VFMAP_SIZE_DEF 8 @@ -378,6 +383,7 @@ static void hn_link_status(struct hn_softc *); static int hn_create_rx_data(struct hn_softc *, int); static void hn_destroy_rx_data(struct hn_softc *); static int hn_check_iplen(const struct mbuf *, int); +static void hn_rxpkt_proto(const struct mbuf *, int *, int *); static int hn_set_rxfilter(struct hn_softc *, uint32_t); static int hn_rxfilter_config(struct hn_softc *); static int hn_rss_reconfig(struct hn_softc *); @@ -392,6 +398,7 @@ static int hn_tx_ring_create(struct hn_softc *, int) static void hn_tx_ring_destroy(struct hn_tx_ring *); static int hn_create_tx_data(struct hn_softc *, int); static void hn_fixup_tx_data(struct hn_softc *); +static void hn_fixup_rx_data(struct hn_softc *); static void hn_destroy_tx_data(struct hn_softc *); static void hn_txdesc_dmamap_destroy(struct hn_txdesc *); static void hn_txdesc_gc(struct hn_tx_ring *, @@ -1413,6 +1420,8 @@ hn_rss_type_fromndis(uint32_t rss_hash) types |= RSS_TYPE_TCP_IPV6; if (rss_hash & NDIS_HASH_TCP_IPV6_EX) types |= RSS_TYPE_TCP_IPV6_EX; + if (rss_hash & NDIS_HASH_UDP_IPV4_X) + types |= RSS_TYPE_UDP_IPV4; return (types); } @@ -1421,9 +1430,8 @@ hn_rss_type_tondis(uint32_t types) { uint32_t rss_hash = 0; - KASSERT((types & - (RSS_TYPE_UDP_IPV4 | RSS_TYPE_UDP_IPV6 | RSS_TYPE_UDP_IPV6_EX)) == 0, - ("UDP4, UDP6 and UDP6EX are not supported")); + KASSERT((types & (RSS_TYPE_UDP_IPV6 | RSS_TYPE_UDP_IPV6_EX)) == 0, + ("UDP6 and UDP6EX are not supported")); if (types & RSS_TYPE_IPV4) rss_hash |= NDIS_HASH_IPV4; @@ -1437,6 +1445,8 @@ hn_rss_type_tondis(uint32_t types) rss_hash |= NDIS_HASH_TCP_IPV6; if (types & RSS_TYPE_TCP_IPV6_EX) rss_hash |= NDIS_HASH_TCP_IPV6_EX; + if (types & RSS_TYPE_UDP_IPV4) + rss_hash |= NDIS_HASH_UDP_IPV4_X; return (rss_hash); } @@ -1535,6 +1545,13 @@ hn_vf_rss_fixup(struct hn_softc *sc, bool reconf) * NOTE: * We don't disable the hash type, but stop delivery the hash * value/type through mbufs on RX path. + * + * XXX If HN_CAP_UDPHASH is set in hn_caps, then UDP 4-tuple + * hash is delivered with type of TCP_IPV4. This means if + * UDP_IPV4 is enabled, then TCP_IPV4 should be forced, at + * least to hn_mbuf_hash. However, given that _all_ of the + * NICs implement TCP_IPV4, this will _not_ impose any issues + * here. */ if ((my_types & RSS_TYPE_IPV4) && (diff_types & ifrh.ifrh_types & @@ -2225,9 +2242,10 @@ hn_attach(device_t dev) #endif /* - * Fixup TX stuffs after synthetic parts are attached. + * Fixup TX/RX stuffs after synthetic parts are attached. */ hn_fixup_tx_data(sc); + hn_fixup_rx_data(sc); ctx = device_get_sysctl_ctx(dev); child = SYSCTL_CHILDREN(device_get_sysctl_tree(dev)); @@ -3371,6 +3389,7 @@ hn_rxpkt(struct hn_rx_ring *rxr, const void *data, int struct mbuf *m_new; int size, do_lro = 0, do_csum = 1, is_vf = 0; int hash_type = M_HASHTYPE_NONE; + int l3proto = ETHERTYPE_MAX, l4proto = IPPROTO_DONE; ifp = hn_ifp; if (rxr->hn_rxvf_ifp != NULL) { @@ -3470,31 +3489,9 @@ hn_rxpkt(struct hn_rx_ring *rxr, const void *data, int (NDIS_RXCSUM_INFO_TCPCS_OK | NDIS_RXCSUM_INFO_IPCS_OK)) do_lro = 1; } else { - const struct ether_header *eh; - uint16_t etype; - int hoff; - - hoff = sizeof(*eh); - /* Checked at the beginning of this function. */ - KASSERT(m_new->m_len >= hoff, ("not ethernet frame")); - - eh = mtod(m_new, struct ether_header *); - etype = ntohs(eh->ether_type); - if (etype == ETHERTYPE_VLAN) { - const struct ether_vlan_header *evl; - - hoff = sizeof(*evl); - if (m_new->m_len < hoff) - goto skip; - evl = mtod(m_new, struct ether_vlan_header *); - etype = ntohs(evl->evl_proto); - } - - if (etype == ETHERTYPE_IP) { - int pr; - - pr = hn_check_iplen(m_new, hoff); - if (pr == IPPROTO_TCP) { + hn_rxpkt_proto(m_new, &l3proto, &l4proto); + if (l3proto == ETHERTYPE_IP) { + if (l4proto == IPPROTO_TCP) { if (do_csum && (rxr->hn_trust_hcsum & HN_TRUST_HCSUM_TCP)) { @@ -3505,7 +3502,7 @@ hn_rxpkt(struct hn_rx_ring *rxr, const void *data, int m_new->m_pkthdr.csum_data = 0xffff; } do_lro = 1; - } else if (pr == IPPROTO_UDP) { + } else if (l4proto == IPPROTO_UDP) { if (do_csum && (rxr->hn_trust_hcsum & HN_TRUST_HCSUM_UDP)) { @@ -3515,7 +3512,7 @@ hn_rxpkt(struct hn_rx_ring *rxr, const void *data, int CSUM_DATA_VALID | CSUM_PSEUDO_HDR); m_new->m_pkthdr.csum_data = 0xffff; } - } else if (pr != IPPROTO_DONE && do_csum && + } else if (l4proto != IPPROTO_DONE && do_csum && (rxr->hn_trust_hcsum & HN_TRUST_HCSUM_IP)) { rxr->hn_csum_trusted++; m_new->m_pkthdr.csum_flags |= @@ -3523,7 +3520,7 @@ hn_rxpkt(struct hn_rx_ring *rxr, const void *data, int } } } -skip: + if (info->vlan_info != HN_NDIS_VLAN_INFO_INVALID) { m_new->m_pkthdr.ether_vtag = EVL_MAKETAG( NDIS_VLAN_INFO_ID(info->vlan_info), @@ -3578,6 +3575,37 @@ skip: case NDIS_HASH_TCP_IPV4: hash_type = M_HASHTYPE_RSS_TCP_IPV4; + if (rxr->hn_rx_flags & HN_RX_FLAG_UDP_HASH) { + int def_htype = M_HASHTYPE_OPAQUE; + + if (is_vf) + def_htype = M_HASHTYPE_NONE; + + /* + * UDP 4-tuple hash is delivered as + * TCP 4-tuple hash. + */ + if (l3proto == ETHERTYPE_MAX) { + hn_rxpkt_proto(m_new, + &l3proto, &l4proto); + } + if (l3proto == ETHERTYPE_IP) { + if (l4proto == IPPROTO_UDP && + (rxr->hn_mbuf_hash & + NDIS_HASH_UDP_IPV4_X)) { + hash_type = + M_HASHTYPE_RSS_UDP_IPV4; + do_lro = 0; + } else if (l4proto != + IPPROTO_TCP) { + hash_type = def_htype; + do_lro = 0; + } + } else { + hash_type = def_htype; + do_lro = 0; + } + } break; case NDIS_HASH_IPV6: @@ -4823,6 +4851,36 @@ hn_check_iplen(const struct mbuf *m, int hoff) return ip->ip_p; } +static void +hn_rxpkt_proto(const struct mbuf *m_new, int *l3proto, int *l4proto) +{ + const struct ether_header *eh; + uint16_t etype; + int hoff; + + hoff = sizeof(*eh); + /* Checked at the beginning of this function. */ + KASSERT(m_new->m_len >= hoff, ("not ethernet frame")); + + eh = mtod(m_new, const struct ether_header *); + etype = ntohs(eh->ether_type); + if (etype == ETHERTYPE_VLAN) { + const struct ether_vlan_header *evl; + + hoff = sizeof(*evl); + if (m_new->m_len < hoff) + return; + evl = mtod(m_new, const struct ether_vlan_header *); + etype = ntohs(evl->evl_proto); + } + *l3proto = etype; + + if (etype == ETHERTYPE_IP) + *l4proto = hn_check_iplen(m_new, hoff); + else + *l4proto = IPPROTO_DONE; +} + static int hn_create_rx_data(struct hn_softc *sc, int ring_cnt) { @@ -5531,6 +5589,18 @@ hn_fixup_tx_data(struct hn_softc *sc) if_printf(sc->hn_ifp, "support HASHVAL pktinfo\n"); for (i = 0; i < sc->hn_tx_ring_cnt; ++i) sc->hn_tx_ring[i].hn_tx_flags |= HN_TX_FLAG_HASHVAL; + } +} + +static void +hn_fixup_rx_data(struct hn_softc *sc) +{ + + if (sc->hn_caps & HN_CAP_UDPHASH) { + int i; + + for (i = 0; i < sc->hn_rx_ring_cnt; ++i) + sc->hn_rx_ring[i].hn_rx_flags |= HN_RX_FLAG_UDP_HASH; } } Modified: stable/10/sys/dev/hyperv/netvsc/if_hnvar.h ============================================================================== --- stable/10/sys/dev/hyperv/netvsc/if_hnvar.h Fri Oct 13 02:16:35 2017 (r324573) +++ stable/10/sys/dev/hyperv/netvsc/if_hnvar.h Fri Oct 13 02:26:39 2017 (r324574) @@ -97,6 +97,7 @@ struct hn_rx_ring { #define HN_RX_FLAG_ATTACHED 0x0001 #define HN_RX_FLAG_BR_REF 0x0002 #define HN_RX_FLAG_XPNT_VF 0x0004 +#define HN_RX_FLAG_UDP_HASH 0x0008 struct hn_tx_ring { #ifndef HN_USE_TXDESC_BUFRING @@ -305,11 +306,12 @@ do { \ #define HN_CAP_TSO4 0x0080 #define HN_CAP_TSO6 0x0100 #define HN_CAP_HASHVAL 0x0200 +#define HN_CAP_UDPHASH 0x0400 /* Capability description for use with printf(9) %b identifier. */ #define HN_CAP_BITS \ "\020\1VLAN\2MTU\3IPCS\4TCP4CS\5TCP6CS" \ - "\6UDP4CS\7UDP6CS\10TSO4\11TSO6\12HASHVAL" + "\6UDP4CS\7UDP6CS\10TSO4\11TSO6\12HASHVAL\13UDPHASH" #define HN_LINK_FLAG_LINKUP 0x0001 #define HN_LINK_FLAG_NETCHG 0x0002 Modified: stable/10/sys/dev/hyperv/netvsc/ndis.h ============================================================================== --- stable/10/sys/dev/hyperv/netvsc/ndis.h Fri Oct 13 02:16:35 2017 (r324573) +++ stable/10/sys/dev/hyperv/netvsc/ndis.h Fri Oct 13 02:26:39 2017 (r324574) @@ -56,17 +56,26 @@ #define NDIS_HASH_IPV6_EX 0x00000800 #define NDIS_HASH_TCP_IPV6 0x00001000 #define NDIS_HASH_TCP_IPV6_EX 0x00002000 +#define NDIS_HASH_UDP_IPV4_X 0x00004000 /* XXX non-standard */ #define NDIS_HASH_ALL (NDIS_HASH_IPV4 | \ NDIS_HASH_TCP_IPV4 | \ NDIS_HASH_IPV6 | \ NDIS_HASH_IPV6_EX | \ NDIS_HASH_TCP_IPV6 | \ + NDIS_HASH_TCP_IPV6_EX |\ + NDIS_HASH_UDP_IPV4_X) + +#define NDIS_HASH_STD (NDIS_HASH_IPV4 | \ + NDIS_HASH_TCP_IPV4 | \ + NDIS_HASH_IPV6 | \ + NDIS_HASH_IPV6_EX | \ + NDIS_HASH_TCP_IPV6 | \ NDIS_HASH_TCP_IPV6_EX) /* Hash description for use with printf(9) %b identifier. */ #define NDIS_HASH_BITS \ - "\20\1TOEPLITZ\11IP4\12TCP4\13IP6\14IP6EX\15TCP6\16TCP6EX" + "\20\1TOEPLITZ\11IP4\12TCP4\13IP6\14IP6EX\15TCP6\16TCP6EX\17UDP4_X" #define NDIS_HASH_KEYSIZE_TOEPLITZ 40 #define NDIS_HASH_INDCNT 128