From: "T.C. Gubatayao"
To: Alan Somers
Cc: Jack F Vogel, "Justin T. Gibbs", Andre Oppermann, "net@freebsd.org"
Date: Thu, 29 Aug 2013 14:51:41 -0700
Subject: Re: Flow ID, LACP, and igb
Message-ID: <49170157-EFC7-44A3-B881-12B4F2644F59@barracuda.com>
References: <521BBD21.4070304@freebsd.org> <521EE8DA.3060107@freebsd.org> <0771FC4F-BCDD-4985-A33F-09951806AD99@barracuda.com>
List-Id: Networking and TCP/IP with FreeBSD

On Aug 29, 2013, at 5:40 PM, T.C. Gubatayao wrote:

> On Aug 29, 2013, at 4:21 PM, Alan Somers wrote:
>
>> They're faster, but even with this change, jenkins_hash is still 6 times
>> slower than FNV hash.
>
> Actually, I think your test isn't accurately simulating memory access, which
> might be skewing the results.
>
> For example, from net/if_lagg.c:
>
> 	p = hash32_buf(&eh->ether_shost, ETHER_ADDR_LEN, p);
> 	p = hash32_buf(&eh->ether_dhost, ETHER_ADDR_LEN, p);
>
> These two calls can't both be aligned, since ETHER_ADDR_LEN is 6 octets. The
> same is true for the other hashed fields in the IP and TCP/UDP headers.
> Assuming the mbuf data pointer is aligned, the IP addresses and ports are both
> on 2-byte alignments (without VLAN or IP options). In your test, they're all
> aligned and in the same cache line.
>
> When I modify the test to simulate an mbuf, lookup3 beats FNV and hash32, and
> SipHash is only 2-3 times slower.
>
>> Also, your technique of copying the hashable fields into a separate buffer
>> would need modification to work with different types of packet and different
>> LAGG_F_HASH[234] flags. Because different packets have different hashable
>> fields, struct key would need to be expanded to include the vlan tag, IPV6
>> addresses, and IPv6 flowid. lagg_hashmbuf would then have to zero the unused
>> fields.
>
> Agreed, but this is relatively simple with a buffer on the stack, and does not
> require zeroes or padding. See my modified test, attached.
>
> T.C.

Attachment was stripped.

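The alignment argument in the quoted message (the two MACs at offsets 0 and 6, and the IPv4 addresses and ports sitting 2 bytes past a 4-byte boundary) can be checked with offsetof(). This is a minimal sketch, assuming an untagged Ethernet frame carrying IPv4 without options, a frame start that is 4-byte aligned, and GCC/Clang's __attribute__((packed)). The struct names and the helpers misalign4() and report_offsets() are hypothetical stand-ins written for this illustration, not the FreeBSD struct ether_header, struct ip or struct tcphdr definitions. It only shows where the fields land; it says nothing about cache behavior or timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-ins for struct ether_header, struct ip (no options)
 * and the TCP/UDP port pair. Packed so the layout matches the wire format.
 */
struct eth_hdr {
	uint8_t  dhost[6];
	uint8_t  shost[6];
	uint16_t type;
} __attribute__((packed));

struct ipv4_hdr {
	uint8_t  vhl;
	uint8_t  tos;
	uint16_t len;
	uint16_t id;
	uint16_t off;
	uint8_t  ttl;
	uint8_t  proto;
	uint16_t sum;
	uint32_t src;
	uint32_t dst;
} __attribute__((packed));

struct l4_ports {
	uint16_t sport;
	uint16_t dport;
} __attribute__((packed));

/* Headers laid out back to back, as they appear in the mbuf data. */
struct frame {
	struct eth_hdr  eh;
	struct ipv4_hdr ip;
	struct l4_ports th;
} __attribute__((packed));

/* Header sizes the argument relies on. */
static_assert(sizeof(struct eth_hdr) == 14, "Ethernet header is 14 bytes");
static_assert(sizeof(struct ipv4_hdr) == 20, "IPv4 header without options is 20 bytes");

/* Bytes past the previous 4-byte boundary, given a 4-byte-aligned frame start. */
size_t misalign4(size_t offset)
{
	return offset % 4;
}

/* Print the offset of each field that lagg hashes. */
void report_offsets(void)
{
	const struct { const char *name; size_t off; } f[] = {
		{ "ether_dhost", offsetof(struct frame, eh.dhost) },
		{ "ether_shost", offsetof(struct frame, eh.shost) },
		{ "ip_src",      offsetof(struct frame, ip.src) },
		{ "ip_dst",      offsetof(struct frame, ip.dst) },
		{ "th_sport",    offsetof(struct frame, th.sport) },
	};

	for (size_t i = 0; i < sizeof(f) / sizeof(f[0]); i++)
		printf("%-12s offset %2zu (mod 4 = %zu)\n",
		    f[i].name, f[i].off, misalign4(f[i].off));
}
```

Calling report_offsets() lists ether_shost at offset 6, ip_src at 26, ip_dst at 30 and th_sport at 34. Each of these is 2 bytes past a 4-byte boundary; only ether_dhost, at offset 0, is aligned.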
--- a/lagg_hash.c	2013-08-29 14:21:17.255307349 -0400
+++ b/lagg_hash.c	2013-08-29 17:26:14.055404918 -0400
@@ -7,35 +7,63 @@
 #include
 #include
 #include
-
-uint32_t jenkins_hash32(const uint32_t *, size_t, uint32_t);
+#include
+#include
+#include
+#include
 
 #define ITERATIONS 100000000
 
-typedef uint32_t do_hash_t(void);
+typedef uint32_t do_hash_t(uint32_t);
+
+/*
+ * Simulate mbuf data for a packet.
+ * No VLAN tagging and no IP options.
+ */
+struct _mbuf {
+	struct ether_header eh;
+	struct ip ip;
+	struct tcphdr th;
+} __attribute__((packed)) m = {
+	{
+		.ether_dhost = { 181, 16, 73, 9, 219, 22 },
+		.ether_shost = { 69, 170, 210, 11, 24, 120 },
+		.ether_type = 0x008
+	},
+	{
+		.ip_src.s_addr = 1329258245,
+		.ip_dst.s_addr = 1319097119,
+		.ip_p = 0x06
+	},
+	{
+		.th_sport = 12506,
+		.th_dport = 47804
+	}
+};
 
-// Pad the MACs with 0s because jenkins_hash operates on 32-bit inputs
-const uint8_t ether_shost[] = {181, 16, 73, 9, 219, 22, 0, 0};
-const uint8_t ether_dhost[] = {69, 170, 210, 111, 24, 120, 0, 0};
-const struct in_addr ip_src = {.s_addr = 1329258245};
-const struct in_addr ip_dst = {.s_addr = 1319097119};
-const uint32_t ports = 3132895450;
 const uint8_t sipkey[16] = {7, 239, 255, 43, 68, 53, 56, 225, 98, 81, 177, 80, 92, 235, 242, 39};
 
+#define LAGG_F_HASHL2	0x1
+#define LAGG_F_HASHL3	0x2
+#define LAGG_F_HASHL4	0x4
+#define LAGG_F_HASHALL	(LAGG_F_HASHL2|LAGG_F_HASHL3|LAGG_F_HASHL4)
+
 /*
  * Simulate how lagg_hashmbuf uses FNV hash for a TCP/IP packet
  * No VLAN tagging
  */
-uint32_t do_fnv(void)
+uint32_t do_fnv(uint32_t flags)
 {
 	uint32_t p = FNV1_32_INIT;
 
-	p = fnv_32_buf(ether_shost, 6, p);
-	p = fnv_32_buf(ether_dhost, 6, p);
-	p = fnv_32_buf(&ip_src, sizeof(struct in_addr), p);
-	p = fnv_32_buf(&ip_dst, sizeof(struct in_addr), p);
-	p = fnv_32_buf(&ports, sizeof(ports), p);
+	if (flags & LAGG_F_HASHL2)
+		p = fnv_32_buf(&m.eh.ether_dhost, 12, p);
+	if (flags & LAGG_F_HASHL3)
+		p = fnv_32_buf(&m.ip.ip_src, 8, p);
+	if (flags & LAGG_F_HASHL4)
+		p = fnv_32_buf(&m.th.th_sport, 4, p);
+
 	return (p);
 }
 
@@ -43,59 +71,74 @@
  * Simulate how lagg_hashmbuf uses hash32 for a TCP/IP packet
  * No VLAN tagging
  */
-uint32_t do_hash32(void)
+uint32_t do_hash32(uint32_t flags)
 {
 	// Actually, if_lagg used a pseudorandom number determined at interface
 	// creation time. But this should have the same timing
 	// characteristics.
 	uint32_t p = HASHINIT;
 
-	p = hash32_buf(ether_shost, 6, p);
-	p = hash32_buf(ether_dhost, 6, p);
-	p = hash32_buf(&ip_src, sizeof(struct in_addr), p);
-	p = hash32_buf(&ip_dst, sizeof(struct in_addr), p);
-	p = hash32_buf(&ports, sizeof(ports), p);
+	if (flags & LAGG_F_HASHL2)
+		p = hash32_buf(&m.eh.ether_dhost, 12, p);
+	if (flags & LAGG_F_HASHL3)
+		p = hash32_buf(&m.ip.ip_src, 8, p);
+	if (flags & LAGG_F_HASHL4)
+		p = hash32_buf(&m.th.th_sport, 4, p);
+
 	return (p);
 }
 
+/* Simulate copying the info out of the mbuf. */
+static __inline size_t init_key(char *key, uint32_t flags)
+{
+	uint16_t etype;
+	size_t len = 0;
+
+	if (flags & LAGG_F_HASHL2) {
+		memcpy(key + len, &m.eh.ether_dhost, 12);
+		len += 12;
+	}
+
+	if (flags & LAGG_F_HASHL3) {
+		memcpy(key + len, &m.ip.ip_src, 8);
+		len += 8;
+	}
+
+	if (flags & LAGG_F_HASHL4) {
+		memcpy(key + len, &m.th.th_sport, 4);
+		len += 4;
+	}
+
+	return (len);
+}
+
 /*
  * Simulate how lagg_hashmbuf would use siphash24 for a TCP/IP packet
  * No VLAN tagging
  */
-uint32_t do_siphash24(void)
+uint32_t do_siphash24(uint32_t flags)
 {
 	SIPHASH_CTX ctx;
+	char key[26];
+	size_t len;
 
-	SipHash24_Init(&ctx);
-	SipHash_SetKey(&ctx, sipkey);
+	len = init_key(key, flags);
 
-	SipHash_Update(&ctx, ether_shost, 6);
-	SipHash_Update(&ctx, ether_dhost, 6);
-	SipHash_Update(&ctx, &ip_src, sizeof(struct in_addr));
-	SipHash_Update(&ctx, &ip_dst, sizeof(struct in_addr));
-	SipHash_Update(&ctx, &ports, sizeof(ports));
-	return (SipHash_End(&ctx) & 0xFFFFFFFF);
+	return (SipHash24(&ctx, sipkey, key, len) & 0xFFFFFFFF);
 }
 
 /*
  * Simulate how lagg_hashmbuf would use lookup3 aka jenkins_hash
  * No VLAN tagging
  */
-uint32_t do_jenkins(void)
+uint32_t do_jenkins(uint32_t flags)
 {
-	/* Jenkins hash does not recommend any specific initializer */
-	uint32_t p = FNV1_32_INIT;
+	char key[26];
+	size_t len;
 
-	/*
-	 * jenkins_hash uses 32-bit inputs, so we need to present the MACs as
-	 * arrays of 2 32-bit values
-	 */
-	p = jenkins_hash32((uint32_t*)ether_shost, 2, p);
-	p = jenkins_hash32((uint32_t*)ether_dhost, 2, p);
-	p = jenkins_hash32((uint32_t*)&ip_src, sizeof(struct in_addr) / 4, p);
-	p = jenkins_hash32((uint32_t*)&ip_dst, sizeof(struct in_addr) / 4, p);
-	p = jenkins_hash32(&ports, sizeof(ports) / 4, p);
-	return (p);
+	len = init_key(key, flags);
+
+	return (jenkins_hash(key, len, FNV1_32_INIT));
 }
 
 
@@ -120,7 +163,7 @@
 
 	gettimeofday(&tv_old, NULL);
 	for (j=0; j
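The reply's point that a stack buffer filled according to the hash flags needs no zeroes or padding, and the patch's init_key(), can be illustrated in isolation. This is a sketch under stated assumptions, not the patch: struct fields, build_key(), hash_in_place(), hash_copied() and the HASH_L* values are hypothetical, and fnv1_32() is a standalone FNV-1 (multiply by the FNV prime, then XOR each byte) intended to mirror the behavior of FreeBSD's fnv_32_buf(). Because FNV-1 consumes one byte at a time and carries its state in the running hash, hashing the copied key once yields the same value as chaining one call per field in the same order. Copying therefore changes the memory access pattern being benchmarked, not the hash result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values mirroring the LAGG_F_HASHL* defines in the patch. */
#define HASH_L2	0x1
#define HASH_L3	0x2
#define HASH_L4	0x4

/* Standard 32-bit FNV offset basis and prime. */
#define FNV1_32_INIT	0x811c9dc5U
#define FNV_32_PRIME	0x01000193U

/* FNV-1: for each byte, multiply by the prime, then XOR in the byte. */
uint32_t fnv1_32(const void *buf, size_t len, uint32_t h)
{
	const uint8_t *s = buf;

	while (len-- != 0) {
		h *= FNV_32_PRIME;
		h ^= *s++;
	}
	return h;
}

/* Hypothetical hashable fields, already in wire order. */
struct fields {
	uint8_t macs[12];	/* dhost followed by shost */
	uint8_t addrs[8];	/* ip_src followed by ip_dst */
	uint8_t ports[4];	/* sport followed by dport */
};

/*
 * Copy only the enabled fields into the caller's buffer and return the
 * number of bytes used. Bytes past the returned length are never read,
 * so the buffer does not need to be zeroed or padded.
 */
size_t build_key(uint8_t *key, const struct fields *f, uint32_t flags)
{
	size_t len = 0;

	if (flags & HASH_L2) {
		memcpy(key + len, f->macs, sizeof(f->macs));
		len += sizeof(f->macs);
	}
	if (flags & HASH_L3) {
		memcpy(key + len, f->addrs, sizeof(f->addrs));
		len += sizeof(f->addrs);
	}
	if (flags & HASH_L4) {
		memcpy(key + len, f->ports, sizeof(f->ports));
		len += sizeof(f->ports);
	}
	return len;
}

/* Hash each enabled field in place, chaining the state (like do_fnv()). */
uint32_t hash_in_place(const struct fields *f, uint32_t flags)
{
	uint32_t h = FNV1_32_INIT;

	if (flags & HASH_L2)
		h = fnv1_32(f->macs, sizeof(f->macs), h);
	if (flags & HASH_L3)
		h = fnv1_32(f->addrs, sizeof(f->addrs), h);
	if (flags & HASH_L4)
		h = fnv1_32(f->ports, sizeof(f->ports), h);
	return h;
}

/* Copy the enabled fields to a stack buffer, then hash once. */
uint32_t hash_copied(const struct fields *f, uint32_t flags)
{
	uint8_t key[24];	/* 12 + 8 + 4 bytes at most */
	size_t len = build_key(key, f, flags);

	return fnv1_32(key, len, FNV1_32_INIT);
}
```

For every combination of the three flags, hash_in_place() and hash_copied() return the same value. With no flags set, both return FNV1_32_INIT, since no bytes are hashed.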