Date: Thu, 16 Nov 2006 12:01:09 -0800 From: Garrett Cooper <youshi10@u.washington.edu> To: freebsd-hackers@freebsd.org Subject: Re: ipv6 connection hash function wanted ... Message-ID: <455CC385.10309@u.washington.edu> In-Reply-To: <y7v64dfixin.wl%jinmei@isl.rdc.toshiba.co.jp> References: <200611141709.26644.max@love2party.net> <20061114190930.GA23096@walton.maths.tcd.ie> <200611142020.53178.max@love2party.net> <y7v64dfixin.wl%jinmei@isl.rdc.toshiba.co.jp>
next in thread | previous in thread | raw e-mail | index | archive | help
JINMEI Tatuya / ???? wrote: >>>>>> On Tue, 14 Nov 2006 20:20:47 +0100, >>>>>> Max Laier <max@love2party.net> said: >>>>>> > > >>>> Any ideas? Any papers that deal with this problem? >>>> >>> Assuming you don't want to use one of the standard cryptographic >>> ones (which I can imagine being a bit slow for something done >>> per-packet), then one option might be to use a simpler hash that >>> is keyed. Choose the key at boot/module load time and make it hard >>> to produce collisions unless you know the key. >>> > > >> That's exactly what I am looking for ... now I need someone[tm] - with >> better Math-Knowledge than mine - to write such a thing down in a simple >> formula :-) i.e. take those bits from there and there and XOR them with >> your canary yada-yada-yada ... >> > > If you want something whose behavior is mathematically guaranteed, I'd > recommend universal hashing as already suggested in this thread. > > One example implementation is given below, assuming the hash key is > source and destination IPv6 addresses and transport layer ports(*). > As shown in the implementation, it's pretty simple and should run > reasonably fast. Also, as long as the random values are reasonably > unpredictable, the expected probability of producing the same hash > value for arbitrarily-chosen two different keys is 1/65537. In > general, for any prime number p as the value of LARGE_PRIME, this > probability is 1/p (so if 65537 is too large, we can use a smaller > number, e.g., 1009, depending on the desired balance between collision > risk and memory footprint for hash buckets). > > (*)Technically, using in6_addr to represent an IPv6 address may not be > enough; we may want to take into account zone indices (sin6_scope_id) > as part of hash keys. > > JINMEI, Tatuya > Communication Platform Lab. > Corporate R&D Center, Toshiba Corp. > jinmei@isl.rdc.toshiba.co.jp > > #define HASHKEYLEN 38 /* sizeof(in6_addr) * 2 + sizeof(in_port_t) * 2 */ > #define LARGE_PRIME 65537 > > /* > * This should be filled with unpredictable random values between 0 > * and LARGE_PRIME-1 at startup time. > */ > u_int32_t random_parameters[HASHKEYLEN]; > > u_int32_t > hash(struct in6_addr *saddr, struct in6_addr *daddr, > in_port_t sport, in_port_t dport) > { > int i, j = 0; > u_int32_t accumulated = 0; > u_char *cp; > > for (i = 0; i < sizeof(*saddr); i++) > accumulated += saddr->s6_addr[i] * random_parameters[j++]; > for (i = 0; i < sizeof(*saddr); i++) > accumulated += daddr->s6_addr[i] * random_parameters[j++]; > cp = (u_char *)&sport; > for (i = 0; i < sizeof(sport); i++) > accumulated += cp[i] * random_parameters[j++]; > cp = (u_char *)&dport; > for (i = 0; i < sizeof(dport); i++) > accumulated += cp[i] * random_parameters[j++]; > > return (accumulated % LARGE_PRIME); > } Jinmei, Wouldn't you get some overflow if (pardon my set notation) "INT_MAX < {saddr,daddr}->s6_addr[i] * random_parameters[j++]", which would implicitly introduce collision into your algorithm. Or is the overall set size sufficiently large not to worry about this particular issue? -Garrett
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?455CC385.10309>