Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 07 Dec 2012 16:06:04 +0100
From:      Andre Oppermann <andre@freebsd.org>
To:        Alfred Perlstein <bright@mu.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, Alfred Perlstein <alfred@FreeBSD.org>, src-committers@freebsd.org
Subject:   Re: svn commit: r243594 - head/sys/netinet
Message-ID:  <50C205DC.1040701@freebsd.org>
In-Reply-To: <50B57F46.1060207@mu.org>
References:  <201211270304.qAR34PfD056691@svn.freebsd.org> <50B53ABC.1010304@freebsd.org> <50B57F46.1060207@mu.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On 28.11.2012 04:04, Alfred Perlstein wrote:
> On 11/27/12 2:12 PM, Andre Oppermann wrote:
>> On 27.11.2012 04:04, Alfred Perlstein wrote:
>>> Author: alfred
>>> Date: Tue Nov 27 03:04:24 2012
>>> New Revision: 243594
>>> URL: http://svnweb.freebsd.org/changeset/base/243594
>>>
>>> Log:
>>>    Auto size the tcbhashsize structure based on max sockets.
>>>
>>>    While here, also make the code that enforces power-of-two more
>>>    forgiving, instead of just resetting to 512, graciously round-down
>>>    to the next lower power of two.
>>
>> The porthash (also a parameter to in_pcbinfo_init()) should not be
>> scaled the same as tcbhashsize.  It's unlikely that anyone is going
>> to have more than a thousand listen ports.
>>
>> The default tcbhashsize scaling factor of maxsockets / 4 is IMHO
>> excessive.  If someone is running at the limit he's going to up
>> maxsockets to double the target load.  Dividing by 8 is a better
>> balance for normal and well used large memory machines.
>> The hashsize calculation logic is done a bit in a roundabout way.
>>
>> What do think of the attached patch to fix the porthash and to simplify
>> the hashsize logic?
>>
> I like the porthash fix.

After tracking down how the porthash works in in_pcb.c it became
apparent that every connection ends up there too.  So in the end
the port hash should be the same size as the main hash for now.
Having every connection on the port hash doesn't make a lot of
sense, at least for TCP.  Only ports from listen sockets should
end up there.  Changing that seems to be non-trivial though.

> However your change:
> 1) removes explanation of what the code is doing. (comments/clarity)

Added your comments back in, in a more condensed form.  The comment
about the perfect hash is nice but applies to every hash usage and
is in every book about data structures.  The problem with the inpcb
hash is that it can never be perfect because the input in not known.
Even with an 1:1 mapping of maxsockets to the inpcb hash there will
be collisions based on pure chance since the hash is derived from
srcip:dstip:srcport:dstport.

> 2) removes the more forgiving code that rounds-down the tcp_tcbhashsize.
 >
> Can you please maintain the comments (probably keeping the logic of maketcp_hashsize() in a separate
> function).

We have a quite a number of places where this power-of-2 requirement
applies.  I think it's better to add a generic macro for this rounding
instead of re-inventing the wheel every time we have the same situation.

The comment about being more forgiving is sufficient in a commit
message as the reader of the code wouldn't know that it previously
was clamped down to TCBMINHASHSIZE (after printing a warning though).

> Can you please maintain the more forgiving code for dealing with non-power of 2 tunable?  I like
> this, it makes it easier for admins and less error prone.
>
> -       if (!powerof2(hashsize)) {
> -               int oldhashsize = hashsize;
>
> -               hashsize = maketcp_hashsize(hashsize);
> -               /* prevent absurdly low value */
> -               if (hashsize < 16)
> -                       hashsize = 16;
> -               printf("%s: WARNING: TCB hash size not a power of 2, "
> -                   "clipped from %d to %d.\n", __func__, oldhashsize,
> -                   hashsize);

Please see revised patch below.

-- 
Andre

Index: netinet/tcp_subr.c
===================================================================
--- netinet/tcp_subr.c	(revision 243985)
+++ netinet/tcp_subr.c	(working copy)
@@ -229,13 +229,13 @@
  		    void *ip4hdr, const void *ip6hdr);

  /*
- * Target size of TCP PCB hash tables. Must be a power of two.
+ * Minimal size of TCP PCB hash tables. Must be a power of two.
   *
   * Note that this can be overridden by the kernel environment
   * variable net.inet.tcp.tcbhashsize
   */
-#ifndef TCBHASHSIZE
-#define TCBHASHSIZE	0
+#ifndef TCBMINHASHSIZE
+#define TCBMINHASHSIZE	512
  #endif

  /*
@@ -282,35 +282,10 @@
  	return (0);
  }

-/*
- * Take a value and get the next power of 2 that doesn't overflow.
- * Used to size the tcp_inpcb hash buckets.
- */
-static int
-maketcp_hashsize(int size)
-{
-	int hashsize;
-
-	/*
-	 * auto tune.
-	 * get the next power of 2 higher than maxsockets.
-	 */
-	hashsize = 1 << fls(size);
-	/* catch overflow, and just go one power of 2 smaller */
-	if (hashsize < size) {
-		hashsize = 1 << (fls(size) - 1);
-	}
-	return (hashsize);
-}
-
  void
  tcp_init(void)
  {
-	const char *tcbhash_tuneable;
-	int hashsize;

-	tcbhash_tuneable = "net.inet.tcp.tcbhashsize";
-
  	if (hhook_head_register(HHOOK_TYPE_TCP, HHOOK_TCP_EST_IN,
  	    &V_tcp_hhh[HHOOK_TCP_EST_IN], HHOOK_NOWAIT|HHOOK_HEADISINVNET) != 0)
  		printf("%s: WARNING: unable to register helper hook\n", __func__);
@@ -318,48 +293,21 @@
  	    &V_tcp_hhh[HHOOK_TCP_EST_OUT], HHOOK_NOWAIT|HHOOK_HEADISINVNET) != 0)
  		printf("%s: WARNING: unable to register helper hook\n", __func__);

-	hashsize = TCBHASHSIZE;
-	TUNABLE_INT_FETCH(tcbhash_tuneable, &hashsize);
-	if (hashsize == 0) {
-		/*
-		 * Auto tune the hash size based on maxsockets.
-		 * A perfect hash would have a 1:1 mapping
-		 * (hashsize = maxsockets) however it's been
-		 * suggested that O(2) average is better.
-		 */
-		hashsize = maketcp_hashsize(maxsockets / 4);
-		/*
-		 * Our historical default is 512,
-		 * do not autotune lower than this.
-		 */
-		if (hashsize < 512)
-			hashsize = 512;
-		if (bootverbose)
-			printf("%s: %s auto tuned to %d\n", __func__,
-			    tcbhash_tuneable, hashsize);
-	}
  	/*
-	 * We require a hashsize to be a power of two.
-	 * Previously if it was not a power of two we would just reset it
-	 * back to 512, which could be a nasty surprise if you did not notice
-	 * the error message.
-	 * Instead what we do is clip it to the closest power of two lower
-	 * than the specified hash value.
+	 * Auto tune the hash size based on maxsockets.
+	 * It is set to 1/8 of maxsockets and rounded down
+	 * to the next power of two with a lower bound of
+	 * TCBMINHASHSIZE.
  	 */
-	if (!powerof2(hashsize)) {
-		int oldhashsize = hashsize;
-
-		hashsize = maketcp_hashsize(hashsize);
-		/* prevent absurdly low value */
-		if (hashsize < 16)
-			hashsize = 16;
-		printf("%s: WARNING: TCB hash size not a power of 2, "
-		    "clipped from %d to %d.\n", __func__, oldhashsize,
-		    hashsize);
+	tcp_tcbhashsize = powerof2rdown(maxsockets / 8);
+	TUNABLE_INT_FETCH("net.inet.tcp.tcbhashsize", &tcp_tcbhashsize);
+	if (!powerof2(tcp_tcbhashsize)) {
+		tcp_tcbhashsize = powerof2rdfloor(tcp_tcbhashsize, TCBMINHASHSIZE);
+		printf("WARNING: TCB hash size not a power of 2, rounded down\n");
  	}
-	in_pcbinfo_init(&V_tcbinfo, "tcp", &V_tcb, hashsize, hashsize,
-	    "tcp_inpcb", tcp_inpcb_init, NULL, UMA_ZONE_NOFREE,
-	    IPI_HASHFIELDS_4TUPLE);
+	in_pcbinfo_init(&V_tcbinfo, "tcp", &V_tcb, tcp_tcbhashsize,
+	    tcp_tcbhashsize, "tcp_inpcb", tcp_inpcb_init, NULL,
+	    UMA_ZONE_NOFREE, IPI_HASHFIELDS_4TUPLE);

  	/*
  	 * These have to be type stable for the benefit of the timers.
@@ -393,7 +341,6 @@
  		tcp_rexmit_min = 1;
  	tcp_rexmit_slop = TCPTV_CPU_VAR;
  	tcp_finwait2_timeout = TCPTV_FINWAIT2_TIMEOUT;
-	tcp_tcbhashsize = hashsize;

  	TUNABLE_INT_FETCH("net.inet.tcp.soreceive_stream", &tcp_soreceive_stream);
  	if (tcp_soreceive_stream) {
Index: sys/param.h
===================================================================
--- sys/param.h	(revision 243985)
+++ sys/param.h	(working copy)
@@ -284,6 +284,9 @@
  #define	MIN(a,b) (((a)<(b))?(a):(b))
  #define	MAX(a,b) (((a)>(b))?(a):(b))

+#define	powerof2rdown(x)	(powerof2(x) ? (x) : (0x1 << fls((x) - 1) - 1))
+#define	powerof2rdfloor(x, y)	(MAX(powerof2rdown(x), (y)))
+
  #ifdef _KERNEL
  /*
   * Basic byte order function prototypes for non-inline functions.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?50C205DC.1040701>