From owner-svn-src-all@FreeBSD.ORG Fri Dec 7 15:06:13 2012 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8A999CF5 for ; Fri, 7 Dec 2012 15:06:13 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id D01808FC18 for ; Fri, 7 Dec 2012 15:06:11 +0000 (UTC) Received: (qmail 36231 invoked from network); 7 Dec 2012 16:36:00 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 7 Dec 2012 16:36:00 -0000 Message-ID: <50C205DC.1040701@freebsd.org> Date: Fri, 07 Dec 2012 16:06:04 +0100 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20121026 Thunderbird/16.0.2 MIME-Version: 1.0 To: Alfred Perlstein Subject: Re: svn commit: r243594 - head/sys/netinet References: <201211270304.qAR34PfD056691@svn.freebsd.org> <50B53ABC.1010304@freebsd.org> <50B57F46.1060207@mu.org> In-Reply-To: <50B57F46.1060207@mu.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, Alfred Perlstein , src-committers@freebsd.org X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2012 15:06:13 -0000 On 28.11.2012 04:04, Alfred Perlstein wrote: > On 11/27/12 2:12 PM, Andre Oppermann wrote: >> On 27.11.2012 04:04, Alfred Perlstein wrote: >>> Author: alfred >>> Date: Tue Nov 27 03:04:24 2012 >>> New Revision: 243594 >>> URL: http://svnweb.freebsd.org/changeset/base/243594 >>> >>> Log: >>> Auto size the tcbhashsize structure based on max sockets. >>> >>> While here, also make the code that enforces power-of-two more >>> forgiving, instead of just resetting to 512, graciously round-down >>> to the next lower power of two. >> >> The porthash (also a parameter to in_pcbinfo_init()) should not be >> scaled the same as tcbhashsize. It's unlikely that anyone is going >> to have more than a thousand listen ports. >> >> The default tcbhashsize scaling factor of maxsockets / 4 is IMHO >> excessive. If someone is running at the limit he's going to up >> maxsockets to double the target load. Dividing by 8 is a better >> balance for normal and well used large memory machines. >> The hashsize calculation logic is done a bit in a roundabout way. >> >> What do think of the attached patch to fix the porthash and to simplify >> the hashsize logic? >> > I like the porthash fix. After tracking down how the porthash works in in_pcb.c it became apparent that every connection ends up there too. So in the end the port hash should be the same size as the main hash for now. Having every connection on the port hash doesn't make a lot of sense, at least for TCP. Only ports from listen sockets should end up there. Changing that seems to be non-trivial though. > However your change: > 1) removes explanation of what the code is doing. (comments/clarity) Added your comments back in, in a more condensed form. The comment about the perfect hash is nice but applies to every hash usage and is in every book about data structures. The problem with the inpcb hash is that it can never be perfect because the input in not known. Even with an 1:1 mapping of maxsockets to the inpcb hash there will be collisions based on pure chance since the hash is derived from srcip:dstip:srcport:dstport. > 2) removes the more forgiving code that rounds-down the tcp_tcbhashsize. > > Can you please maintain the comments (probably keeping the logic of maketcp_hashsize() in a separate > function). We have a quite a number of places where this power-of-2 requirement applies. I think it's better to add a generic macro for this rounding instead of re-inventing the wheel every time we have the same situation. The comment about being more forgiving is sufficient in a commit message as the reader of the code wouldn't know that it previously was clamped down to TCBMINHASHSIZE (after printing a warning though). > Can you please maintain the more forgiving code for dealing with non-power of 2 tunable? I like > this, it makes it easier for admins and less error prone. > > - if (!powerof2(hashsize)) { > - int oldhashsize = hashsize; > > - hashsize = maketcp_hashsize(hashsize); > - /* prevent absurdly low value */ > - if (hashsize < 16) > - hashsize = 16; > - printf("%s: WARNING: TCB hash size not a power of 2, " > - "clipped from %d to %d.\n", __func__, oldhashsize, > - hashsize); Please see revised patch below. -- Andre Index: netinet/tcp_subr.c =================================================================== --- netinet/tcp_subr.c (revision 243985) +++ netinet/tcp_subr.c (working copy) @@ -229,13 +229,13 @@ void *ip4hdr, const void *ip6hdr); /* - * Target size of TCP PCB hash tables. Must be a power of two. + * Minimal size of TCP PCB hash tables. Must be a power of two. * * Note that this can be overridden by the kernel environment * variable net.inet.tcp.tcbhashsize */ -#ifndef TCBHASHSIZE -#define TCBHASHSIZE 0 +#ifndef TCBMINHASHSIZE +#define TCBMINHASHSIZE 512 #endif /* @@ -282,35 +282,10 @@ return (0); } -/* - * Take a value and get the next power of 2 that doesn't overflow. - * Used to size the tcp_inpcb hash buckets. - */ -static int -maketcp_hashsize(int size) -{ - int hashsize; - - /* - * auto tune. - * get the next power of 2 higher than maxsockets. - */ - hashsize = 1 << fls(size); - /* catch overflow, and just go one power of 2 smaller */ - if (hashsize < size) { - hashsize = 1 << (fls(size) - 1); - } - return (hashsize); -} - void tcp_init(void) { - const char *tcbhash_tuneable; - int hashsize; - tcbhash_tuneable = "net.inet.tcp.tcbhashsize"; - if (hhook_head_register(HHOOK_TYPE_TCP, HHOOK_TCP_EST_IN, &V_tcp_hhh[HHOOK_TCP_EST_IN], HHOOK_NOWAIT|HHOOK_HEADISINVNET) != 0) printf("%s: WARNING: unable to register helper hook\n", __func__); @@ -318,48 +293,21 @@ &V_tcp_hhh[HHOOK_TCP_EST_OUT], HHOOK_NOWAIT|HHOOK_HEADISINVNET) != 0) printf("%s: WARNING: unable to register helper hook\n", __func__); - hashsize = TCBHASHSIZE; - TUNABLE_INT_FETCH(tcbhash_tuneable, &hashsize); - if (hashsize == 0) { - /* - * Auto tune the hash size based on maxsockets. - * A perfect hash would have a 1:1 mapping - * (hashsize = maxsockets) however it's been - * suggested that O(2) average is better. - */ - hashsize = maketcp_hashsize(maxsockets / 4); - /* - * Our historical default is 512, - * do not autotune lower than this. - */ - if (hashsize < 512) - hashsize = 512; - if (bootverbose) - printf("%s: %s auto tuned to %d\n", __func__, - tcbhash_tuneable, hashsize); - } /* - * We require a hashsize to be a power of two. - * Previously if it was not a power of two we would just reset it - * back to 512, which could be a nasty surprise if you did not notice - * the error message. - * Instead what we do is clip it to the closest power of two lower - * than the specified hash value. + * Auto tune the hash size based on maxsockets. + * It is set to 1/8 of maxsockets and rounded down + * to the next power of two with a lower bound of + * TCBMINHASHSIZE. */ - if (!powerof2(hashsize)) { - int oldhashsize = hashsize; - - hashsize = maketcp_hashsize(hashsize); - /* prevent absurdly low value */ - if (hashsize < 16) - hashsize = 16; - printf("%s: WARNING: TCB hash size not a power of 2, " - "clipped from %d to %d.\n", __func__, oldhashsize, - hashsize); + tcp_tcbhashsize = powerof2rdown(maxsockets / 8); + TUNABLE_INT_FETCH("net.inet.tcp.tcbhashsize", &tcp_tcbhashsize); + if (!powerof2(tcp_tcbhashsize)) { + tcp_tcbhashsize = powerof2rdfloor(tcp_tcbhashsize, TCBMINHASHSIZE); + printf("WARNING: TCB hash size not a power of 2, rounded down\n"); } - in_pcbinfo_init(&V_tcbinfo, "tcp", &V_tcb, hashsize, hashsize, - "tcp_inpcb", tcp_inpcb_init, NULL, UMA_ZONE_NOFREE, - IPI_HASHFIELDS_4TUPLE); + in_pcbinfo_init(&V_tcbinfo, "tcp", &V_tcb, tcp_tcbhashsize, + tcp_tcbhashsize, "tcp_inpcb", tcp_inpcb_init, NULL, + UMA_ZONE_NOFREE, IPI_HASHFIELDS_4TUPLE); /* * These have to be type stable for the benefit of the timers. @@ -393,7 +341,6 @@ tcp_rexmit_min = 1; tcp_rexmit_slop = TCPTV_CPU_VAR; tcp_finwait2_timeout = TCPTV_FINWAIT2_TIMEOUT; - tcp_tcbhashsize = hashsize; TUNABLE_INT_FETCH("net.inet.tcp.soreceive_stream", &tcp_soreceive_stream); if (tcp_soreceive_stream) { Index: sys/param.h =================================================================== --- sys/param.h (revision 243985) +++ sys/param.h (working copy) @@ -284,6 +284,9 @@ #define MIN(a,b) (((a)<(b))?(a):(b)) #define MAX(a,b) (((a)>(b))?(a):(b)) +#define powerof2rdown(x) (powerof2(x) ? (x) : (0x1 << fls((x) - 1) - 1)) +#define powerof2rdfloor(x, y) (MAX(powerof2rdown(x), (y))) + #ifdef _KERNEL /* * Basic byte order function prototypes for non-inline functions.