Date: Fri, 7 Dec 2012 09:31:34 -0800 From: Alfred Perlstein <bright@mu.org> To: Andre Oppermann <andre@freebsd.org> Cc: "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, Alfred Perlstein <alfred@FreeBSD.org>, "src-committers@freebsd.org" <src-committers@freebsd.org> Subject: Re: svn commit: r243594 - head/sys/netinet Message-ID: <AC499CCF-3812-4CA9-929C-05AA4C39F223@mu.org> In-Reply-To: <50C205DC.1040701@freebsd.org> References: <201211270304.qAR34PfD056691@svn.freebsd.org> <50B53ABC.1010304@freebsd.org> <50B57F46.1060207@mu.org> <50C205DC.1040701@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
It's good to put the power or two thing in its own function.=20 That said, about the comment changes, hash size changes and auto tuning cha= nges respectively.... Comments: Dude... It's not like removing the helpful comments is going to sp= eed up either compiling this code or how fast it can process packets. Pleas= e leave those in.=20 The hash: Your comment about it not being perfect is incorrect. By making it= 1/8 you guarantee that it can never be perfect when fully loaded. If it is s= o important for you to guarantee that the minimum hash traversal on this has= h is at LEAST FOUR when fully loaded then by all means make the change. I t= hink it's unbelievably wrong, penny wise/dollar foolish or maybe bit wise/CP= U foolish but for some reason others agree with you for reasons that still d= o not make sense to me. So again by all means pessimize the hash table to sa= ve a few bytes.=20 The tuning: Additionally you've removed the informational output that shows w= hat the input was and what it was changed to. It's as if you want to make it= harder for my techs to figure out what had gone wrong.=20 Tuning 2: Additionally it appears you've removed the safety net in this code= for clipping it down to min 512. I don't get why this is such an issue, you do realize this code is run only o= nce and is not in the critical path? We should be optimizing those code for= utility, user-friendliness and general readability not as if it was part of= TCP's fast path.=20 Sent from my iPhone On Dec 7, 2012, at 7:06 AM, Andre Oppermann <andre@freebsd.org> wrote: > On 28.11.2012 04:04, Alfred Perlstein wrote: >> On 11/27/12 2:12 PM, Andre Oppermann wrote: >>> On 27.11.2012 04:04, Alfred Perlstein wrote: >>>> Author: alfred >>>> Date: Tue Nov 27 03:04:24 2012 >>>> New Revision: 243594 >>>> URL: http://svnweb.freebsd.org/changeset/base/243594 >>>>=20 >>>> Log: >>>> Auto size the tcbhashsize structure based on max sockets. >>>>=20 >>>> While here, also make the code that enforces power-of-two more >>>> forgiving, instead of just resetting to 512, graciously round-down >>>> to the next lower power of two. >>>=20 >>> The porthash (also a parameter to in_pcbinfo_init()) should not be >>> scaled the same as tcbhashsize. It's unlikely that anyone is going >>> to have more than a thousand listen ports. >>>=20 >>> The default tcbhashsize scaling factor of maxsockets / 4 is IMHO >>> excessive. If someone is running at the limit he's going to up >>> maxsockets to double the target load. Dividing by 8 is a better >>> balance for normal and well used large memory machines. >>> The hashsize calculation logic is done a bit in a roundabout way. >>>=20 >>> What do think of the attached patch to fix the porthash and to simplify >>> the hashsize logic? >> I like the porthash fix. >=20 > After tracking down how the porthash works in in_pcb.c it became > apparent that every connection ends up there too. So in the end > the port hash should be the same size as the main hash for now. > Having every connection on the port hash doesn't make a lot of > sense, at least for TCP. Only ports from listen sockets should > end up there. Changing that seems to be non-trivial though. >=20 >> However your change: >> 1) removes explanation of what the code is doing. (comments/clarity) >=20 > Added your comments back in, in a more condensed form. The comment > about the perfect hash is nice but applies to every hash usage and > is in every book about data structures. The problem with the inpcb > hash is that it can never be perfect because the input in not known. > Even with an 1:1 mapping of maxsockets to the inpcb hash there will > be collisions based on pure chance since the hash is derived from > srcip:dstip:srcport:dstport. >=20 >> 2) removes the more forgiving code that rounds-down the tcp_tcbhashsize. > > >> Can you please maintain the comments (probably keeping the logic of maket= cp_hashsize() in a separate >> function). >=20 > We have a quite a number of places where this power-of-2 requirement > applies. I think it's better to add a generic macro for this rounding > instead of re-inventing the wheel every time we have the same situation. >=20 > The comment about being more forgiving is sufficient in a commit > message as the reader of the code wouldn't know that it previously > was clamped down to TCBMINHASHSIZE (after printing a warning though). >=20 >> Can you please maintain the more forgiving code for dealing with non-powe= r of 2 tunable? I like >> this, it makes it easier for admins and less error prone. >>=20 >> - if (!powerof2(hashsize)) { >> - int oldhashsize =3D hashsize; >>=20 >> - hashsize =3D maketcp_hashsize(hashsize); >> - /* prevent absurdly low value */ >> - if (hashsize < 16) >> - hashsize =3D 16; >> - printf("%s: WARNING: TCB hash size not a power of 2, " >> - "clipped from %d to %d.\n", __func__, oldhashsize, >> - hashsize); >=20 > Please see revised patch below. >=20 > --=20 > Andre >=20 > Index: netinet/tcp_subr.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- netinet/tcp_subr.c (revision 243985) > +++ netinet/tcp_subr.c (working copy) > @@ -229,13 +229,13 @@ > void *ip4hdr, const void *ip6hdr); >=20 > /* > - * Target size of TCP PCB hash tables. Must be a power of two. > + * Minimal size of TCP PCB hash tables. Must be a power of two. > * > * Note that this can be overridden by the kernel environment > * variable net.inet.tcp.tcbhashsize > */ > -#ifndef TCBHASHSIZE > -#define TCBHASHSIZE 0 > +#ifndef TCBMINHASHSIZE > +#define TCBMINHASHSIZE 512 > #endif >=20 > /* > @@ -282,35 +282,10 @@ > return (0); > } >=20 > -/* > - * Take a value and get the next power of 2 that doesn't overflow. > - * Used to size the tcp_inpcb hash buckets. > - */ > -static int > -maketcp_hashsize(int size) > -{ > - int hashsize; > - > - /* > - * auto tune. > - * get the next power of 2 higher than maxsockets. > - */ > - hashsize =3D 1 << fls(size); > - /* catch overflow, and just go one power of 2 smaller */ > - if (hashsize < size) { > - hashsize =3D 1 << (fls(size) - 1); > - } > - return (hashsize); > -} > - > void > tcp_init(void) > { > - const char *tcbhash_tuneable; > - int hashsize; >=20 > - tcbhash_tuneable =3D "net.inet.tcp.tcbhashsize"; > - > if (hhook_head_register(HHOOK_TYPE_TCP, HHOOK_TCP_EST_IN, > &V_tcp_hhh[HHOOK_TCP_EST_IN], HHOOK_NOWAIT|HHOOK_HEADISINVNET) !=3D= 0) > printf("%s: WARNING: unable to register helper hook\n", __func__); > @@ -318,48 +293,21 @@ > &V_tcp_hhh[HHOOK_TCP_EST_OUT], HHOOK_NOWAIT|HHOOK_HEADISINVNET) !=3D= 0) > printf("%s: WARNING: unable to register helper hook\n", __func__); >=20 > - hashsize =3D TCBHASHSIZE; > - TUNABLE_INT_FETCH(tcbhash_tuneable, &hashsize); > - if (hashsize =3D=3D 0) { > - /* > - * Auto tune the hash size based on maxsockets. > - * A perfect hash would have a 1:1 mapping > - * (hashsize =3D maxsockets) however it's been > - * suggested that O(2) average is better. > - */ > - hashsize =3D maketcp_hashsize(maxsockets / 4); > - /* > - * Our historical default is 512, > - * do not autotune lower than this. > - */ > - if (hashsize < 512) > - hashsize =3D 512; > - if (bootverbose) > - printf("%s: %s auto tuned to %d\n", __func__, > - tcbhash_tuneable, hashsize); > - } > /* > - * We require a hashsize to be a power of two. > - * Previously if it was not a power of two we would just reset it > - * back to 512, which could be a nasty surprise if you did not notice= > - * the error message. > - * Instead what we do is clip it to the closest power of two lower > - * than the specified hash value. > + * Auto tune the hash size based on maxsockets. > + * It is set to 1/8 of maxsockets and rounded down > + * to the next power of two with a lower bound of > + * TCBMINHASHSIZE. > */ > - if (!powerof2(hashsize)) { > - int oldhashsize =3D hashsize; > - > - hashsize =3D maketcp_hashsize(hashsize); > - /* prevent absurdly low value */ > - if (hashsize < 16) > - hashsize =3D 16; > - printf("%s: WARNING: TCB hash size not a power of 2, " > - "clipped from %d to %d.\n", __func__, oldhashsize, > - hashsize); > + tcp_tcbhashsize =3D powerof2rdown(maxsockets / 8); > + TUNABLE_INT_FETCH("net.inet.tcp.tcbhashsize", &tcp_tcbhashsize); > + if (!powerof2(tcp_tcbhashsize)) { > + tcp_tcbhashsize =3D powerof2rdfloor(tcp_tcbhashsize, TCBMINHASHSI= ZE); > + printf("WARNING: TCB hash size not a power of 2, rounded down\n")= ; > } > - in_pcbinfo_init(&V_tcbinfo, "tcp", &V_tcb, hashsize, hashsize, > - "tcp_inpcb", tcp_inpcb_init, NULL, UMA_ZONE_NOFREE, > - IPI_HASHFIELDS_4TUPLE); > + in_pcbinfo_init(&V_tcbinfo, "tcp", &V_tcb, tcp_tcbhashsize, > + tcp_tcbhashsize, "tcp_inpcb", tcp_inpcb_init, NULL, > + UMA_ZONE_NOFREE, IPI_HASHFIELDS_4TUPLE); >=20 > /* > * These have to be type stable for the benefit of the timers. > @@ -393,7 +341,6 @@ > tcp_rexmit_min =3D 1; > tcp_rexmit_slop =3D TCPTV_CPU_VAR; > tcp_finwait2_timeout =3D TCPTV_FINWAIT2_TIMEOUT; > - tcp_tcbhashsize =3D hashsize; >=20 > TUNABLE_INT_FETCH("net.inet.tcp.soreceive_stream", &tcp_soreceive_strea= m); > if (tcp_soreceive_stream) { > Index: sys/param.h > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- sys/param.h (revision 243985) > +++ sys/param.h (working copy) > @@ -284,6 +284,9 @@ > #define MIN(a,b) (((a)<(b))?(a):(b)) > #define MAX(a,b) (((a)>(b))?(a):(b)) >=20 > +#define powerof2rdown(x) (powerof2(x) ? (x) : (0x1 << fls((x) - 1) -= 1)) > +#define powerof2rdfloor(x, y) (MAX(powerof2rdown(x), (y))) > + > #ifdef _KERNEL > /* > * Basic byte order function prototypes for non-inline functions. >=20
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?AC499CCF-3812-4CA9-929C-05AA4C39F223>