Date: Sat, 8 Dec 2001 19:59:13 -0500 From: Bosko Milekic <bmilekic@technokratis.com> To: Mike Silbersack <silby@silby.com> Cc: net@FreeBSD.ORG, David Xu <davidx@viasoft.com.cn>, Mike Barcroft <mike@FreeBSD.ORG>, Leo Bicknell <bicknell@ufp.org> Subject: Re: mbuf / maxfiles / maxsockets / etc autoscaling patch Message-ID: <20011208195913.A55885@technokratis.com> In-Reply-To: <Pine.BSF.4.30.0112081756370.61906-200000@niwun.pair.com>; from silby@silby.com on Sat, Dec 08, 2001 at 06:30:26PM -0500 References: <Pine.BSF.4.30.0112081756370.61906-200000@niwun.pair.com>
next in thread | previous in thread | raw e-mail | index | archive | help
Looks good to me. Right on!
On Sat, Dec 08, 2001 at 06:30:26PM -0500, Mike Silbersack wrote:
>
> Here's the autoscaling patch I was mumbling about earlier this week.
> With this patch applied, the necessity of tuning maxusers when one
> upgrades to a machine with more ram should be removed in most cases.
> (This patch is only to -current, the mbuf changes will make it not apply
> cleanly to -stable patch if there is sufficient demand right now.)
>
> Here's a quick look at the size of various memory allocations with various
> maxusers sizes and with the autoscaling patch:
>
> With maxusers:
>
> musers mproc mfiles msocket callout nmbcl nsfbuf tcp hash size
> 32 532 1064 1064 1612 1024 1024 512
> 64 1044 2088 2088 3148 1536 1536 512
> 128 2068 4136 4136 6220 2560 2560 512
> 256 4116 8232 8232 12364 4608 4608 512
>
> With autoscaling:
>
> MB ram mproc mfiles msocket callout nmbcl nsfbuf tcp hash size
> 32 512 4096 2048 4624 1024 1024 512
> 64 1024 8192 4096 9232 2048 1024 512
> 128 2048 16384 8192 18448 4096 2048 1024
> 256 4096 32768 16384 36880 8192 4096 2048
> 384 6144 49152 24576 55312 12288 6144 3072
> 512 8192 65536 32767 73744 16384 8192 4096
> (Values above this start to flatten out due to #defined maximums)
>
> Note that in general calculations are of the following form:
>
> value = max(maxusers-derived value, autoscale-derived value);
> value = loader tuned value if present
>
> As such, under no circumstances will people suddenly see a decrease in
> various parameters when they upgrade to an autoscaling kernel; only
> increases.
>
> I'm sure that there will be much commotion about what scaling factors are
> correct. To make changes to these easy, I have grouped all the mins,
> scaling factors, and maxes in param.h - tweaking them is quite simple.
>
> I included mins and maxes to make sure that autoscaling doesn't cause
> problems by creating low values on small memory machines and also so that
> it does not specify really high values on 2GB+ machines. The high case is
> what worries me; I have not heard much about how well maxsockets /
> nmbclusters > 32767 really works. If people running high volume systems
> that actively use that many simultaneous sockets + clusters + files, I'd
> be glad to bump up the maxes.
>
> Oh, there's one more kicker thrown in; I changed maxfilesperproc to equal
> 9/10ths of maxfiles, and maxprocperuid to equal 9/10 maxproc; this'll help
> to prevent a single process or user from forkbombing the system or running
> it out of file handles with a default configuration.
>
> Please review.
>
> Thanks,
>
> Mike "Silby" Silbersack
> diff -u -r sys.old/alpha/alpha/machdep.c sys/alpha/alpha/machdep.c
> --- sys.old/alpha/alpha/machdep.c Sat Dec 8 16:05:15 2001
> +++ sys/alpha/alpha/machdep.c Sat Dec 8 16:05:28 2001
> @@ -556,7 +556,7 @@
> kern_envp = bootinfo.envp;
>
> /* Do basic tuning, hz etc */
> - init_param();
> + init_hz();
>
> /*
> * Initalize the (temporary) bootstrap console interface, so
> @@ -861,6 +861,9 @@
> physmem -= (sz - nsz);
> }
> }
> +
> + /* Init basic tunables */
> + init_param(alpha_ptob(physmem));
>
> /*
> * Initialize error message buffer (at end of core).
> diff -u -r sys.old/i386/i386/machdep.c sys/i386/i386/machdep.c
> --- sys.old/i386/i386/machdep.c Sat Dec 8 16:04:54 2001
> +++ sys/i386/i386/machdep.c Sat Dec 8 16:43:20 2001
> @@ -1691,8 +1691,8 @@
> else if (bootinfo.bi_envp)
> kern_envp = (caddr_t)bootinfo.bi_envp + KERNBASE;
>
> - /* Init basic tunables, hz etc */
> - init_param();
> + /* Init hz */
> + init_hz();
>
> /*
> * make gdt memory segments, the code segment goes up to end of the
> @@ -1871,6 +1871,9 @@
> getmemsize(first);
>
> /* now running on new page tables, configured,and u/iom is accessible */
> +
> + /* Init basic tunables */
> + init_param(ptoa(Maxmem));
>
> /* Map the message buffer. */
> for (off = 0; off < round_page(MSGBUF_SIZE); off += PAGE_SIZE)
> diff -u -r sys.old/ia64/ia64/machdep.c sys/ia64/ia64/machdep.c
> --- sys.old/ia64/ia64/machdep.c Sat Dec 8 16:04:52 2001
> +++ sys/ia64/ia64/machdep.c Sat Dec 8 16:05:28 2001
> @@ -522,8 +522,8 @@
> /* get fpswa interface */
> fpswa_interface = (FPSWA_INTERFACE*)IA64_PHYS_TO_RR7(bootinfo.bi_fpswa);
>
> - /* Init basic tunables, including hz */
> - init_param();
> + /* Init hz */
> + init_hz();
>
> p = getenv("kernelname");
> if (p)
> @@ -623,6 +623,9 @@
> phys_avail[phys_avail_cnt] = 0;
>
> Maxmem = physmem;
> +
> + /* Init basic tunables */
> + init_param(ia64_ptob(physmem));
>
> /*
> * Initialize error message buffer (at end of core).
> diff -u -r sys.old/kern/subr_mbuf.c sys/kern/subr_mbuf.c
> --- sys.old/kern/subr_mbuf.c Sat Dec 8 16:04:51 2001
> +++ sys/kern/subr_mbuf.c Sat Dec 8 16:09:17 2001
> @@ -151,15 +151,21 @@
> static void
> tunable_mbinit(void *dummy)
> {
> + int automcls, autosfbuf;
>
> + /* Calculate autoscaled values, choose if greater. */
> +
> + automcls = min(MAXAUTOMCLS, max(MINAUTOMCLS, MCLSPERMB * physmemMB));
> + nmbclusters = max(automcls, NMBCLUSTERS);
> + autosfbuf = min(MAXAUTOSFBUF, max(MINAUTOSFBUF, SFBUFPERMB * physmemMB));
> + nsfbufs = max(autosfbuf, NSFBUFS);
> +
> /*
> * This has to be done before VM init.
> */
> - nmbclusters = NMBCLUSTERS;
> TUNABLE_INT_FETCH("kern.ipc.nmbclusters", &nmbclusters);
> nmbufs = NMBUFS;
> TUNABLE_INT_FETCH("kern.ipc.nmbufs", &nmbufs);
> - nsfbufs = NSFBUFS;
> TUNABLE_INT_FETCH("kern.ipc.nsfbufs", &nsfbufs);
> nmbcnt = NMBCNTS;
> TUNABLE_INT_FETCH("kern.ipc.nmbcnt", &nmbcnt);
> diff -u -r sys.old/kern/subr_param.c sys/kern/subr_param.c
> --- sys.old/kern/subr_param.c Sat Dec 8 16:04:51 2001
> +++ sys/kern/subr_param.c Sat Dec 8 16:10:08 2001
> @@ -90,39 +90,46 @@
> */
> struct buf *swbuf;
>
> +int physmemMB;
> +
> /*
> * Boot time overrides
> */
> void
> -init_param(void)
> +init_param(u_int64_t membytes)
> {
> + int memsizemb;
> + int autoproc, autofiles;
> +
> + physmemMB = membytes / 1048576;
>
> - /* Base parameters */
> + /* Calculate maxusers-derived values. */
> maxusers = MAXUSERS;
> TUNABLE_INT_FETCH("kern.maxusers", &maxusers);
> - hz = HZ;
> - TUNABLE_INT_FETCH("kern.hz", &hz);
> - tick = 1000000 / hz;
> - tickadj = howmany(30000, 60 * hz); /* can adjust 30ms in 60s */
> -
> - /* The following can be overridden after boot via sysctl */
> + nbuf = NBUF;
> maxproc = NPROC;
> - TUNABLE_INT_FETCH("kern.maxproc", &maxproc);
> maxfiles = MAXFILES;
> - TUNABLE_INT_FETCH("kern.maxfiles", &maxfiles);
> - maxprocperuid = maxproc - 1;
> - maxfilesperproc = maxfiles;
> -
> - /* Cannot be changed after boot */
> - nbuf = NBUF;
> - TUNABLE_INT_FETCH("kern.nbuf", &nbuf);
> #ifdef VM_SWZONE_SIZE_MAX
> maxswzone = VM_SWZONE_SIZE_MAX;
> #endif
> - TUNABLE_INT_FETCH("kern.maxswzone", &maxswzone);
> #ifdef VM_BCACHE_SIZE_MAX
> maxbcache = VM_BCACHE_SIZE_MAX;
> #endif
> +
> + /* Calculate autoscaled values, choose them if greater than above. */
> + autoproc = min(MAXAUTOPROC, max(MINAUTOPROC, PROCPERMB * physmemMB));
> + maxproc = max(maxproc, autoproc);
> + autofiles = min(MAXAUTOFILES, max(MINAUTOFILES, FILESPERMB * physmemMB));
> + maxfiles = max(maxfiles, autofiles);
> +
> + /* Allow loader-specified tuneables to take effect. */
> + TUNABLE_INT_FETCH("kern.maxproc", &maxproc);
> + TUNABLE_INT_FETCH("kern.maxfiles", &maxfiles);
> + maxprocperuid = (maxproc * 9) / 10;
> + maxfilesperproc = (maxfiles * 9) / 10;
> +
> + TUNABLE_INT_FETCH("kern.nbuf", &nbuf);
> + TUNABLE_INT_FETCH("kern.maxswzone", &maxswzone);
> TUNABLE_INT_FETCH("kern.maxbcache", &maxbcache);
> ncallout = 16 + maxproc + maxfiles;
> TUNABLE_INT_FETCH("kern.ncallout", &ncallout);
> @@ -139,4 +146,16 @@
> TUNABLE_QUAD_FETCH("kern.maxssiz", &maxssiz);
> sgrowsiz = SGROWSIZ;
> TUNABLE_QUAD_FETCH("kern.sgrowsiz", &sgrowsiz);
> +}
> +
> +/*
> + * Set hz. This must be called earlier in machdep.c than init_param().
> + */
> +void
> +init_hz(void)
> +{
> + hz = HZ;
> + TUNABLE_INT_FETCH("kern.hz", &hz);
> + tick = 1000000 / hz;
> + tickadj = howmany(30000, 60 * hz); /* can adjust 30ms in 60s */
> }
> diff -u -r sys.old/kern/uipc_socket2.c sys/kern/uipc_socket2.c
> --- sys.old/kern/uipc_socket2.c Sat Dec 8 16:04:50 2001
> +++ sys/kern/uipc_socket2.c Sat Dec 8 16:08:43 2001
> @@ -1026,7 +1026,12 @@
> */
> static void init_maxsockets(void *ignored)
> {
> + int autosockets, maxuserssockets;
> +
> + autosockets = physmemMB * SOCKETSPERMB;
> + autosockets = min(MAXAUTOSOCKETS, max(MINAUTOSOCKETS, autosockets));
> + maxuserssockets = 2 * (20 + (16 * maxusers));
> + maxsockets = max(maxuserssockets, max(autosockets, nmbclusters));
> TUNABLE_INT_FETCH("kern.ipc.maxsockets", &maxsockets);
> - maxsockets = imax(maxsockets, imax(maxfiles, nmbclusters));
> }
> SYSINIT(param, SI_SUB_TUNABLES, SI_ORDER_ANY, init_maxsockets, NULL);
> diff -u -r sys.old/netinet/tcp_subr.c sys/netinet/tcp_subr.c
> --- sys.old/netinet/tcp_subr.c Sat Dec 8 16:04:42 2001
> +++ sys/netinet/tcp_subr.c Sat Dec 8 16:10:31 2001
> @@ -190,6 +190,7 @@
> tcp_init()
> {
> int hashsize = TCBHASHSIZE;
> + int autohashsize;
>
> tcp_ccgen = 1;
> tcp_cleartaocache();
> @@ -203,6 +204,13 @@
>
> LIST_INIT(&tcb);
> tcbinfo.listhead = &tcb;
> +
> + /* Calculate autoscaled hash size, use if > default hash size. */
> + autohashsize = physmemMB * TCBHASHPERMB;
> + autohashsize = min(MAXAUTOTCBHASH, max(MINAUTOTCBHASH, autohashsize));
> + while (!powerof2(autohashsize))
> + autohashsize++;
> + hashsize = max(hashsize, autohashsize);
> TUNABLE_INT_FETCH("net.inet.tcp.tcbhashsize", &hashsize);
> if (!powerof2(hashsize)) {
> printf("WARNING: TCB hash size not a power of 2\n");
> diff -u -r sys.old/powerpc/powerpc/machdep.c sys/powerpc/powerpc/machdep.c
> --- sys.old/powerpc/powerpc/machdep.c Sat Dec 8 16:04:39 2001
> +++ sys/powerpc/powerpc/machdep.c Sat Dec 8 16:48:30 2001
> @@ -436,7 +436,8 @@
> __asm ("mtsprg 0, %0" :: "r"(globalp));
>
> /* Init basic tunables, hz etc */
> - init_param();
> + init_hz();
> + init_param(0); /* XXX - needs to be fed physmem for proper autoscaling */
>
> /* setup curproc so the mutexes work */
>
> diff -u -r sys.old/sparc64/sparc64/machdep.c sys/sparc64/sparc64/machdep.c
> --- sys.old/sparc64/sparc64/machdep.c Sat Dec 8 16:04:38 2001
> +++ sys/sparc64/sparc64/machdep.c Sat Dec 8 16:47:29 2001
> @@ -249,10 +249,10 @@
> end = (vm_offset_t)_end;
> }
>
> - /*
> - * Initialize tunables.
> - */
> - init_param();
> + /* Init hz */
> + init_hz();
> + /* Init basic tuneables - XXX - this needs to be moved once maxmem exists here. */
> + init_param(0);
>
> #ifdef DDB
> kdb_init();
> diff -u -r sys.old/sys/param.h sys/sys/param.h
> --- sys.old/sys/param.h Sat Dec 8 16:04:37 2001
> +++ sys/sys/param.h Sat Dec 8 16:05:28 2001
> @@ -230,6 +230,44 @@
> #define ctodb(db) /* calculates pages to devblks */ \
> ((db) << (PAGE_SHIFT - DEV_BSHIFT))
>
> +/*
> + * Values used in autoscaling system structures based on RAM size.
> + *
> + * Although settings are scattered across various subsystems, a
> + * common formula is followed. Generally, there are three
> + * possible values to choose from: The value suggested by maxusers,
> + * the value suggested by the autoscaling formula, and a manually
> + * tuned value from loader.conf. If a manually tuned value is specified,
> + * this value will be used. Otherwise, the maximum of the maxusers
> + * and autoscaled setting will be used.
> + *
> + */
> +
> +/* Max processes, files. These are set in subr_param.c */
> +#define PROCPERMB 16
> +#define MINAUTOPROC 256
> +#define MAXAUTOPROC 32000
> +#define FILESPERMB 128
> +#define MINAUTOFILES 1024
> +#define MAXAUTOFILES 65536
> +
> +/* Max sockets. These are set in uipc_socket2.c */
> +#define SOCKETSPERMB 64
> +#define MINAUTOSOCKETS 512
> +#define MAXAUTOSOCKETS 32000
> +
> +/* Max mbuf clusters, sendfile buffers. These are set in subr_mbuf.c */
> +#define MCLSPERMB 32
> +#define MINAUTOMCLS 512
> +#define MAXAUTOMCLS 32000
> +#define SFBUFPERMB 16
> +#define MINAUTOSFBUF 1024
> +#define MAXAUTOSFBUF 32000
> +
> +/* Number of TCP hash buckets. These are set in tcp_subr.c */
> +#define TCBHASHPERMB 8
> +#define MINAUTOTCBHASH 512
> +#define MAXAUTOTCBHASH 8192
>
> /*
> * Make this available for most of the kernel. There were too many
> diff -u -r sys.old/sys/systm.h sys/sys/systm.h
> --- sys.old/sys/systm.h Sat Dec 8 16:04:37 2001
> +++ sys/sys/systm.h Sat Dec 8 16:07:45 2001
> @@ -60,6 +60,7 @@
> extern struct cv selwait; /* select conditional variable */
>
> extern int physmem; /* physical memory */
> +extern int physmemMB; /* physical memory size in megabytes */
>
> extern dev_t dumpdev; /* dump device */
> extern long dumplo; /* offset into dumpdev */
> @@ -121,7 +122,8 @@
>
> void cpu_boot __P((int));
> void cpu_rootconf __P((void));
> -void init_param __P((void));
> +void init_hz __P((void));
> +void init_param __P((u_int64_t));
> void tablefull __P((const char *));
> int kvprintf __P((char const *, void (*)(int, void*), void *, int,
> _BSD_VA_LIST_)) __printflike(1, 0);
--
Bosko Milekic
bmilekic@technokratis.com
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20011208195913.A55885>
