Date: Tue, 26 Jan 1999 12:53:31 -0600 (CST)
From: Kevin Day <toasty@home.dragondata.com>
To: hackers@FreeBSD.ORG
Subject: High Load cron patches - comments?
Message-ID: <199901261853.MAA15095@home.dragondata.com>
I have a somewhat unusual setup: a server that several hundred customers use, with all of /home over an NFS mount, and each customer has quite a few cron jobs that they like to execute every ten minutes or so.

The problem is that they all want to execute their cron jobs on a */10 minute frequency, so on every minute ending in '0' cron suddenly spawns a few hundred processes, bringing the machine's load average above 15.0 and saturating my NFS link for quite a while. No amount of pleading with my users did much good, since they were just following a template given to them by the software they were using.

I talked briefly about this with Paul Vixie (cron's author); while we had differing ideas about how to accomplish this, my patches have been running for over a month now on a production system and have worked very well.

These patches limit the number of jobs cron will start per second, with an initial burst, a hard limit, and a 'burst mode' for when the number of jobs on the 'to do list' is getting excessively high. Giving no options to cron makes it behave exactly as it did without the patches.

The format for enabling this load balancing is as follows:

	cron [-x debugflag[,...]] [-a addweight [-c tickdecay] [-t threshold]]

The -a parameter controls how many 'points' are added on every execution.
The -c parameter controls how many points are subtracted every second.
The -t parameter controls how many points are necessary before queuing jobs
instead of running them.

The flow is as follows:

	If (numpoints < threshold) {
		Execute Job
		numpoints += AddWeight;
	}

	Every second {
		numpoints -= tickdecay;
		if (reallybehindinrunningjobs) {
			Turn on burst mode
		}
	}

Burst mode will keep tuning itself higher and higher, the further behind jobs get, until it has caught up.
This prevents a single user who puts 50,000 jobs in their crontab from making cron start sucking RAM like mad.

For me, cron -a 10 -c 100 -t 200 works very well. (Allow 10 jobs per second, but allow 20 at the beginning to hurry things up.)

Paul's idea was to limit the number of children cron has running at a time; however, for me this wasn't effective, as my users' jobs tend to hang around for a long time.

Can I get comments/suggestions about this?

Kevin

--ELM917376811-8356-0_
Content-Type: text/plain; charset=ISO-8859-1
Content-Disposition: attachment; filename=hlcron.patch
Content-Description: /home/toasty/hlcron.patch
Content-Transfer-Encoding: 7bit

--- ../oldcron/cron.c	Sat Jul 18 06:09:09 1998
+++ cron.c	Sat Jan  2 20:04:02 1999
@@ -47,16 +47,19 @@
 static void
 usage() {
 	char **dflags;

-	fprintf(stderr, "usage: cron [-x debugflag[,...]]\n");
+	fprintf(stderr, "usage: cron [-x debugflag[,...]] [-a addweight [-c tickdecay] [-t threshold]]\n");
 	fprintf(stderr, "\ndebugflags: ");

 	for(dflags = DebugFlagNames; *dflags; dflags++) {
 		fprintf(stderr, "%s ", *dflags);
 	}
-	fprintf(stderr, "\n");
+	fprintf(stderr, "\n\n");
+	fprintf(stderr, "-a [addweight]  Number of 'points' to add every time job is run\n");
+	fprintf(stderr, "-c [tickdecay]  Number of 'points' to subtract every second\n");
+	fprintf(stderr, "-t [threshold]  Number of 'points' to stop running jobs and just queue\n");
+	fprintf(stderr, "\n");

 	exit(ERROR_EXIT);
 }
@@ -126,19 +129,29 @@
 	while (TRUE) {
 # if DEBUGGING
 	    /* if (!(DebugFlags & DTEST)) */
 # endif /*DEBUGGING*/
-		cron_sleep();
-
-		load_database(&database);
+		cron_sleep();

-		/* do this iteration
+		/* Prevent misconfigured options from making cron take
+		 * over system ram
+		 * This may not be desirable for production systems
+		 * where cron jobs must run
 		 */
-		cron_tick(&database);
+		if (BurstRate < 7) {
+
+			load_database(&database);
+
+			/* do this iteration
+			 */
+			cron_tick(&database);
+		}

 		/* sleep 1 minute
 		 */
 		TargetTime += 60;
 	}
 }
@@ -226,31 +239,64 @@
 static void
 cron_sleep() {
-	register int	seconds_to_wait;
+	register int	seconds_to_wait, seconds_to_delay = 1;

-	do {
-		seconds_to_wait = (int) (TargetTime - time((time_t*)0));
-		Debug(DSCH, ("[%d] TargetTime=%ld, sec-to-wait=%d\n",
-			getpid(), (long)TargetTime, seconds_to_wait))
-
-		/* if we intend to sleep, this means that it's finally
-		 * time to empty the job queue (execute it).
-		 *
-		 * if we run any jobs, we'll probably screw up our timing,
-		 * so go recompute.
-		 *
-		 * note that we depend here on the left-to-right nature
-		 * of &&, and the short-circuiting.
-		 */
-	} while (seconds_to_wait > 0 && job_runqueue());
-
-	while (seconds_to_wait > 0) {
-		Debug(DSCH, ("[%d] sleeping for %d seconds\n",
-			getpid(), seconds_to_wait))
-		seconds_to_wait = (int) sleep((unsigned int) seconds_to_wait);
-	}
+	do {
+		seconds_to_wait = (int) (TargetTime - time((time_t*)0));
+		if (LoadAverage > LoadThreshold) {
+			/* if we denied jobs to run last time around,
+			 * see if we should sleep for a short period
+			 * before exiting
+			 */
+			if (seconds_to_wait > 0) {
+				/* decide if we should take a short nap
+				 * or just go ahead with the normal
+				 * return
+				 */
+				seconds_to_delay = MIN(seconds_to_wait,
					((LoadAverage - LoadThreshold) /
					(TickDecay << BurstRate)) + 1);
+				if (seconds_to_delay == 0)
+					seconds_to_delay = 1;
+				Debug(DSCH, ("[%d] short sleeping for %d seconds\n",
					getpid(), seconds_to_delay))
+				sleep(seconds_to_delay);
+			}
+		}
+		if (LoadAverage > ((TickDecay << BurstRate) * seconds_to_delay))
+			LoadAverage -= (TickDecay << BurstRate) * seconds_to_delay;
+		else
+			LoadAverage = 0;
+		/* if we're bursting jobs, and still not catching up
+		 * increase the burst speed
+		 */
+		if (NumJobs > (MAXJOBLENHIGH << BurstRate))
+			BurstRate++;
+		/* Put the burst rate back down if we're caught up */
+		else if (BurstRate && (NumJobs < (MAXJOBLENLOW << (BurstRate - 1))))
+			BurstRate--;
+		Debug(DSCH, ("[%d] TargetTime=%ld, sec-to-wait=%d, load=%d, jobs=%d, burst=%d\n",
+			getpid(), (long)TargetTime, seconds_to_wait, LoadAverage, NumJobs,
+			BurstRate))
+		/* if we intend to sleep, this means that it's finally
+		 * time to empty the job queue (execute it).
+		 *
+		 * if we run any jobs, we'll probably screw up our timing,
+		 * so go recompute.
+		 *
+		 * note that we depend here on the left-to-right nature
+		 * of &&, and the short-circuiting.
+		 */
+	} while ((seconds_to_wait > 0) && job_runqueue());
+
+	while (seconds_to_wait > 0) {
+		Debug(DSCH, ("[%d] sleeping for %d seconds\n",
+			getpid(), seconds_to_wait))
+		seconds_to_wait = (int) sleep((unsigned int) seconds_to_wait);
+	}
+}

 #ifdef USE_SIGCHLD
@@ -296,13 +342,34 @@
 	char	*argv[];
 {
 	int	argch;

-	while ((argch = getopt(argc, argv, "x:")) != -1) {
+	while ((argch = getopt(argc, argv, "x:a:t:c:")) != -1) {
 		switch (argch) {
 		case 'x':
 			if (!set_debug_flags(optarg))
 				usage();
+			break;
+		case 'a':
+			AddWeight = atoi(optarg);
+			if (AddWeight > 100) {	/* arbitrary value */
+				fprintf(stderr, "-a parameter %i too high. Max: 100\n\n", AddWeight);
+				usage();
+			}
+			break;
+		case 't':
+			LoadThreshold = atoi(optarg);
+			if (LoadThreshold < 1) {
+				fprintf(stderr, "-t parameter %i too low. Min: 1\n\n", LoadThreshold);
+				usage();
+			}
+			break;
+		case 'c':
+			TickDecay = atoi(optarg);
+			if (TickDecay < 1) {
+				fprintf(stderr, "-c parameter %i too low. Min: 1\n\n", TickDecay);
+				usage();
+			}
 			break;
 		default:
 			usage();
 		}
--- ../oldcron/cron.h	Mon Mar  9 05:41:41 1998
+++ cron.h	Sat Jan  2 16:50:13 1999
@@ -72,8 +72,16 @@
 #define	MAX_COMMAND	1000	/* max length of internally generated cmd */
 #define	MAX_ENVSTR	1000	/* max length of envvar=value\0 strings */
 #define	MAX_TEMPSTR	100	/* obvious */
 #define	MAX_UNAME	20	/* max length of username, should be overkill */
+#define	MAXJOBLENHIGH	512	/* How many jobs in the run queue before
+				 * increasing run speed
+				 * every multiple of two higher than this
+				 * will increase speed even more
+				 */
+#define	MAXJOBLENLOW	128	/* How many jobs in the run queue before
+				 * returning to normal
+				 */
 #define	ROOT_UID	0	/* don't change this, it really must be root */
 #define	ROOT_USER	"root"	/* ditto */

 /* NOTE: these correspond to DebugFlagNames,
@@ -266,8 +274,15 @@
 char	*ProgramName;
 int	LineNumber;
 time_t	TargetTime;
+int	LoadAverage = 0;
+int	AddWeight = 0;		/* default load balancing off */
+int	TickDecay = 10;		/* sane value if not given */
+int	LoadThreshold = 100;	/* sane value if not given */
+int	BurstRate = 0;
+int	NumJobs = 0;

 # if DEBUGGING
 int	DebugFlags;
 char	*DebugFlagNames[] = {	/* sync with #defines */
@@ -281,8 +296,14 @@
 			*DowNames[],
 			*ProgramName;
 extern	int	LineNumber;
 extern	time_t	TargetTime;
+extern	int	LoadAverage;
+extern	int	AddWeight;
+extern	int	TickDecay;
+extern	int	BurstRate;
+extern	int	LoadThreshold;
+extern	int	NumJobs;

 # if DEBUGGING
 extern	int	DebugFlags;
 extern	char	*DebugFlagNames[];
 # endif /* DEBUGGING */
--- ../oldcron/job.c	Mon Mar  9 05:41:47 1998
+++ job.c	Sat Jan  2 19:51:33 1999
@@ -55,22 +55,29 @@
 	/* add it to the tail */
 	if (!jhead) { jhead=j; }
 	else { jtail->next=j; }
 	jtail = j;
+	NumJobs++;
 }


 int
 job_runqueue() {
 	register job	*j, *jn;
 	register int	run = 0;

 	for (j=jhead; j; j=jn) {
+		if (LoadAverage > LoadThreshold) {
+			/* We've executed too much, clean up and stop.
+			 */
+			jhead = j;
+			return 1;
+		}
 		do_command(j->e, j->u);
 		jn = j->next;
 		free(j);
 		run++;
+		NumJobs--;
+		LoadAverage += AddWeight;
 	}
 	jhead = jtail = NULL;
 	return run;
 }

--ELM917376811-8356-0_--