Date: Tue, 26 Jan 1999 12:53:31 -0600 (CST) From: Kevin Day <toasty@home.dragondata.com> To: hackers@FreeBSD.ORG Subject: High Load cron patches - comments? Message-ID: <199901261853.MAA15095@home.dragondata.com>
I have a somewhat unusual setup: a server that several hundred customers
use, with all of /home on an NFS mount, where each customer has quite a few
cron jobs that they like to execute every ten minutes or so.
The problem is that they all want to execute their cron jobs on a */10
minute frequency, so on every minute ending in '0', I suddenly have cron
spawning off a few hundred processes, bringing the machine's load average
above 15.0, and saturating my NFS link for quite a while.
No amount of pleading with my users did much good, since they were just
following a template given to them by the software they were using.
I talked briefly about this with Paul Vixie (cron's author); while we had
differing ideas about how to accomplish it, my patches have been running
for over a month now on a production system and have worked very well.
These patches limit the number of jobs cron will start per second, with an
initial burst, a hard limit, and a 'burst mode' for when the number of
jobs on the 'to do list' is getting excessively high.
Giving no options to cron makes it behave exactly as it did without the
patches.
The format for enabling this load balancing is as follows:
cron [-x debugflag[,...]] [-a addweight [-c tickdecay] [-t threshold]]
The -a parameter controls how many 'points' are added on every job execution.
The -c parameter controls how many points are subtracted every second.
The -t parameter controls how many points are necessary before jobs are
queued instead of run.
The flow is as follows:
If (numpoints < threshold) {
	Execute job
	numpoints += addweight;
}
Every second {
	numpoints -= tickdecay;
	if (really behind in running jobs) {
		Turn on burst mode
	}
}
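As a stand-alone sketch, that accounting looks roughly like this (the struct
and function names are hypothetical; the patch itself keeps LoadAverage,
AddWeight, TickDecay and LoadThreshold as globals in cron.h):

```c
#include <assert.h>

/* Hypothetical stand-alone model of the point accounting. */
struct limiter {
	int points;     /* LoadAverage in the patch */
	int add_weight; /* -a: points charged per job started */
	int tick_decay; /* -c: points drained per second */
	int threshold;  /* -t: queue instead of run above this */
};

/* Try to start one job: charge its weight and return 1, or return 0
 * to leave it on the queue for later. */
int limiter_try_run(struct limiter *l)
{
	if (l->points >= l->threshold)
		return 0;
	l->points += l->add_weight;
	return 1;
}

/* Called once per second: let accumulated points drain away. */
void limiter_tick(struct limiter *l)
{
	l->points -= l->tick_decay;
	if (l->points < 0)
		l->points = 0;
}
```

With -a 10 -c 100 -t 200 this starts 20 jobs immediately and then about 10
per second, matching the numbers quoted below.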
Burst mode keeps tuning itself higher and higher, the further behind jobs
get, until cron has caught up. This prevents one user who puts 50,000 jobs
in their crontab from making cron start sucking RAM like mad.
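The burst-rate adjustment in the patch can be read as a small pure function
(a sketch; the real code updates the global BurstRate inside cron_sleep(),
and MAXJOBLENHIGH/MAXJOBLENLOW come from cron.h):

```c
#define MAXJOBLENHIGH 512 /* queue length that raises the burst rate */
#define MAXJOBLENLOW  128 /* queue length that lowers it again */

/* Each step of burst_rate doubles both thresholds, so the limiter
 * releases jobs exponentially faster the further the queue falls behind. */
int adjust_burst(int num_jobs, int burst_rate)
{
	if (num_jobs > (MAXJOBLENHIGH << burst_rate))
		return burst_rate + 1;	/* still behind: speed up */
	if (burst_rate && num_jobs < (MAXJOBLENLOW << (burst_rate - 1)))
		return burst_rate - 1;	/* caught up: slow back down */
	return burst_rate;
}
```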
For me, cron -a 10 -c 100 -t 200 works very well. (Allow 10 jobs per
second, but allow 20 at the beginning to hurry things up)
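In integer terms, both rates fall straight out of the parameters
(hypothetical helper names, just to make the arithmetic explicit):

```c
/* Jobs that can start before the threshold is first reached. */
int burst_jobs(int threshold, int add_weight)
{
	return threshold / add_weight;
}

/* Jobs per second once the limiter is saturated: the per-second decay
 * is the only thing freeing up points. */
int steady_jobs_per_sec(int tick_decay, int add_weight)
{
	return tick_decay / add_weight;
}
```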
Paul's idea was to limit the number of children cron has running at a time;
however, for me this wasn't effective, as my users' jobs tend to hang around
for a long time.
Can I get comments/suggestions about this?
Kevin
--- ../oldcron/cron.c Sat Jul 18 06:09:09 1998
+++ cron.c Sat Jan 2 20:04:02 1999
@@ -47,16 +47,19 @@
static void
usage() {
char **dflags;
- fprintf(stderr, "usage: cron [-x debugflag[,...]]\n");
+ fprintf(stderr, "usage: cron [-x debugflag[,...]] [-a addweight [-c tickdecay] [-t threshold]]\n");
fprintf(stderr, "\ndebugflags: ");
for(dflags = DebugFlagNames; *dflags; dflags++) {
fprintf(stderr, "%s ", *dflags);
}
- fprintf(stderr, "\n");
+ fprintf(stderr, "\n\n");
+ fprintf(stderr, "-a [addweight] Number of 'points' to add every time a job is run\n");
+ fprintf(stderr, "-c [tickdecay] Number of 'points' to subtract every second\n");
+ fprintf(stderr, "-t [threshold] Number of 'points' at which to stop running jobs and just queue\n");
+ fprintf(stderr, "\n");
exit(ERROR_EXIT);
}
@@ -126,19 +129,29 @@
while (TRUE) {
# if DEBUGGING
/* if (!(DebugFlags & DTEST)) */
# endif /*DEBUGGING*/
- cron_sleep();
-
- load_database(&database);
+ cron_sleep();
- /* do this iteration
+ /* Prevent misconfigured options from making cron take
+ * over system ram
+ * This may not be desirable for production systems
+ * where cron jobs must run
*/
- cron_tick(&database);
+ if (BurstRate < 7) {
+
+ load_database(&database);
+
+ /* do this iteration
+ */
+ cron_tick(&database);
+ }
/* sleep 1 minute
*/
TargetTime += 60;
}
}
@@ -226,31 +239,64 @@
static void
cron_sleep() {
- register int seconds_to_wait;
+ register int seconds_to_wait, seconds_to_delay = 0; /* = 0: used in the decay math even when no nap is taken */
- do {
- seconds_to_wait = (int) (TargetTime - time((time_t*)0));
- Debug(DSCH, ("[%d] TargetTime=%ld, sec-to-wait=%d\n",
- getpid(), (long)TargetTime, seconds_to_wait))
-
- /* if we intend to sleep, this means that it's finally
- * time to empty the job queue (execute it).
- *
- * if we run any jobs, we'll probably screw up our timing,
- * so go recompute.
- *
- * note that we depend here on the left-to-right nature
- * of &&, and the short-circuiting.
- */
- } while (seconds_to_wait > 0 && job_runqueue());
-
- while (seconds_to_wait > 0) {
- Debug(DSCH, ("[%d] sleeping for %d seconds\n",
- getpid(), seconds_to_wait))
- seconds_to_wait = (int) sleep((unsigned int) seconds_to_wait);
- }
+ do {
+ seconds_to_wait = (int) (TargetTime - time((time_t*)0));
+ if (LoadAverage > LoadThreshold) {
+ /* if we denied jobs to run last time around,
+ * see if we should sleep for a short period
+ * before exiting
+ */
+ if (seconds_to_wait > 0) {
+ /* decide if we should take a short nap
+ * or just go ahead with the normal
+ * return
+ */
+ seconds_to_delay = MIN(seconds_to_wait,
+ ((LoadAverage - LoadThreshold) /
+ (TickDecay << BurstRate)) + 1);
+ if (seconds_to_delay == 0)
+ seconds_to_delay = 1;
+ Debug(DSCH, ("[%d] short sleeping for %d seconds\n",
+ getpid(), seconds_to_delay))
+ sleep(seconds_to_delay);
+ }
+ }
+ if (LoadAverage > ((TickDecay << BurstRate) * seconds_to_delay))
+ LoadAverage -= (TickDecay << BurstRate) * seconds_to_delay;
+ else
+ LoadAverage = 0;
+ /* if we're bursting jobs, and still not catching up
+ * increase the burst speed
+ */
+ if (NumJobs > (MAXJOBLENHIGH << BurstRate))
+ BurstRate++;
+ /* Put the burst rate back down if we're caught up */
+ else if (BurstRate && (NumJobs < (MAXJOBLENLOW << (BurstRate - 1))))
+ BurstRate--;
+ Debug(DSCH, ("[%d] TargetTime=%ld, sec-to-wait=%d, load=%d, jobs=%d, burst=%d\n",
+ getpid(), (long)TargetTime, seconds_to_wait, LoadAverage, NumJobs,
+ BurstRate))
+ /* if we intend to sleep, this means that it's finally
+ * time to empty the job queue (execute it).
+ *
+ * if we run any jobs, we'll probably screw up our timing,
+ * so go recompute.
+ *
+ * note that we depend here on the left-to-right nature
+ * of &&, and the short-circuiting.
+ */
+ } while ((seconds_to_wait > 0) && job_runqueue());
+
+ while (seconds_to_wait > 0) {
+ Debug(DSCH, ("[%d] sleeping for %d seconds\n",
+ getpid(), seconds_to_wait))
+ seconds_to_wait = (int) sleep((unsigned int) seconds_to_wait);
+ }
+
}
#ifdef USE_SIGCHLD
@@ -296,13 +342,34 @@
char *argv[];
{
int argch;
- while ((argch = getopt(argc, argv, "x:")) != -1) {
+ while ((argch = getopt(argc, argv, "x:a:t:c:")) != -1) {
switch (argch) {
case 'x':
if (!set_debug_flags(optarg))
usage();
+ break;
+ case 'a':
+ AddWeight = atoi(optarg);
+ if (AddWeight > 100) { /* arbitrary value */
+ fprintf(stderr, "-a parameter %i too high. Max: 100\n\n", AddWeight);
+ usage();
+ }
+ break;
+ case 't':
+ LoadThreshold = atoi(optarg);
+ if (LoadThreshold < 1) {
+ fprintf(stderr, "-t parameter %i too low. Min: 1\n\n", LoadThreshold);
+ usage();
+ }
+ break;
+ case 'c':
+ TickDecay = atoi(optarg);
+ if (TickDecay < 1) {
+ fprintf(stderr, "-c parameter %i too low. Min: 1\n\n", TickDecay);
+ usage();
+ }
break;
default:
usage();
}
--- ../oldcron/cron.h Mon Mar 9 05:41:41 1998
+++ cron.h Sat Jan 2 16:50:13 1999
@@ -72,8 +72,16 @@
#define MAX_COMMAND 1000 /* max length of internally generated cmd */
#define MAX_ENVSTR 1000 /* max length of envvar=value\0 strings */
#define MAX_TEMPSTR 100 /* obvious */
#define MAX_UNAME 20 /* max length of username, should be overkill */
+#define MAXJOBLENHIGH 512 /* How many jobs in the run queue before
+ * increasing run speed
+ * every multiple of two higher than this
+ * will increase speed even more
+ */
+#define MAXJOBLENLOW 128 /* How many jobs in the run queue before
+ * returning to normal
+ */
#define ROOT_UID 0 /* don't change this, it really must be root */
#define ROOT_USER "root" /* ditto */
/* NOTE: these correspond to DebugFlagNames,
@@ -266,8 +274,15 @@
char *ProgramName;
int LineNumber;
time_t TargetTime;
+int LoadAverage = 0;
+int AddWeight = 0; /* default load balancing off */
+int TickDecay = 10; /* sane value if not given */
+int LoadThreshold = 100; /* sane value if not given */
+int BurstRate = 0;
+int NumJobs = 0;
# if DEBUGGING
int DebugFlags;
char *DebugFlagNames[] = { /* sync with #defines */
@@ -281,8 +296,14 @@
*DowNames[],
*ProgramName;
extern int LineNumber;
extern time_t TargetTime;
+extern int LoadAverage;
+extern int AddWeight;
+extern int TickDecay;
+extern int BurstRate;
+extern int LoadThreshold;
+extern int NumJobs;
# if DEBUGGING
extern int DebugFlags;
extern char *DebugFlagNames[];
# endif /* DEBUGGING */
--- ../oldcron/job.c Mon Mar 9 05:41:47 1998
+++ job.c Sat Jan 2 19:51:33 1999
@@ -55,22 +55,29 @@
/* add it to the tail */
if (!jhead) { jhead=j; }
else { jtail->next=j; }
jtail = j;
+ NumJobs++;
}
int
job_runqueue()
{
register job *j, *jn;
register int run = 0;
for (j=jhead; j; j=jn) {
+ if (LoadAverage > LoadThreshold) {
+ /* We've executed too much, clean up and stop. */
+ jhead = j;
+ return 1;
+ }
do_command(j->e, j->u);
jn = j->next;
free(j);
run++;
+ NumJobs--;
+ LoadAverage += AddWeight;
}
jhead = jtail = NULL;
return run;
}