Date: Thu, 28 Jan 1999 12:26:12 -0600 (CST)
From: Kevin Day <toasty@home.dragondata.com>
To: dillon@apollo.backplane.com (Matthew Dillon)
Cc: dyson@iquest.net, wes@softweyr.com, hackers@FreeBSD.ORG
Subject: Re: High Load cron patches - comments?
Message-ID: <199901281826.MAA06446@home.dragondata.com>
In-Reply-To: <199901281817.KAA09891@apollo.backplane.com> from Matthew Dillon at "Jan 28, 1999 10:17:37 am"
> :Here's my problem.
> :
> :Cron turned into a massive forkbomb every minute, and especially every 10
> :minutes. Not only did the system nearly go dead at those points, but at
> :times, it took 5 minutes to catch up.
> :
> :Suppose you have to run 60 jobs per minute, and they all take around a
> :second to execute. If you run them one second at a time, you're likely to
> :...
> :
> :My only goal was to spread cron's jobs out a bit, so I didn't saturate my
> :nfs server's ethernet every 10 mins. When users are allowed to submit their
> :...
> :While I think measuring how busy the CPU is, rather than how busy cron
> :is, would be a better metric to go by, it's obviously not as simple as it
> :...
> :My patches have a feature where they'll continually increase the fork
> :speed, if it's obvious that the backlog is getting to some silly
> :proportions. Perhaps this is wrong, and it should just drop new jobs. In my
> :case this probably wouldn't be bad, but I think that's definitely 'breaking'
> :cron, and should be an optional feature.
> :...
> :What I came up with sounds a lot like John Dyson's sample piece of code,
> :except I used integer math, and he's using floating point. (He's also using
> :...
> :Kevin
>
> I think a rate limited cron is a good solution, but I would also (if you
> haven't already) supply a max-parallel-jobs option. Increasing the
> fork rate works to a degree, but you also have to make sure that cron
> (A) cannot kill the machine, and (B) cannot fall into a fork cascade
> failure by overloading the machine so much that the jobs can't be
> retired faster than new jobs are queued.
>
> So, for example, you might have a feedback parameter X but you should
> also have an absolute limit Y, which you set relatively high.
>
> Let's see... here's a good example. Let's say that every 10 minutes cron
> decides to fork off 50 jobs simultaneously, but at midnight and noon
> cron wants to fork off 200 jobs simultaneously.
>
> Let's say that every 10 minutes, with nominal delaying tactics and no hard
> limits, you are able to limit the maximum number of parallel jobs to,
> say, 35. Say you want a relatively sharp feedback to bump up the fork
> rate to get the jobs done before the next 10 minute period occurs.
>
> These same parameters, however, could fail utterly at noon and midnight.
> At noon and midnight the rate parameters that worked for the 10 minute
> jobs might result, say, in 120 parallel jobs.
>
> This is where the hard limit comes in. If you specified a hard limit
> that was nominally greater than the 10 minute parallel job load, but
> less than the midnight and noon job load, you effectively allow your
> nominal case through but force the jobs that get run at midnight
> and noon to 'spread out' a little more.
>
> You might specify a hard limit of, for example, 60 parallel jobs. This
> is comfortably above the 35 parallel jobs that the fork-rate limit
> produces on the 10 minute jobs but prevents the midnight and noon jobs
> from overloading the system.
>
> In effect, your feedback parameter solves your NFS burstiness problem
> under 'normal' load conditions and the absolute limit handles the more
> severe noon & midnight cases.
>
> -Matt
> Matthew Dillon
> <dillon@backplane.com>
>

I considered a 'maximum children' limit. How do you prevent a user from
breaking cron by executing 100 shell scripts that have 'sleep 10000' in them?

Kevin
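
To make the feedback-plus-hard-limit idea concrete, here is a toy simulation in Python. It is only a sketch, not the patch under discussion: the function name simulate(), the feedback rule (base rate plus backlog divided by a constant, in integer math) and every number are assumptions picked for illustration. It models one burst of identical jobs, forks once per second, and reports the peak number of parallel jobs and the second at which the last job finishes.

```python
from collections import deque


def simulate(num_jobs, job_seconds, base_rate, feedback_div, max_parallel=None):
    """Toy model of one cron burst (all names and rules are hypothetical).

    Each second cron may fork base_rate + backlog // feedback_div jobs
    (the feedback parameter), but never lets more than max_parallel
    jobs run at once (the absolute limit), if one is given.
    Returns (peak_parallel, second_at_which_last_job_finished).
    """
    if base_rate < 1 or feedback_div < 1:
        raise ValueError("base_rate and feedback_div must be >= 1")
    if max_parallel is not None and max_parallel < 1:
        raise ValueError("max_parallel must be >= 1")

    pending = num_jobs
    running = deque()  # finish times of running jobs, oldest first
    peak = 0
    now = 0
    while pending or running:
        # Retire every job that has finished by this tick.
        while running and running[0] <= now:
            running.popleft()
        # Feedback: the bigger the backlog, the faster we fork.
        budget = min(base_rate + pending // feedback_div, pending)
        # Hard limit: never exceed max_parallel running jobs.
        if max_parallel is not None:
            budget = min(budget, max_parallel - len(running))
        running.extend([now + job_seconds] * budget)
        pending -= budget
        peak = max(peak, len(running))
        if pending or running:
            now += 1
    return peak, now


if __name__ == "__main__":
    # 10-minute burst: 50 jobs of 4 seconds each, no hard limit.
    print(simulate(50, 4, base_rate=5, feedback_div=10))
    # Noon burst: 200 jobs, same feedback, without and with a cap of 60.
    print(simulate(200, 4, base_rate=5, feedback_div=10))
    print(simulate(200, 4, base_rate=5, feedback_div=10, max_parallel=60))
```

With these made-up parameters the 50-job burst peaks at 34 parallel jobs, so a cap of 60 never touches it. The 200-job burst peaks at 85 without the cap and at exactly 60 with it, finishing no earlier than the uncapped run. The sketch also shows why the 'sleep 10000' question matters: set job_seconds to 10000 and the first 60 jobs hold every slot for the whole period.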