From: Kevin Day
Message-Id: <199901281748.LAA02377@home.dragondata.com>
Subject: Re: High Load cron patches - comments?
In-Reply-To: <199901281709.JAA07773@apollo.backplane.com> from Matthew Dillon at "Jan 28, 1999 9: 9:30 am"
To: dillon@apollo.backplane.com (Matthew Dillon)
Date: Thu, 28 Jan 1999 11:48:23 -0600 (CST)
Cc: dyson@iquest.net, wes@softweyr.com, hackers@FreeBSD.ORG

> :>
> :> Especially as we start diving more into SMP and threaded applications;
> :> which will need some effective means of throttling themselves.  The
> :> problem with Matt's comment above is he doesn't offer any useful
> :> alternative, and counting child processes just isn't an effective means
> :> of throttling the overall load on a machine.
> :>
> :It is *sometimes* appropriate to criticize, even when alternatives aren't
> :provided.  The kind of technique that I have successfully experimented with
> :is a scheme that has two phases:  A costing mechanism and a stats mechanism.
> :
> :The costing mechanism is a direct call from when the resource is attempted
> :to be allocated.
> :It checks immediately if the cost (and recent incurred
>
>     Well, actually I would put forth that limiting the number of processes
>     any one subsystem is allowed to fork is perfectly acceptable and generally
>     produces better results than trying to dynamically balance the load
>     between subsystems on any given machine.
>
>     The lesson I learned at BEST was simple:  When you are out of cpu, you
>     are out of cpu.  All that dynamically balancing the load does is cause
>     ALL of the subsystems to slow down, and cause all of the subsystems to
>     start to clog the system.  In a very heavily loaded system ( aka our old
>     IRIX box, shellx, which had 20,000 heavily used accounts ), it only took
>     a small imbalance to create a fork cascade failure.  If sendmail got
>     a little overloaded, popper would not be able to retire connections
>     quickly enough.  If popper got a little overloaded, sendmail would not
>     be able to retire connections quickly enough.
>

To step back in here again...

It's kinda interesting, the discussion this generated. I think we're all
discussing slightly different problems, so we're coming up with different
ways to address them.

Here's my problem. Cron turned into a massive forkbomb every minute, and
especially every 10 minutes. Not only did the system nearly go dead at those
points, but at times it took 5 minutes to catch up.

Suppose you have to run 60 jobs per minute, and they all take around a
second to execute. If you run them one at a time, one per second, you're
likely to get through them all within the minute. If you try to run them all
at once, you're likely not to be finished after a minute, causing a backlog.

My only goal was to spread cron's jobs out a bit, so I didn't saturate my
NFS server's ethernet every 10 mins.

When users are allowed to submit their own cron jobs and the times to run
them, *and* the application they are using suggests to them that */10 or
even * is correct, cron needs to be able to cope with this.
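
To make the arithmetic above concrete, here is a minimal sketch of spreading a batch of jobs that all come due in the same minute evenly across that minute. It is not the actual cron patch: the function name `spread_offsets` and the millisecond window are assumptions made up for illustration, and it uses integer math only.

```python
def spread_offsets(njobs, window_ms=60000):
    """Return launch offsets (ms from the start of the window) that space
    njobs evenly across the window instead of forking them all at once.

    Hypothetical helper for illustration; not taken from the cron patches.
    """
    if njobs <= 0:
        return []
    # Integer math only: job i starts i/njobs of the way through the window.
    return [i * window_ms // njobs for i in range(njobs)]


if __name__ == "__main__":
    offsets = spread_offsets(60)
    # Show the first few offsets and the last one.
    print(offsets[:3], offsets[-1])
```

With 60 jobs the offsets come out one second apart (0 ms, 1000 ms, and so on up to 59000 ms), which is the "one per second" case described above.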

While I think how busy the CPU is, rather than how busy cron is, would be a
better metric to go by, it's obviously not as simple as it looks at the
moment. Load average simply doesn't work, especially for a machine that's a
very heavy NFS client. I can see load averages of 12.00 with the CPU
completely idle. The NFS server is just busy.

My patches have a feature where they'll continually increase the fork rate
if it's obvious that the backlog is getting to some silly proportions.
Perhaps this is wrong, and it should just drop new jobs. In my case this
probably wouldn't be bad, but I think that's definitely 'breaking' cron, and
should be an optional feature.

What I came up with sounds a lot like John Dyson's sample piece of code,
except I used integer math and he's using floating point. (He's also using
DSP/PLL terminology a bit more, too. :) I wasn't exactly sure where John
suggests putting that code: in cron, or somewhere deeper down?

Even if it's not an optimal solution, I'm sure plenty of other FreeBSD users
could use something like a rate-limited cron.

Kevin
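
For illustration only, here is a rough sketch of the kind of backlog-driven speed-up described above: the delay between forks stays fixed until the backlog passes a threshold, then shrinks in proportion to the backlog. This is not the actual patch. The function name `next_fork_delay_ms` and every default value are assumptions, and the proportional rule is just one plausible choice. It sticks to integer math, as the message says the patch does.

```python
def next_fork_delay_ms(backlog, base_delay_ms=1000, threshold=60,
                       min_delay_ms=50):
    """Return how long to wait (ms) before forking the next queued job.

    Hypothetical helper: all names and defaults are assumptions, not
    values from the cron patches.
    """
    if backlog <= threshold:
        # Backlog is manageable: keep the normal pacing.
        return base_delay_ms
    # Backlog is growing: shorten the delay in proportion to its size,
    # but never go below the floor, so cron cannot turn back into a forkbomb.
    return max(min_delay_ms, base_delay_ms * threshold // backlog)


if __name__ == "__main__":
    for backlog in (30, 120, 100000):
        print(backlog, next_fork_delay_ms(backlog))
```

With these made-up defaults, a backlog of 120 jobs halves the delay to 500 ms, and a very large backlog bottoms out at the 50 ms floor. Dropping new jobs instead would be a different policy and is not shown here.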