From owner-freebsd-hackers  Thu Jan 28 09:09:35 1999
Return-Path:
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id JAA12193
          for freebsd-hackers-outgoing; Thu, 28 Jan 1999 09:09:35 -0800 (PST)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id JAA12179
          for ; Thu, 28 Jan 1999 09:09:33 -0800 (PST)
          (envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
          by apollo.backplane.com (8.9.2/8.9.1) id JAA07773;
          Thu, 28 Jan 1999 09:09:30 -0800 (PST)
          (envelope-from dillon)
Date: Thu, 28 Jan 1999 09:09:30 -0800 (PST)
From: Matthew Dillon
Message-Id: <199901281709.JAA07773@apollo.backplane.com>
To: "John S. Dyson"
Cc: wes@softweyr.com (Wes Peters), dyson@iquest.net,
    toasty@home.dragondata.com, hackers@FreeBSD.ORG
Subject: Re: High Load cron patches - comments?
References: <199901281611.LAA21412@y.dyson.net>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:>
:> Especially as we start diving more into SMP and threaded applications;
:> which will need some effective means of throttling themselves.  The
:> problem with Matt's comment above is he doesn't offer any useful
:> alternative, and counting child processes just isn't an effective means
:> of throttling the overall load on a machine.
:>
:It is *sometimes* appropriate to criticize, even when alternatives aren't
:provided.  The kind of technique that I have successfully experimented with
:is a scheme that has two phases:  A costing mechanism and a stats mechanism.
:
:The costing mechanism is a direct call from when the resource is attempted
:to be allocated.
:It checks immediately if the cost (and recent incurred

    Well, actually I would put forth that limiting the number of processes
    any one subsystem is allowed to fork is perfectly acceptable and
    generally produces better results than trying to dynamically balance
    the load between subsystems on any given machine.

    The lesson I learned at BEST was simple:  When you are out of cpu, you
    are out of cpu.  All that dynamically balancing the load does is cause
    ALL of the subsystems to slow down and start to clog the system.  On a
    very heavily loaded system ( such as our old IRIX box, shellx, which
    had 20,000 heavily used accounts ), it only took a small imbalance to
    create a fork cascade failure.  If sendmail got a little overloaded,
    popper would not be able to retire connections quickly enough.  If
    popper got a little overloaded, sendmail would not be able to retire
    connections quickly enough.

    If sendmail is operating normally but, say, the popper goes crazy, it
    is not appropriate to slap limits on sendmail.  If sendmail is checking
    the load, this is precisely what happens.

    What we do now is put an absolute limit on each subsystem that weighs
    in at around 70% of the machine's total resources.  That doesn't mean
    the subsystem *gets* 70% of the machine's total resources, it just
    means that the subsystem can't *exceed* 70%.  So, for example, sendmail
    is limited to around 200 processes.

    When a subsystem gets attacked, or fails because of trouble on other
    machines, the machine slows down... but the machine does *not* enter
    into a cascade failure situation.  The moment the attack ceases, the
    machine recovers pretty quickly.  The key is that the attack may max
    out one subsystem and slow down others, but it will not indirectly
    cause other subsystems to try to limit themselves just because the
    load average goes up.
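
    A minimal sketch of the difference, in Python for brevity.  Everything
    here is hypothetical illustration rather than what BEST actually ran:
    the class and function names, the popper limit of 150, the load
    threshold, and the crude "load" stand-in are all assumptions.  Only
    the sendmail figure of around 200 processes comes from the text above.

```python
class SubsystemLimit:
    """Absolute cap on concurrent processes for one subsystem.

    The cap never looks at the machine's load average, so trouble in
    another subsystem cannot cause this one to throttle itself.
    """

    def __init__(self, name, max_procs):
        self.name = name
        self.max_procs = max_procs
        self.active = 0    # processes currently running
        self.refused = 0   # requests turned away at the cap

    def try_admit(self):
        """Admit one new process unless the subsystem is at its cap."""
        if self.active >= self.max_procs:
            self.refused += 1
            return False
        self.active += 1
        return True

    def retire(self):
        """Record that one process has finished."""
        if self.active > 0:
            self.active -= 1


def load_based_admit(machine_load, threshold):
    """The alternative being argued against: admit only when the
    machine-wide load is under a threshold, whoever caused the load."""
    return machine_load < threshold


sendmail = SubsystemLimit("sendmail", max_procs=200)
popper = SubsystemLimit("popper", max_procs=150)   # assumed figure

# The popper goes crazy: 1000 connection attempts arrive at once.
for _ in range(1000):
    popper.try_admit()

# Sendmail is operating normally: 50 connections arrive.
sendmail_admitted = sum(sendmail.try_admit() for _ in range(50))

# Crude stand-in for load: total running processes on the machine.
machine_load = popper.active + sendmail.active
sendmail_ok_under_load_check = load_based_admit(machine_load, threshold=100)

print(popper.active, popper.refused)       # popper pinned at its own cap
print(sendmail_admitted, sendmail.refused) # sendmail unaffected
print(sendmail_ok_under_load_check)        # load check would refuse sendmail
```

    With the absolute caps, popper is pinned at its own limit and sendmail
    still admits all of its normal traffic.  The load check, fed the same
    numbers, would have refused sendmail for something popper did.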
    AOL's mail system used to barf once or twice a week, either creating
    large mail backlogs on our machines when it was down, or making
    hundreds ( even thousands ) of incoming connections when their system
    came back up after a long downtime.

    It is simply not possible for a machine to predict instantaneous load,
    no matter what you do.  Using the load as a feedback mechanism is
    therefore always going to be problematic.  The reason it is not
    possible to predict instantaneous load is simple:  The act of
    allocating resources does not in and of itself generate a load, it is
    *using* those resources that generates the load.

    For example, taking sendmail again:  When sendmail clogs up on
    outgoing connections it typically spends memory resources but no cpu
    resources.  When sendmail clogs up on incoming connections it
    typically spends cpu resources AND memory resources.  If sendmail
    clogs up on lots of incoming connections being slowed down by a
    network screwup ( 'the internet is lossy today' ), they may not eat
    cpu or memory, but when the WAN link suddenly clears up you could get
    a massive load on the preexisting connections without any additional
    forks.

    For the first two years of BEST's existence, I literally spent day and
    night trying to balance things on overloaded machines:  The web
    server, sendmail, news, user load, popper, and so forth.  Load-related
    balancing never worked well.  It took a year before I realized that it
    wouldn't work at all.  After we started slapping absolute limits on
    things, the machines stopped crashing due to multi-subsystem fork
    cascade failures.

                                                -Matt
                                                Matthew Dillon