From owner-freebsd-hackers Thu Jan 28 10:04:51 1999
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id KAA19772 for freebsd-hackers-outgoing; Thu, 28 Jan 1999 10:04:51 -0800 (PST) (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id KAA19757 for ; Thu, 28 Jan 1999 10:04:45 -0800 (PST) (envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost) by apollo.backplane.com (8.9.2/8.9.1) id KAA09766; Thu, 28 Jan 1999 10:04:41 -0800 (PST) (envelope-from dillon)
Date: Thu, 28 Jan 1999 10:04:41 -0800 (PST)
From: Matthew Dillon
Message-Id: <199901281804.KAA09766@apollo.backplane.com>
To: "John S. Dyson"
Cc: wes@softweyr.com, toasty@home.dragondata.com, hackers@FreeBSD.ORG
Subject: Re: High Load cron patches - comments?
References: <199901281734.MAA21561@y.dyson.net>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:Throttling fork rate is also a valuable tool, and maybe a hard limit is good
:also. It is all about how creative you are (or want to be) in your solution
:-).

Throttling the fork rate immediately leads to complaints. The perception of load is easily as important as the reality. We had put fork rate limits on both sendmail and popper, and the result was hundreds of calls to tech support :-(. I even had load-based feedback mechanisms. It was a disaster.

The issue is that the load is an interactive load, not a batch load -- it is not acceptable to accept a connection and then pause for 5 minutes before yielding a shell prompt, processing a popper request, or even responding with an SMTP HELO. Or handling a web request. The machine *must* be able to handle a temporary overload. Even *mail delivery* is an interactive load -- users have come to expect their email to propagate in 5 minutes or less, and if it doesn't, we get complaints.

While BEST is certainly not indicative of all situations, we cover the spectrum pretty well for general-use server installations: there are shell/web machines, mail servers, mail frontend and backend boxes, mailing list servers, news feed boxes, DNS boxes, radius boxes, etc etc etc. Each one operates under load differently and requires different hard limits. In BEST's earlier days, functions were combined ( we didn't have the money to buy lots of machines ). For example, the mail machines are tuned with fork limits such that sendmail is able to eat around 90% of the machine's resources worst case, but sendmail on the shell/web servers is tuned with fork limits such that it can't eat more than 50%.

The only thing that ever worked reliably was absolute limits. The internet is so bursty that a machine *must* be able to accept a high load or overload situation for upwards of 10 or 15 minutes *without* slapping limits on processes. That is, it must allow the processes to build up in such burst-load situations. An absolute limit works extremely well for this sort of response requirement. It says, in effect, "I will allow you to overload the machine to a point, as long as you can recover from it eventually". i.e. even though you are not allowing any one subsystem to overload the machine on its own, summing all the hard limits together yields a number > 100%, so, in effect, you are allowing a subsystem to take the machine over the top.
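To make the distinction concrete, here is a minimal C sketch of the two policies being contrasted: an absolute cap on concurrent children versus a per-second fork throttle. The bare accept loop and the MAXCHILD / MAXFORKS_PER_SEC knobs are hypothetical illustrations only, not the actual tuning used on BEST's machines.

    /*
     * Sketch of a forking server's master loop. MAXCHILD and
     * MAXFORKS_PER_SEC are made-up numbers for illustration.
     */
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <signal.h>
    #include <time.h>
    #include <unistd.h>

    #define MAXCHILD         200    /* absolute cap on live children */
    #define MAXFORKS_PER_SEC 10     /* rate throttle, for contrast   */

    static volatile sig_atomic_t nchildren;

    static void
    reap(int sig)
    {
        (void)sig;
        while (waitpid(-1, NULL, WNOHANG) > 0)
            --nchildren;
    }

    /*
     * Absolute-limit policy: a burst may pile children up to MAXCHILD
     * and the box may run heavily loaded for a while, but it can never
     * be pushed past the point it cannot recover from.  Below the cap
     * every request is served immediately, so responsiveness survives
     * the burst.
     */
    static int
    can_fork_absolute(void)
    {
        return (nchildren < MAXCHILD);
    }

    /*
     * Rate-throttle policy: forks are spread out in time regardless of
     * how many children are actually running, so during a burst new
     * connections sit and wait even though the machine could still
     * absorb them -- the "pause before the shell prompt / HELO"
     * problem described above.
     */
    static int
    can_fork_ratelimited(void)
    {
        static time_t window;
        static int    forks_this_sec;
        time_t now = time(NULL);

        if (now != window) {
            window = now;
            forks_this_sec = 0;
        }
        if (forks_this_sec >= MAXFORKS_PER_SEC)
            return (0);
        ++forks_this_sec;
        return (1);
    }

    int
    main(void)
    {
        signal(SIGCHLD, reap);

        for (;;) {
            /* accept_connection() would block for the next client here. */
            if (!can_fork_absolute()) {     /* or can_fork_ratelimited() */
                sleep(1);                   /* back off until children exit */
                continue;
            }
            switch (fork()) {
            case -1:
                sleep(1);                   /* fork failed; try again */
                break;
            case 0:
                /* child: handle_request(); */
                _exit(0);
            default:
                ++nchildren;
            }
        }
    }

Summing a MAXCHILD-style cap across sendmail, popper, httpd, and the rest can deliberately exceed what the box can sustain at once, which is the "> 100%" effect just described.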
In effect, allowing a machine's load to pass 25 for a few minutes is perfectly acceptable so long as the machine can recover, but slapping load-limiting restrictions on, say, forks ( this being different from the absolute limit ) simply creates a cascade-failure situation earlier, one that might have been avoided if you had let the machine run with it a little longer. The absolute limit in effect allows the machine to temporarily overload while still maintaining responsiveness, and operates on the assumption that the 'burst' will not last forever. Since the burst is already generating a higher load than you would nominally allow, this temporary overloading will do a better job for a short period of time.

If the temporary overloading becomes more permanent, *both* the absolute-limit methodology and the dynamic-feedback limiting methodology have the same problem: You've run out of cpu, or memory, or disk I/O, or all three... and no matter what you do you will piss some customer off. Both methodologies can prevent a machine from going poof, but I firmly believe that the absolute-limit methodology will allow a machine to recover much more quickly from an overload than a dynamic balancing methodology. That is my experience, anyway.

						-Matt

					Matthew Dillon

:--
:John                  | Never try to teach a pig to sing,
:dyson@iquest.net      | it makes one look stupid
:jdyson@nc.com         | and it irritates the pig.
:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message