From owner-freebsd-hackers Thu Jan 28 11:02:05 1999
Date: Thu, 28 Jan 1999 11:02:01 -0800 (PST)
From: Matthew Dillon
Message-Id: <199901281902.LAA10225@apollo.backplane.com>
To: "John S. Dyson"
Cc: dyson@iquest.net, wes@softweyr.com, toasty@home.dragondata.com, hackers@FreeBSD.ORG
Subject: Re: High Load cron patches - comments?
References: <199901281845.NAA21716@y.dyson.net>
Sender: owner-freebsd-hackers@FreeBSD.ORG

:Imagine having 100 sendmails fork off instantaneously!!! That would certainly
:cause interactive performance to glitch a little, wouldn't it? How big is too
:big? Is 1000 sendmails too many, or is 100 or is 10? What are the real limits

    The sendmail limit on a shell machine might be set to, say, 150. The
    nominal sendmail load is typically 10-20 sendmail processes running at
    once. If 150 sendmails fork instantly, the machine glitches for about
    2 seconds.

    The issue is not the fork, but the memory and disk I/O. The sendmails
    (as an example) do not start to eat memory and disk I/O until *after*
    they've forked and sent their HELO. The memory and disk I/O utilization
    depends on what the client is trying to do with the sendmail, which is
    simply not predictable. The problem is that when this situation occurs,
    we *WANT* to give the sendmails nearly the entire machine's resources
    for at least a little while to try to handle the situation.
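    [ For the archive reader: sendmail itself exposes both styles of
    control in sendmail.cf. A sketch of the hard-limit setup described
    above -- the option names are real sendmail options, but the numbers
    (150, 20) are just the illustrative figures from this message, not
    recommendations: ]

    # Hard absolute limit: refuse new SMTP connections outright once
    # 150 daemon children exist, rather than queueing or pacing them.
    O MaxDaemonChildren=150

    # The rate-limiting alternative argued against above: accept at
    # most N new connections per second. Deliberately left commented
    # out in this setup.
    #O ConnectionRateThrottle=20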
    The hard limit, in this case, is designed so that the machine will not
    fall on its face if all 150 clients try to do bad things with sendmail.
    If the overload persists, then something out of the ordinary is going
    on and sysop intervention is required in any case. But *most* of the
    time when the hard limit is hit, it is a temporary bursty-load
    situation that has solved itself by the time the sysop logs in. In
    this regard, allowing the machine to be temporarily overloaded is good
    because it allows the burst to get in, hog the machine, and then get
    out. The machine 'glitches' for a much shorter period of time than it
    would if you had tried to spread the load out across a longer period
    of time. The hard limit prevents a cascade failure ( not being able to
    retire processes quickly enough versus the rate of new incoming
    connections ), but otherwise allows the machine to run with a high
    load. Attempting to rate limit that sort of thing results in a longer
    'glitch' and more complaints.

    This is why we switched to using simple absolute limits on
    subsystems -- because they produced fewer complaints, fewer crashes,
    and less maintenance. Most of the problems fix themselves; the ones
    that don't require intervention in any case.

    That's for non-dedicated machines -- i.e. a shell/web machine that is
    also running sendmail. For dedicated machines we operate the hard
    limits at 90% of machine capacity and use them to crop extreme peak
    cases or handle catch-up situations. I.e. if AOL goes down and then
    comes back up an hour later, we want our sendmail boxes to run
    themselves right up to the limit (250 sendmail processes or so)
    receiving mail, because we know that if we don't, it will take 8 hours
    instead of 1 hour to recover the system back to steady state. In this
    case the sysop intervention is simply monitoring the system over a
    period of a few hours to make sure it is able to catch up.
					-Matt
					Matthew Dillon

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message