From owner-freebsd-hackers  Thu Jan 28 09:09:35 1999
Date: Thu, 28 Jan 1999 09:09:30 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199901281709.JAA07773@apollo.backplane.com>
To: "John S. Dyson" <dyson@iquest.net>
Cc: wes@softweyr.com (Wes Peters), dyson@iquest.net,
        toasty@home.dragondata.com, hackers@FreeBSD.ORG
Subject: Re: High Load cron patches - comments?
References:  <199901281611.LAA21412@y.dyson.net>

:> 
:> Especially as we start diving more into SMP and threaded applications;
:> which will need some effective means of throttling themselves.  The
:> problem with Matt's comment above is he doesn't offer any useful
:> alternative, and counting child processes just isn't an effective means
:> of throttling the overall load on a machine.
:> 
:It is *sometimes* appropriate to criticize, even when alternatives aren't
:provided.  The kind of technique that I have successfully experimented with
:is a scheme that has two phases:  A costing mechanism and a stats mechanism.
:
:The costing mechanism is a direct call from when the resource is attempted
:to be allocated.  It checks immediately if the cost (and recent incurred

    Well, actually I would put forth that limiting the number of processes
    any one subsystem is allowed to fork is perfectly acceptable and generally
    produces better results than trying to dynamically balance the load
    between subsystems on any given machine.

    The lesson I learned at BEST was simple:  When you are out of cpu, you
    are out of cpu.  All that dynamically balancing the load does is cause
    ALL of the subsystems to slow down, and cause all of the subsystems to
    start to clog the system.  In a very heavily loaded system (such as our
    old IRIX box, shellx, which had 20,000 heavily used accounts), it only
    took a small imbalance to create a fork cascade failure.  If sendmail got
    a little overloaded, popper would not be able to retire connections
    quickly enough.  If popper got a little overloaded, sendmail would not
    be able to retire connections quickly enough.

    If sendmail is operating normally but, say, the popper goes crazy, it
    is not appropriate to slap limits on sendmail.  If sendmail is checking
    the load, this is precisely what happens.
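
    A minimal sketch of that failure mode, under stated assumptions: the
    function names, the threshold of 50, and the process counts are all
    hypothetical, and the "load" here is just a count of runnable processes
    standing in for the real load average.  It is not sendmail's actual code.

```python
def system_load(runnable_by_subsystem: dict) -> float:
    """Crude stand-in for the load average: total runnable processes."""
    return float(sum(runnable_by_subsystem.values()))


def load_based_admit(load: float, max_load: float) -> bool:
    """Admit new work only while the machine-wide load is below max_load."""
    return load < max_load


# sendmail is operating normally, but the popper has gone crazy.
runnable = {"sendmail": 5, "popper": 300}

# The check sees only the machine-wide number, so sendmail refuses work
# even though sendmail itself is not the source of the load.
sendmail_admits = load_based_admit(system_load(runnable), max_load=50.0)
print(sendmail_admits)  # False
```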

    What we do now is put an absolute limit on each subsystem that weighs
    in at around 70% of the machine's total resources.  That doesn't mean
    the subsystem *gets* 70% of the machine's total resources, it just means
    that the subsystem can't *exceed* 70%.  So, for example, sendmail is
    limited to around 200 processes.
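
    One way such an absolute cap could be sketched is a per-subsystem
    counter that refuses new children once the cap is reached.  The class
    and method names are hypothetical, and only the sendmail figure of 200
    comes from the text above; the popper cap of 150 is made up for the
    example.

```python
class SubsystemLimit:
    """Fixed cap on the number of concurrent children for one subsystem."""

    def __init__(self, max_children: int) -> None:
        if max_children < 1:
            raise ValueError("max_children must be at least 1")
        self.max_children = max_children
        self.active = 0

    def try_start(self) -> bool:
        """Reserve a slot for a new child; refuse if the cap is reached."""
        if self.active >= self.max_children:
            return False
        self.active += 1
        return True

    def finish(self) -> None:
        """Release a slot when a child exits."""
        if self.active > 0:
            self.active -= 1


limits = {
    "sendmail": SubsystemLimit(200),  # figure from the text
    "popper": SubsystemLimit(150),    # hypothetical figure
}

# Flood the popper with 1000 connection attempts.
popper_started = sum(1 for _ in range(1000) if limits["popper"].try_start())

# sendmail's decision depends only on sendmail's own count.
sendmail_ok = limits["sendmail"].try_start()
print(popper_started, sendmail_ok)  # 150 True
```

    The point of the design is that each check looks only at its own
    subsystem's count, so an overload in one subsystem cannot cause another
    one to throttle itself.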

    When a subsystem gets attacked or is disrupted by failures on other
    machines, the machine slows down... but the machine does *not* enter
    into a cascade
    failure situation.  The moment the attack ceases, the machine recovers
    pretty quickly.  The key is that the attack may max out one subsystem
    and slow down others, but it will not indirectly cause other subsystems
    to try to limit themselves just because the load average goes up.

    AOL's mail system used to barf once or twice a week, either creating
    large mail backlogs on our machines while it was down, or making
    hundreds (even thousands) of incoming connections when their system came
    back up after a long downtime.

    It is simply not possible for a machine to predict instantaneous load.
    No matter what you do, therefore, using the load as a feedback mechanism
    is always going to be problematic.  The reason it is not possible to
    predict instantaneous load is simple:  The act of allocating resources
    does not in and of itself generate a load; it is *using* those resources
    that generates the load.

    For example, taking sendmail again:  When sendmail clogs up on outgoing
    connections it typically spends memory resources but no cpu resources.
    When sendmail clogs up on incoming connections it typically spends cpu
    resources AND memory resources.  If sendmail clogs up on lots of
    incoming connections being slowed down by a network screwup ("the
    internet is lossy today"), they may not eat cpu or memory, but when the
    WAN link suddenly clears up you could get a massive load on the
    preexisting connections without any additional forks.
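
    The gap between allocating resources and using them can be illustrated
    with a toy model.  Everything here is an assumption made for the sketch:
    stalled connections are treated as contributing zero load, every one of
    them wakes at the same moment, and the arrival count and thresholds are
    invented.

```python
def simulate(arrivals: int, admit) -> int:
    """Return the load at the moment every stalled connection wakes up."""
    stalled = 0
    for _ in range(arrivals):
        # Stalled connections use no cpu, so the measured load stays flat
        # no matter how many of them have already been admitted.
        measured_load = 0
        if admit(measured_load, stalled):
            stalled += 1
    # The WAN link clears: every stalled connection becomes runnable at once.
    return stalled


def load_gate(measured_load: int, admitted: int) -> bool:
    """Feedback on load: admit while the measured load is below 10."""
    return measured_load < 10


def hard_cap(measured_load: int, admitted: int) -> bool:
    """Absolute limit: admit at most 200 connections, ignoring the load."""
    return admitted < 200


print(simulate(5000, load_gate), simulate(5000, hard_cap))  # 5000 200
```

    In this model the load gate never trips, because the load only appears
    after the admission decisions have been made, while the absolute cap
    bounds the worst case ahead of time.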

    For the first two years of BEST's existence, I literally spent day and
    night trying to balance things on overloaded machines:  The web server,
    sendmail, news, user load, popper, and so forth.  Load-related balancing
    never worked well.  It took a year before I realized that it wouldn't
    work at all.  After we started slapping absolute limits on things, the
    machines stopped crashing due to multi-subsystem fork cascade failures.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
