Date:      Tue, 15 May 2001 22:32:54 +0100
From:      Ian Dowse <iedowse@maths.tcd.ie>
To:        freebsd-bugs@FreeBSD.org
Cc:        Seth <seth@psychotic.aberrant.org>, iedowse@maths.tcd.ie
Subject:   Re: kern/27334: load average constantly above 1.0, even when idle
Message-ID:   <200105152232.aa55865@salmon.maths.tcd.ie>
In-Reply-To: Your message of "Tue, 15 May 2001 08:50:04 PDT." <200105151550.f4FFo4c75456@freefall.freebsd.org> 

In message <200105151550.f4FFo4c75456@freefall.freebsd.org>, Seth writes:
>     0     4     0   0 -18  0     0    0 psleep DL    ??    0:00.08  (bufdaemon)

> 11:42AM  up 13:24, 4 users, load averages: 1.06, 1.03, 1.01

Seth provided a number of further details which seemed to indicate
that `bufdaemon' was the process responsible for the load average
staying at 1.0. I'm not so sure that this is the case now, but
looking into this has highlighted 3 related issues that may be of
general interest:

- One of the conditions that causes a process to be counted as
  contributing to the load average is:

	p->p_stat == SSLEEP &&
	    p->p_pri.pri_level <= PZERO &&
	    p->p_slptime == 0

  I think this is supposed to count un-interruptible sleeping
  processes that have slept for less than 1 second as being 'running'
  for the purposes of load calculation; this would probably count
  processes that are busy doing IO to disk. However, since loadav()
  is only ever called just after incrementing p_slptime in schedcpu(),
  this condition can never occur (I think).

  I seem to remember seeing a discussion of this issue somewhere,
  but there was no agreement as to whether it should be "fixed".
  Changing the above test to 'p->p_slptime <= 1' as NetBSD have
  done would result in loads on existing servers suddenly going
  up after an upgrade, which would confuse a lot of people.
  

- The load average calculation code is very susceptible to
  synchronisation with processes that have regular periodic
  behaviour, where the period divides evenly into 5 seconds. I suspect
  this may be the issue that Seth and others have observed, though
  I have no idea which process becomes synchronised.

  The simplest way to correct this issue is to add some randomness
  to the timing of the sweeps through the process table that
  determine the instantaneous load. Instead of scanning the processes
  from loadav() itself, loadav() could schedule a callout with a
  random delay in the range 0-5 seconds which would actually perform
  the scan. loadav() would still update the averunnable array at
  regular 5-second intervals, but the actual measurements wouldn't
  be periodic (this approach would delay load measurements a bit
  though).


- The bufdaemon kernel process seems to be woken up far more often
  than intended. On a recent -current machine, the command

	ps axo ucomm,sl | grep bufdaemon

  almost always shows a zero p_slptime for bufdaemon, even though
  `numdirtybuffers' remains well below `lodirtybuffers' all the
  time. The bufdaemon currently never does anything unless
  numdirtybuffers > lodirtybuffers, so maybe it is just a bug that
  bd_wakeup() doesn't check this?


Ian
