Date: Tue, 15 May 2001 22:32:54 +0100
From: Ian Dowse <iedowse@maths.tcd.ie>
To: freebsd-bugs@FreeBSD.org
Cc: Seth <seth@psychotic.aberrant.org>, iedowse@maths.tcd.ie
Subject: Re: kern/27334: load average constantly above 1.0, even when idle
Message-ID: <200105152232.aa55865@salmon.maths.tcd.ie>
In-Reply-To: Your message of "Tue, 15 May 2001 08:50:04 PDT." <200105151550.f4FFo4c75456@freefall.freebsd.org>

In message <200105151550.f4FFo4c75456@freefall.freebsd.org>, Seth writes:
> 0 4 0 0 -18 0 0 0 psleep DL ?? 0:00.08 (bufdaemon)
> 11:42AM up 13:24, 4 users, load averages: 1.06, 1.03, 1.01

Seth provided a number of further details which seemed to indicate
that `bufdaemon' was the process responsible for the load average
staying at 1.0. I'm not so sure that this is the case now, but looking
into this has highlighted 3 related issues that may be of general
interest:

- One of the conditions that causes a process to be counted as
  contributing to the load average is:

	p->p_stat == SSLEEP && p->p_pri.pri_level <= PZERO &&
	    p->p_slptime == 0

  I think this is supposed to count un-interruptible sleeping
  processes that have slept for less than 1 second as being 'running'
  for the purposes of load calculation; this would probably count
  processes that are busy doing IO to disk. However, since loadav()
  is only ever called just after incrementing p_slptime in
  schedcpu(), this condition can never occur (I think).

  I seem to remember seeing a discussion of this issue somewhere, but
  there was no agreement as to whether it should be "fixed". Changing
  the above test to 'p->p_slptime <= 1' as NetBSD have done would
  result in loads on existing servers suddenly going up after an
  upgrade, which would confuse a lot of people.

- The load average calculation code is very susceptible to
  synchronisation with processes that have a regular periodic
  behaviour, where the period divides into 5 seconds. I suspect this
  may be the issue that Seth and others have observed, though I have
  no idea which process becomes synchronised.

  The simplest way to correct this issue is to add some randomness to
  the timing of the sweeps through the process table that determine
  the instantaneous load. Instead of scanning the processes from
  loadav() itself, loadav() could schedule a callout with a random
  delay in the range 0-5 seconds which would actually perform the
  scan.
  loadav() would still update the averunnable array at regular
  5-second intervals, but the actual measurements wouldn't be periodic
  (this approach would delay load measurements a bit though).

- The bufdaemon kernel process seems to be woken up far more often
  than intended. On a recent -current machine, the command

	ps axo ucomm,sl | grep bufdaemon

  almost always shows a zero p_slptime for bufdaemon, even though
  `numdirtybuffers' remains well below `lodirtybuffers' all the time.
  The bufdaemon currently never does anything unless
  numdirtybuffers > lodirtybuffers, so maybe it is just a bug that
  bd_wakeup() doesn't check this?

Ian
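
The synchronisation effect in the second point can be illustrated
with a small user-space sketch. This is not kernel code: the
workload model, the function names (is_runnable, lcg_next,
count_runnable_samples) and the millisecond time base are all
assumptions made for the example. It models one process that is
runnable for a short burst at the start of every period, and counts
how many load samples see it runnable when samples are taken exactly
every 5 seconds versus with a random 0-5 second delay added.

```c
#include <assert.h>
#include <stdint.h>

#define SAMPLE_INTERVAL_MS 5000	/* loadav() is assumed to run every 5 s */

/* Hypothetical workload: runnable for busy_ms at the start of each period. */
static int
is_runnable(uint64_t t_ms, uint32_t period_ms, uint32_t busy_ms)
{
	return ((t_ms % period_ms) < busy_ms);
}

/* Small LCG so runs are reproducible; a stand-in for a kernel RNG. */
static uint32_t
lcg_next(uint32_t *state)
{
	*state = *state * 1664525u + 1013904223u;
	return (*state >> 8);
}

/*
 * Take nsamples load samples and return how many of them see the
 * process runnable.  With jitter == 0 the samples fall exactly on
 * multiples of 5 seconds; otherwise each sample is delayed by a
 * pseudo-random 0..4999 ms, as a callout with a random delay would be.
 */
static unsigned
count_runnable_samples(unsigned nsamples, uint32_t period_ms,
    uint32_t busy_ms, int jitter, uint32_t seed)
{
	unsigned hits = 0;
	unsigned i;

	assert(period_ms > 0 && busy_ms <= period_ms);
	for (i = 0; i < nsamples; i++) {
		uint64_t t = (uint64_t)i * SAMPLE_INTERVAL_MS;

		if (jitter)
			t += lcg_next(&seed) % SAMPLE_INTERVAL_MS;
		hits += (unsigned)is_runnable(t, period_ms, busy_ms);
	}
	return (hits);
}
```

For example, count_runnable_samples(1000, 1000, 10, 0, 1) models a
process that is busy for 10 ms out of every second: every fixed
sample lands inside the busy burst, so the process looks permanently
runnable even though it is idle 99% of the time. Passing a non-zero
jitter argument should bring the count down towards the real duty
cycle, at the cost of the measurement delay mentioned above.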