Date: Tue, 17 Jul 2001 09:04:34 -0700 (PDT)
From: John Baldwin
To: Ian Dowse
Cc: freebsd-current@FreeBSD.org
Subject: RE: Load average synchronisation and phantom loads
In-Reply-To: <200107152319.aa46183@salmon.maths.tcd.ie>

On 15-Jul-01 Ian Dowse wrote:
>
> There are a few PRs and a number of messages in the mailing list
> archives that describe a problem where the load average occasionally
> remains at 1.0 or greater even though top(1) reports that the CPU
> is nearly 100% idle. The PRs I could find in a quick search are
> kern/21155, kern/23448 and kern/27334.
>
> The most probable cause for this effect is a synchronisation between
> the load measurement and processes that periodically run for short
> amounts of time. The load average is based on samples of the number
> of running processes taken at exact 5-second intervals. If some
> other process regularly runs with a period that divides into 5
> seconds, that process may always be seen as running even though it
> may only run for a tiny fraction of the available CPU time.
>
> A very likely candidate process is bufdaemon; it sleeps for 1 second
> at a time, so if it happens to get scheduled in the same tick as
> the load measurement and before the load measurement, it will always
> be seen as running.
>
> The patch below causes the samples of running processes to be
> somewhat randomised; instead of being taken every 5 seconds, the
> gap now varies in the range 4 to 6 seconds, so that synchronisation
> should no longer occur. Would there be any objections to my committing
> this?
>
> Two comments on the patch:
> - This patch removes the SSLEEP case in loadav(), because in the
>   existing code, p->p_slptime has always just been incremented in
>   schedcpu() so this case never made a difference. To keep the same
>   load average behaviour when loadav() is called at different times,
>   this case needs to be removed.
>
> - The load average calculation now has really nothing to do with
>   the VM system, so it could be moved elsewhere. I've just left
>   it in vm_meter.c because that's where it's always been.

sys/kern/kern_synch.c perhaps? Might be best to do that as a separate
commit however.

-- 
John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
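The synchronisation effect Ian describes, and why varying the gap should break it, can be illustrated with a small simulation. This is only a sketch, not the kernel code or the proposed patch. It assumes a 100 Hz clock, a daemon that wakes once per second and is runnable for a single tick, and a sampler whose gaps are drawn uniformly from 4 to 6 seconds. All names (HZ, is_running, fixed_samples, jittered_samples) are made up for illustration.

```python
import random

HZ = 100            # assumed clock ticks per second
PERIOD = 1 * HZ     # hypothetical daemon wakes once per second
RUN_TICKS = 1       # and is runnable for a single tick each time


def is_running(tick):
    """True if the periodic daemon is runnable during this tick."""
    return tick % PERIOD < RUN_TICKS


def fixed_samples(n):
    """Sample times exactly 5 seconds apart."""
    return [i * 5 * HZ for i in range(n)]


def jittered_samples(n, rng):
    """Sample times whose gaps vary uniformly between 4 and 6 seconds."""
    times, t = [], 0
    for _ in range(n):
        times.append(t)
        t += rng.randint(4 * HZ, 6 * HZ)  # assumed jitter distribution
    return times


def observed_fraction(times):
    """Fraction of samples that catch the daemon running."""
    return sum(is_running(t) for t in times) / len(times)


rng = random.Random(0)
fixed = observed_fraction(fixed_samples(10000))
jittered = observed_fraction(jittered_samples(10000, rng))
print(f"fixed 5 s sampling:  {fixed:.3f}")
print(f"jittered 4-6 s gaps: {jittered:.3f}")
```

With fixed 5-second sampling every sample lands on the daemon's wakeup tick, so the daemon looks permanently runnable. With jittered gaps the observed fraction should fall to roughly the daemon's real duty cycle of about 1%.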