From owner-freebsd-bugs Wed Jul 26 01:49:27 1995
Return-Path: bugs-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.11/8.6.6)
    id BAA11100 for bugs-outgoing; Wed, 26 Jul 1995 01:49:27 -0700
Received: from Root.COM (implode.Root.COM [198.145.90.1])
    by freefall.cdrom.com (8.6.11/8.6.6) with ESMTP id BAA11093
    for ; Wed, 26 Jul 1995 01:49:24 -0700
Received: from corbin.Root.COM (corbin [198.145.90.18])
    by Root.COM (8.6.11/8.6.5) with ESMTP id BAA07013;
    Wed, 26 Jul 1995 01:48:49 -0700
Received: from localhost (localhost [127.0.0.1])
    by corbin.Root.COM (8.6.11/8.6.5) with SMTP id BAA27035;
    Wed, 26 Jul 1995 01:50:02 -0700
Message-Id: <199507260850.BAA27035@corbin.Root.COM>
To: Matt Dillon
cc: Doug Rabson , bugs@freebsd.org
Subject: Re: brelse() panic in nfs_read()/nfs_bioread()
In-reply-to: Your message of "Wed, 26 Jul 95 00:57:03 PDT." <199507260757.AAA13857@blob.best.net>
From: David Greenman
Reply-To: davidg@Root.COM
Date: Wed, 26 Jul 1995 01:50:01 -0700
Sender: bugs-owner@freebsd.org
Precedence: bulk

> Dima and I will bring BEST's system uptodate tonight.
>
> We have been having some rather severe (about once a day)
> crashes on our second shell machine that are completely
> different from the crashes we see on other machines.
>
> This second shell machine is distinguished from the others
> in that it mounts user's home directories via NFS, so there
> is a great deal more NFS client activity.
>
> Unfortunately, the crash locks things up.. it can partially
> synch the disks but it can't dump core. The only message I
> get is the panic message on the console:
>
>     panic biodone: page busy < 0
>     off: 180224, foff: 180224, valid: 0xFF, dirty:0 mapped:0
>     resid: 4096, index: 0, iosize: 8192, lblkno: 22
>
> I believe the failure is related to NFS. The question is,
> is this a new bug or do any of the recent patches have a
> chance at fixing it? Hard question considering the lack
> of information.

   We've been working on this problem for the past week or so and believe
it is fixed in 2.2-current and 2.1-stable. Please update your sources and
let us know if the problem persists.

> I have been noticing some pretty major cascade failures in the scheduling
> algorithm. Basically it is impossibe to use nice() values to give one
> process a reasonable priority over another.
...
> The solution is that I've pretty much redone the scheduling core... about
...
> We are going to install these scheduling changes tonight as well and I
> will tell you on friday how well they worked. If they work well, I'd
> like to submit them for review.

   We've messed with the scheduling algorithm quite a bit since the original
one in 4.4BSD, and I think we have made substantial improvements. Our main
concern was that compute-bound processes must execute in a lower priority
queue and there needs to be some form of backward inheritance of CPU
consumption/priority. Without this, people doing compiles (or other
compute-intensive things) will quickly bring the system to its knees. In the
old model, CPU priorities were evaluated once per second. This is fine for
slow computers that take a couple of minutes to compile your average C file,
but on fast machines that can do it in 1-2 seconds, we found that the compile
job was always in the foreground, making the system appear very sluggish to
interactive users.
   I'd like to hear more about how your algorithm works in real-world
situations and especially how it functions across the spectrum of system
loading.

-DG
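
   To put rough numbers on the once-per-second re-evaluation problem
described above, here is a toy model in C. It is only a sketch and is not
taken from the FreeBSD or 4.4BSD sources: the function name, the HZ value
of 100, the PUSER base of 50 and the estcpu/4 penalty are all assumptions
chosen for illustration. It counts how many clock ticks a compute-bound
job runs at its initial priority before the first re-evaluation demotes it.

```c
/* Toy model only; every name and constant below is an assumption. */
#define HZ    100   /* assumed clock ticks per second */
#define PUSER 50    /* assumed base user priority (lower value = better) */

/*
 * A compute-bound job accrues one unit of estimated CPU per tick it
 * runs, but its priority is recomputed only every `interval` ticks.
 * Returns the number of ticks the job spent at the base priority.
 */
int ticks_at_base_priority(int job_ticks, int interval)
{
    int estcpu = 0;     /* accumulated CPU usage estimate */
    int pri = PUSER;    /* current priority of the job */
    int at_base = 0;    /* ticks run before any demotion took effect */

    for (int t = 1; t <= job_ticks; t++) {
        if (pri == PUSER)
            at_base++;
        estcpu++;
        /* Priority is only re-evaluated on interval boundaries. */
        if (t % interval == 0)
            pri = PUSER + estcpu / 4;
    }
    return at_base;
}
```

   Under these assumptions, a 1.5 second compile (150 ticks) with a
once-per-second interval spends 100 of its 150 ticks at the base priority,
while a two minute compile (12000 ticks) spends only 100 of 12000 there.
Shortening the interval to 4 ticks cuts the figure to 4 ticks in both
cases.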