From owner-freebsd-current Sat Aug 22 20:20:19 1998
Return-Path:
Received: (from majordom@localhost)
    by hub.freebsd.org (8.8.8/8.8.8) id UAA13193
    for freebsd-current-outgoing; Sat, 22 Aug 1998 20:20:19 -0700 (PDT)
    (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
    by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id UAA13172;
    Sat, 22 Aug 1998 20:20:14 -0700 (PDT)
    (envelope-from tlambert@usr04.primenet.com)
Received: (from daemon@localhost)
    by smtp02.primenet.com (8.8.8/8.8.8) id UAA04959;
    Sat, 22 Aug 1998 20:19:31 -0700 (MST)
Received: from usr04.primenet.com(206.165.6.204)
    via SMTP by smtp02.primenet.com, id smtpd004906; Sat Aug 22 20:19:23 1998
Received: (from tlambert@localhost)
    by usr04.primenet.com (8.8.5/8.8.5) id UAA21616;
    Sat, 22 Aug 1998 20:19:17 -0700 (MST)
From: Terry Lambert
Message-Id: <199808230319.UAA21616@usr04.primenet.com>
Subject: Re: softupdates and smp crash
To: dyson@iquest.net
Date: Sun, 23 Aug 1998 03:19:16 +0000 (GMT)
Cc: tlambert@primenet.com, sos@FreeBSD.ORG, croot@btp1da.phy.uni-bayreuth.de,
    regnauld@deepo.prosa.dk, current@FreeBSD.ORG, smp@FreeBSD.ORG
In-Reply-To: <199808220523.AAA19739@dyson.iquest.net> from "John S. Dyson" at Aug 22, 98 00:23:34 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > I'm also not convinced that this is the only possible cause of
> > the problem; the VM code is hardly "assert" protected everywhere,
> > so diagnosing this thing is not trivial.  Look at the VM fixes
> > I recently did, which killed the bugs Karl Denniger was seeing
> > in 75% of the cases, leaving 25% of the cases "clustered" (in his
> > words), indicating a separate problem, in addition to the ones I
> > fixed, in a periodically executing code path.
> > I had suspected that this would be the case when I made the fix,
> > since it doesn't account for the buggy behaviour I'm personally
> > seeing.  8-(.
> >
> I have to chime in here -- some of the "fixes" are work-arounds, and
> there are still underlying VM problems.  It might be "good enough"
> for 3.0, but I would suggest preparing for some rework to find the
> root cause for the problem.

Can you identify which of the "better fixes" are workarounds?

The two fixes I have done, and now have enough confidence in to want
them committed, are:

o	The "valid = 0 at wrong time" that you told me about.

o	The "setting the recorded size of a backing object to a page
	boundary instead of to the actual size".

You could argue that this second fix, which promiscuously sets the
vnode object size after instancing the object, is a workaround which
should be repaired by adding a "real_size" parameter to the allocator,
but the fact is that the setsize code path is not a problem at the
only time when it is called (i.e., it can't be called at interrupt
level as a result of a disk I/O completion interrupt); so the window
I noted has been analyzed, and is not there.  The code is ugly, but
it does the intended job, without side effects.

The other "fix", the "back up one", is, indeed, a kludge that happens
to work for some cases, but I would not want that one committed (I
explicitly posted that it should be tried as a diagnostic).

The only other changes packaged with the two real changes, above, are
panics in the diagnostic case, which are basically an "assert" that
map contents aren't being stomped on page insertion, and lock
acquisition logging that was arguably missing in error.

I haven't been able to get anyone to run with the "DIAGNOSTIC" flag
to test the first, or with the "MAP_LOCK_DIAGNOSTIC" flag to test the
second (but they run without error here, where I can't trigger the
failures at will).
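
As an illustration of the second fix only, here is a minimal sketch in C of the difference between a page-rounded recorded size and the actual size, with the actual size set in a separate step after the object is instanced. Every name in it (sketch_object, sketch_allocate, sketch_setsize, SKETCH_PAGE_SIZE) is hypothetical and chosen for this example; it is not the FreeBSD VM code, and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the illustration; not taken from kernel headers. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for a vnode-backed VM object. */
struct sketch_object {
	uint64_t size;		/* recorded size of the backing object, in bytes */
};

/* Round a byte count up to the next page boundary. */
static uint64_t
sketch_round_page(uint64_t bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) &
	    ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
}

/*
 * Allocator that only records a page-granular size.  This models the
 * behaviour described as the problem: the recorded size lands on a page
 * boundary instead of on the actual size.
 */
static void
sketch_allocate(struct sketch_object *obj, uint64_t bytes)
{
	obj->size = sketch_round_page(bytes);
}

/*
 * Separate step that records the actual size after the object has been
 * instanced.  The alternative discussed in the message would be to pass
 * a "real_size" parameter to the allocator instead.
 */
static void
sketch_setsize(struct sketch_object *obj, uint64_t actual)
{
	obj->size = actual;
}

/* Number of bytes of the final page that lie within the recorded size. */
static uint64_t
sketch_bytes_in_last_page(const struct sketch_object *obj)
{
	uint64_t rem = obj->size % SKETCH_PAGE_SIZE;

	return (rem == 0 && obj->size != 0) ? SKETCH_PAGE_SIZE : rem;
}
```

With a 5000-byte file, sketch_allocate() alone records a size that covers the whole second page, while calling sketch_setsize() afterwards records that only part of the last page is backed by file data.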

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.