From owner-freebsd-current  Sat Aug 22 20:20:19 1998
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id UAA13193
          for freebsd-current-outgoing; Sat, 22 Aug 1998 20:20:19 -0700 (PDT)
          (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id UAA13172;
          Sat, 22 Aug 1998 20:20:14 -0700 (PDT)
          (envelope-from tlambert@usr04.primenet.com)
Received: (from daemon@localhost)
	by smtp02.primenet.com (8.8.8/8.8.8) id UAA04959;
	Sat, 22 Aug 1998 20:19:31 -0700 (MST)
Received: from usr04.primenet.com(206.165.6.204)
 via SMTP by smtp02.primenet.com, id smtpd004906; Sat Aug 22 20:19:23 1998
Received: (from tlambert@localhost)
	by usr04.primenet.com (8.8.5/8.8.5) id UAA21616;
	Sat, 22 Aug 1998 20:19:17 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199808230319.UAA21616@usr04.primenet.com>
Subject: Re: softupdates and smp crash
To: dyson@iquest.net
Date: Sun, 23 Aug 1998 03:19:16 +0000 (GMT)
Cc: tlambert@primenet.com, sos@FreeBSD.ORG, croot@btp1da.phy.uni-bayreuth.de,
        regnauld@deepo.prosa.dk, current@FreeBSD.ORG, smp@FreeBSD.ORG
In-Reply-To: <199808220523.AAA19739@dyson.iquest.net> from "John S. Dyson" at Aug 22, 98 00:23:34 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > I'm also not convinced that this is the only possible cause of
> > the problem; the VM code is hardly "assert" protected everywhere,
> > so diagnosing this thing is not trivial.  Look at the VM fixes
> > I recently did, which killed the bugs Karl Denninger was seeing
> > in 75% of the cases, leaving 25% of the cases "clustered" (in his
> > words), indicating a separate problem, in addition to the ones I
> > fixed, in a periodically executing code path.  I had suspected
> > that this would be the case when I made the fix, since it doesn't
> > account for the buggy behaviour I'm personally seeing.  8-(.
> > 
> I have to chime in here -- some of the "fixes" are work-arounds, and
> there are still underlying VM problems.  It might be "good enough"
> for 3.0, but I would suggest preparing for some rework to find the
> root cause for the problem.

Can you identify which of the "better fixes" are workarounds?

The two fixes I have done, and now have enough confidence in to
want them committed, are:


o	The "valid = 0 at wrong time" that you told me about.

o	The "setting the recorded size of a backing object to
	a page boundary instead of to the actual size".

You could argue that the second fix, which promiscuously sets the
vnode object size after instancing the object, is a workaround
that should be repaired by adding a "real_size" parameter to the
allocator.  The fact is, though, that the setsize code path is not a
problem at the only time it is called (i.e., it can't be called
at interrupt level as a result of a disk I/O completion interrupt),
so the window I noted has been analyzed, and is not there.  The code
is ugly, but it does the intended job, without side effects.
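
To make the "page boundary versus actual size" distinction concrete,
here is a rough user-space sketch.  It is not the kernel code: the
4096-byte page size and every sketch_* name are assumptions made up
for illustration.  It only shows how far a page-rounded size can sit
from the real size of the backing object.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for illustration only; the kernel has its own constant. */
#define SKETCH_PAGE_SIZE 4096u

/* Round a byte count up to the next page boundary (hypothetical helper). */
static size_t sketch_round_page(size_t bytes)
{
    /* The mask trick below only works for a power-of-two page size. */
    assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);
    return (bytes + SKETCH_PAGE_SIZE - 1) & ~(size_t)(SKETCH_PAGE_SIZE - 1);
}

/*
 * Bytes between the actual size and the page-rounded size.  Recording the
 * rounded size makes these bytes look like part of the object.
 */
static size_t sketch_tail_slack(size_t actual_size)
{
    return sketch_round_page(actual_size) - actual_size;
}
```

For a 5000-byte object the rounded size is 8192, so 3192 bytes past
the real end would be covered by the recorded size; for an object that
is already an exact multiple of the page size, the two sizes agree.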


The other "fix", the "back up one", is indeed a kludge that happens
to work for some cases, but I would not want that one committed (I
explicitly posted that it should be tried as a diagnostic).

The only other changes packaged with the two real changes, above, are
panics in the diagnostic case, which are basically an "assert" that
map contents aren't being stomped on page insertion, and lock
acquisition logging that was arguably erroneously missing.  I haven't
been able to get anyone to run with the "DIAGNOSTIC" flag to test
the first or with "MAP_LOCK_DIAGNOSTIC" to test the second (but both
run without error here, where I can't trigger the failures at will).
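
For anyone unsure what a diagnostic-only check of this kind looks
like, the general shape is roughly the following.  This is a toy
user-space sketch, not the patch itself: the slot array, the
SKETCH_DIAGNOSTIC switch, and the counting sketch_panic() stand-in
are all hypothetical, and the real check lives in the VM map code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time switch standing in for a diagnostic option. */
#define SKETCH_DIAGNOSTIC 1
#define SKETCH_SLOTS 8

/* Stand-in for a kernel panic: it only counts, so the sketch stays testable. */
static int sketch_panic_count;

static void sketch_panic(const char *msg)
{
    (void)msg;
    sketch_panic_count++;
}

/* Toy "map": a fixed array of page pointers, NULL meaning empty. */
struct sketch_map {
    const void *slot[SKETCH_SLOTS];
};

/* Insert a page; in the diagnostic build, complain if another page is stomped. */
static void sketch_insert(struct sketch_map *m, size_t idx, const void *page)
{
    assert(idx < SKETCH_SLOTS);
#ifdef SKETCH_DIAGNOSTIC
    if (m->slot[idx] != NULL && m->slot[idx] != page)
        sketch_panic("sketch_insert: slot already occupied");
#endif
    m->slot[idx] = page;
}
```

Without the switch defined, the check compiles away and the insert
silently overwrites, which is why the failure is hard to see unless
someone actually runs with the diagnostic option turned on.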


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
