From owner-freebsd-current  Sat Jan 23 11:43:33 1999
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id LAA08102
          for freebsd-current-outgoing; Sat, 23 Jan 1999 11:43:33 -0800 (PST)
          (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from spinner.netplex.com.au (spinner.netplex.com.au [202.12.86.3])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id LAA08093
          for <current@FreeBSD.ORG>; Sat, 23 Jan 1999 11:43:27 -0800 (PST)
          (envelope-from peter@netplex.com.au)
Received: from spinner.netplex.com.au (localhost [127.0.0.1])
	by spinner.netplex.com.au (8.9.2/8.9.2/Netplex) with ESMTP id DAA00782;
	Sun, 24 Jan 1999 03:43:04 +0800 (WST)
	(envelope-from peter@spinner.netplex.com.au)
Message-Id: <199901231943.DAA00782@spinner.netplex.com.au>
X-Mailer: exmh version 2.0.2 2/24/98
To: Matthew Dillon <dillon@apollo.backplane.com>, current@FreeBSD.ORG
Subject: Re: panic: found dirty cache page 0xf046f1c0 
In-reply-to: Your message of "Sat, 23 Jan 1999 17:25:03 +0800."
             <199901230925.RAA00489@spinner.netplex.com.au> 
Date: Sun, 24 Jan 1999 03:43:03 +0800
From: Peter Wemm <peter@netplex.com.au>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Peter Wemm wrote:
> Matthew Dillon wrote:
[..]
> >     Try changing the panic in vm/vm_page.c to a printf() ( 
> 
> I'll do that.

BTW, what are the dangers of this?  Lost disk writes or corruption?  Can 
we (as a workaround) push the page that we found back onto a dirty queue 
and try again after some diagnostics?

> FWIW, this has happened while the system has been nearly quiescent all the 
> way through to being thrashed with parallel cvs updates etc running.  Most 
> times it waits till exmh is running.  Last time (when recompiling without 
> SMP) it crashed when it came to linking the kernel (and no exmh running).
> 
> I'll see if it still crashes in uniprocessor mode, if so, I'll put some 
> debugging in and see if I can find anything out.  The kernel was last 
> built on Jan 16, and that one works fine still, so I'm pretty sure it 
> isn't hardware.

It crashed in uniprocessor mode about 60 seconds after sending this mail. 
It's got a really trimmed down kernel config and no modules loaded or in 
use.  I have not disabled softupdates yet, that's next.

This particular machine won't reboot by itself after it's been running in 
SMP mode (it's really old), so I have to manually reset it.  I went to 
sleep straight after that, and it ran the whole time I was asleep.  After 
getting up again, I started exmh, and it crashed 30 seconds later.  There 
was no swapping in progress; I have been running top -s1 to see what the 
swap and memory state is when it dies.  Unfortunately I lost the last one, 
but it generally looks like this:

last pid:  6293;  load averages:  0.51,  0.52,  0.65    up 0+01:40:54  14:19:06
40 processes:  1 running, 39 sleeping
CPU states:  4.6% user,  0.0% nice, 11.8% system,  1.5% interrupt, 82.1% idle
Mem: 19M Active, 9236K Inact, 13M Wired, 3068K Cache, 4691K Buf, 508K Free
Swap: 120M Total, 128K Used, 120M Free

This machine has 48M of ram, one swap partition only.

Oh, one other thing that occurred to me..  Under 4.0-current, I regularly 
(ie: within 30 seconds of boot) get if_de transmitter underflows.  My 
console corruption was happening at the instant that de0 was being 
configured with ifconfig.  exmh is running to a remote display over that 
de0 interface.

Under Jan 16 3.0-current, I do not get that transmitter underflow.

The only thing I can think of about if_de that's unusual and VM related
(apart from the complexity of the code) is that it uses contigmalloc().  I 
wonder if this is somehow setting the scene for the later failures?  It's 
certainly suspicious that it has done strange things when being ifconfig'ed, 
including things like trashing the serial console on no less than a dozen 
occasions.

Cheers,
-Peter

