Date: Wed, 5 Jul 1995 17:52:34 +0200
From: esser@zpr.uni-koeln.de (Stefan Esser)
To: Voradesh Yenbut <yenbut@cs.washington.edu>
Cc: hackers@freebsd.org
Subject: Re: One cause of 2.05R instability found
Message-ID: <199507051552.AA06352@FileServ1.MI.Uni-Koeln.DE>
Regarding problems with panics:
Fatal trap 12: page fault while in kernel mode
Is this an isolated case?
Who else (other than Voradesh Yenbut <yenbut@cs.washington.edu>)
sees this?
} A few days ago, I committed a 90MHz pentium system running 2.05R to be
} a news server. The system was not stable at all. It kept on crashing
} within 2 hours with "Fatal trap 12: page fault while in kernel mode"
} and fault code "supervisor read, page not present". The crash always
} happened at the same instruction pointer, i.e., ncr_complete+195
} (as reported by gdb; I don't have the hex number with me) in ncr.c.
}
} In ncr.c, ncr_complete+195 is at the following if statement:
}
} if (DEBUG_FLAGS & DEBUG_TINY)
} printf ("CCB=%x STAT=%x/%x\n", (unsigned)cp & 0xfff,
} cp->host_status,cp->scsi_status);
No, sorry, this statement is certainly not there (at ncr_complete+195),
unless you changed the sources or configured NCR debugging
in your kernel config file, e.g. by:
options "SCSI_DEBUG_FLAGS=0x80"
} where DEBUG_FLAGS is ncr_debug declared in ncr.c as
}
} static int ncr_debug = SCSI_DEBUG_FLAGS;
No, not really ... The complete code is:
#ifdef SCSI_DEBUG_FLAGS
#define DEBUG_FLAGS ncr_debug
#else /* SCSI_DEBUG_FLAGS */
#define SCSI_DEBUG_FLAGS 0
#define DEBUG_FLAGS 0
#endif /* SCSI_DEBUG_FLAGS */
and SCSI_DEBUG_FLAGS is undefined by default. This makes
DEBUG_FLAGS a constant zero, and GCC generates no code
at all for the if statement or the printf() ...
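To illustrate the point about dead code, here is a minimal,
self-contained sketch of the same macro pattern. The helper
debug_branch_taken() and its return value are hypothetical and not
part of ncr.c; DEBUG_TINY is assumed to be 0x80, matching the config
option quoted above. With SCSI_DEBUG_FLAGS undefined, the condition
is a compile-time constant zero, so an optimizing compiler is free to
drop the whole branch:

```c
#include <stdio.h>

#define DEBUG_TINY 0x0080   /* assumed value, matching the 0x80 option above */

#ifdef SCSI_DEBUG_FLAGS
static int ncr_debug = SCSI_DEBUG_FLAGS;   /* run-time adjustable flags */
#define DEBUG_FLAGS ncr_debug
#else /* SCSI_DEBUG_FLAGS */
#define SCSI_DEBUG_FLAGS 0
#define DEBUG_FLAGS 0                      /* constant zero: test below is dead code */
#endif /* SCSI_DEBUG_FLAGS */

/*
 * Hypothetical helper: returns 1 if the debug printf was reached,
 * 0 otherwise. Without -DSCSI_DEBUG_FLAGS=... it always returns 0.
 */
static int debug_branch_taken(unsigned ccb, unsigned host_status,
                              unsigned scsi_status)
{
    if (DEBUG_FLAGS & DEBUG_TINY) {
        printf("CCB=%x STAT=%x/%x\n", ccb & 0xfff, host_status, scsi_status);
        return 1;
    }
    return 0;
}
```

Building the same file with -DSCSI_DEBUG_FLAGS=0x80 makes DEBUG_FLAGS
refer to the ncr_debug variable, and only then does the test turn
into real code.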
} I commented out the if statement, rebuilt and installed the new
} kernel. The system has been running fine with the new kernel for two
} days (though I still keep my fingers crossed).
Well, since there shouldn't have been any code generated
before, there shouldn't be any difference ...
The NCR code did not change for many months, until after
FreeBSD-2.0.5R was released, and I don't have any
other report of "trap 12: page fault while in kernel mode"
problems. So I don't believe this to be a problem caused
by the driver.
But I have to admit that a panic within some subroutine
generally points at some problem in close proximity ...
For further diagnosis, I need to know:
Did you change the sources or use any NCR specific kernel
config file options ?
How did you identify the suspected error location in ncr.c ?
; ncb_profile (np, cp);
pushl %ecx
pushl 8(%ebp)
call _ncb_profile
addl $8,%esp
; if (DEBUG_FLAGS & DEBUG_TINY)
; printf ("CCB=%x STAT=%x/%x\n", (unsigned)cp & 0xfff,
; cp->host_status,cp->scsi_status);
; xp = cp->xfer;
movl 12(%ebp),%ecx
movl 452(%ecx),%edi
; cp->xfer = NULL;
movl $0,452(%ecx)
All data structures should remain unchanged during the
execution of ncr_complete(), since they are locked in a
way that should also prevent simultaneous updates by the
NCR ...
xp = cp->xfer;
cp->xfer = NULL;
tp = &np->target[xp->sc_link->target];
lp = tp->lp[xp->sc_link->lun];
ncr_complete + 195:
if (cp->parity_status) {
...
}
At address ncr_complete + 195, there is the test of
cp->parity_status. I'd be rather surprised if the
access to cp->xfer (four lines above) always
succeeded, and the page then got lost (reproducibly)
before the access to cp->parity_status ...
The address of cp->parity_status is a few bytes
before cp->xfer, and I really can't see how the
memory allocated for CCBs at driver startup should
get unmapped from kernel VM ...
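As a rough sketch of the layout argument: two members that sit only
a few bytes apart normally live in the same page, so a fault on one
but not the other would be odd. The struct below is purely
hypothetical (the real struct ccb is not shown in this thread); the
placeholder array is sized only so that, on a 32-bit target, xfer
lands at offset 452 as in the assembly listing above.

```c
#include <stddef.h>
#include <stdio.h>

struct xfer_stub;                    /* hypothetical stand-in for the transfer struct */

struct ccb_stub {                    /* hypothetical layout, NOT the real struct ccb */
    unsigned char     other[448];    /* placeholder for members not shown here */
    unsigned char     host_status;
    unsigned char     scsi_status;
    unsigned char     parity_status;
    unsigned char     pad;
    struct xfer_stub *xfer;          /* offset 452 with 4-byte pointers */
};

/* Distance in bytes from parity_status to xfer within one CCB. */
static size_t parity_to_xfer_gap(void)
{
    return offsetof(struct ccb_stub, xfer)
         - offsetof(struct ccb_stub, parity_status);
}

/* Print both offsets so they can be compared with a disassembly. */
static void print_ccb_offsets(void)
{
    printf("parity_status at %lu, xfer at %lu\n",
           (unsigned long)offsetof(struct ccb_stub, parity_status),
           (unsigned long)offsetof(struct ccb_stub, xfer));
}
```

Printing offsetof() values like this for the real structure is one
way to match a displacement seen in the disassembly against a
structure member.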
I assume that the address printed by the panic message
points at the failed instruction, not at the one following
it. Is this true for this trap?
(I don't have an i486 manual here, but otherwise the failed
instruction couldn't be restarted, so this seems the
only possibility.)
It might help to send a stack trace obtained using
the kernel debugger ...
Is there anybody else seeing that kind of failure ???
STefan
--
Stefan Esser Internet: <se@ZPR.Uni-Koeln.DE>
Zentrum fuer Paralleles Rechnen Tel: +49 221 4706021
Universitaet zu Koeln FAX: +49 221 4705160
Weyertal 80
50931 Koeln
