Date:      Wed, 05 Jul 1995 14:43:50 PDT
From:      Voradesh Yenbut <yenbut@cs.washington.edu>
To:        esser@zpr.uni-koeln.de (Stefan Esser)
Cc:        Voradesh Yenbut <yenbut@cs.washington.edu>, hackers@freebsd.org
Subject:   Re: One cause of 2.05R instability found 
Message-ID:  <199507052143.OAA20148@vetch.cs.washington.edu>
In-Reply-To: Your message of "Wed, 05 Jul 1995 17:52:34 +0200." <199507051552.AA06352@FileServ1.MI.Uni-Koeln.DE>

In message <199507051552.AA06352@FileServ1.MI.Uni-Koeln.DE>, Stefan Esser writes:
>Regarding problems with panics:
>
>	Fatal trap 12: page fault while in kernel mode
>
>Is this a single case ?

Yes, that was a single case of panics on my system.

>No, sorry, this statement isn't there (at ncr_complete+195) for sure ...

You are absolutely right.  That if statement isn't there in my
kernel.

>Well, since there shouldn't have been any code generated
>before, there shouldn't be any difference ...

There are some differences.  For one, my system no longer crashes.
The others appear to be shifts in the generated code.  More details
of the code changes are below.

>For further diagnosis, I need to know:
>
>Did you change the sources or use any NCR specific kernel
>config file options ?

I did make a trivial change in if_ed.c so that it identifies my NIC
board as an 8216 instead of an 8416.  At first I thought the problem
was with the modified ed driver, but switching to the unmodified
version, or to the version from the previous release, made no
difference to the crash.  I did not use any NCR specific kernel
config file options.

>How did you identify the suspected error location in ncr.c ?

The instruction pointer at the time of the crash pointed somewhere
between the call to ncb_profile() and the printf(), so I simply looked
for an if statement between those two locations without realizing that
no code had been generated for "if (DEBUG_FLAGS & DEBUG_TINY)".  Since
commenting it out made a difference, I (incorrectly) presumed it must
be present.
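
To illustrate how a fault address is turned into a "function+offset"
form such as ncr_complete+195, here is a minimal sketch of a nearest
symbol lookup.  It is not the kernel debugger's code.  The struct,
the function name sym_lookup and the two neighbouring symbols are
hypothetical; only the ncr_complete address (0xf0168ec1, from the
crashing kernel) comes from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a (hypothetical) symbol table, sorted by address. */
struct ksym {
	uint32_t	 addr;	/* start address of the function */
	const char	*name;	/* symbol name */
};

/*
 * Find the last symbol whose start address is <= pc and store the
 * distance from that symbol in *off.  Returns NULL if pc lies below
 * every symbol in the table.
 */
static const struct ksym *
sym_lookup(const struct ksym *tab, size_t n, uint32_t pc, uint32_t *off)
{
	const struct ksym *best = NULL;
	size_t i;

	assert(tab != NULL || n == 0);
	for (i = 0; i < n && tab[i].addr <= pc; i++)
		best = &tab[i];
	if (best != NULL && off != NULL)
		*off = pc - best->addr;
	return (best);
}

/* Only ncr_complete is real; the neighbours are made up for the example. */
static const struct ksym example_syms[] = {
	{ 0xf0168d00u, "hypothetical_prev_func" },
	{ 0xf0168ec1u, "ncr_complete" },
	{ 0xf0169400u, "hypothetical_next_func" },
};
```

Looking up 0xf0168ec1 + 195 in this table yields ncr_complete with an
offset of 195.  The offset only says where in the generated code the
fault happened; mapping it back to a source statement still requires
the disassembly, which is where the mistake described above crept in.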

>;	ncb_profile (np, cp);
>	pushl %ecx
>	pushl 8(%ebp)
>	call _ncb_profile

<ncr_complete+128>

>	addl $8,%esp
>
>;	if (DEBUG_FLAGS & DEBUG_TINY)
>;		printf ("CCB=%x STAT=%x/%x\n", (unsigned)cp & 0xfff,
>;			cp->host_status,cp->scsi_status);
>
>;	xp = cp->xfer;
>	movl 12(%ebp),%ecx
>	movl 452(%ecx),%edi
>
>;	cp->xfer = NULL;
>	movl $0,452(%ecx)

When the "if (DEBUG_FLAGS ..." statement is actually commented out in
the source code, the "addl $8,%esp" instruction above moves to a
location just before "if (cp->parity_status", as shown below.  The
code between the old and the new locations of the addl is otherwise
unchanged.

>All data structures should remain unchanged over the
>execution of ncr_complete(), since they are locked in a
>way that should also prevent simultaneous updates by the
>NCR ...
>
>	xp = cp->xfer;
>	cp->xfer = NULL;
>	tp = &np->target[xp->sc_link->target];
>	lp = tp->lp[xp->sc_link->lun];

>>><ncr_complete+189> 
>>>	addl $8,%esp   <<<<<<     New location

>ncr_complete + 195:
>	if (cp->parity_status) {
>		...
>	{

The instruction addresses were also shifted.  For example,
ncr_complete is now at 0xf0168eb1 instead of 0xf0168ec1.  There
could also be other changes that are not mentioned here.
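
As a small worked example of the shift, the sketch below computes how
far a symbol moved between two kernel builds.  The helper name
sym_shift is hypothetical; the two addresses are the ones quoted
above.

```c
#include <stdint.h>

/*
 * Signed distance, in bytes, that a symbol moved between two builds.
 * A negative result means the symbol now starts at a lower address.
 */
static int32_t
sym_shift(uint32_t old_addr, uint32_t new_addr)
{
	return ((int32_t)(new_addr - old_addr));
}
```

With old_addr = 0xf0168ec1 and new_addr = 0xf0168eb1 the result is
-16, so ncr_complete starts 0x10 bytes lower in the new kernel.
Absolute addresses taken from the old kernel therefore cannot be
looked up directly in the new one; function-relative offsets are the
safer thing to compare.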

>It might help to send a stack trace obtained using
>the kernel debugger ...

I am afraid that would be hard to do.  My system has 64 MB of memory
and each swap partition is only 48 MB, so a full dump would not fit.
Also, since the panic was in ncr.c, the system sometimes just hung,
unable to write anything to the disks.  If there is an easy way to
get a dump (without changing the system much), I might attempt it.

----
Voradesh Yenbut				Phone:	+1 206 685-0912
BOX 352350,  U of Washington		FAX:	+1 206 543-2969
Seattle, WA 98195			Email: yenbut@cs.washington.edu




