From owner-freebsd-bugs  Tue Dec 19 01:06:55 1995
Return-Path: owner-bugs
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id BAA02002
          for bugs-outgoing; Tue, 19 Dec 1995 01:06:55 -0800 (PST)
Received: from hcshh.hcs.de (hcshh.hcs.de [194.49.17.1])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id BAA01988
          for <freebsd-bugs@freebsd.org>; Tue, 19 Dec 1995 01:06:14 -0800 (PST)
Received: from hcswork.hcs.de by hcshh.hcs.de with smtp
	(Smail3.1.28.1 #9) id m0tRxyw-000TIrC; Tue, 19 Dec 95 10:05 MET
Received: by hcswork.hcs.de (Smail3.1.28.1 #9)
	id m0tRxyw-000UTcC; Tue, 19 Dec 95 10:05 MET
Message-Id: <m0tRxyw-000UTcC@hcswork.hcs.de>
From: hm@hcs.de (Hellmuth Michaelis)
Subject: Re: Problem with FreeBSD 2.1.0-RELEASE
To: davidg@Root.COM
Date: Tue, 19 Dec 1995 10:05:10 +0100 (MET)
Cc: dufault@hda.com, gibbs@freefall.freebsd.org, m.sapsed@bangor.ac.uk,
        hm@hcs.de, freebsd-bugs@freebsd.org
In-Reply-To: <199512190342.TAA02913@corbin.Root.COM> from "David Greenman" at Dec 18, 95 07:42:33 pm
Reply-To: hm@hcs.de
Organization: HCS Hanseatischer Computerservice GmbH
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-bugs@freebsd.org
Precedence: bulk

From the keyboard of David Greenman:

> >This is certainly the same problem that Hellmuth and others
> >have been having with CD-ROM changers.  Hellmuth sent me the changer so I
> >want to find some time to track it down soon.  I saved some mail from
> >him that says it takes place here:
> >
> >> Fatal trap 12: page fault while in kernel mode
> >> Fault virtual address              = 0x60
> >> Fault code                         = supervisor read, page not present
> ...
> >> Stopped at _incore+0x48: cmpl %esi, 0x48(%ebx)
> >
> >I believe this is here in kern/vfs_bio.c:
> >
> >>	int s = splbio();
> >>
> >>	bh = BUFHASH(vp, blkno);
> >>	bp = bh->lh_first;
> >>
> >>	/* Search hash chain */
> >>	while (bp) {
> >
> >where we go indirect on that bp.
> 
>    This is a "can't happen" panic. It can only happen if the CPU executes the
> instructions incorrectly. The assembly code is:

What do you mean by "incorrectly"?

[assembly deleted]

>    The only other possibility is that incore() isn't actually being called at
> all and the CPU ended up here because of some weird stack corruption or
> bogus function pointer or something. ...This seems unlikely.

Yes, it is very unlikely in my opinion. I have put masses of printf's at the
above-mentioned place in vfs_bio.c to detect anything which might go wrong,
but surprisingly nothing goes wrong (that I am able to detect ...), yet the
panic happens at the very same instruction every time. I tried hard and long
to debug this and could reproduce the panic with the same values at the same
instruction 50..70 times; eventually I gave up because everything looks sane
to me.
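
As a rough sanity check on the trap output quoted above: if the faulting
access really is the memory operand of "cmpl %esi, 0x48(%ebx)", then the
fault address minus the displacement gives the value %ebx held at that
moment. A minimal sketch of that arithmetic follows. The function names and
the 4096-byte page size are my own assumptions for illustration, nothing
here is taken from vfs_bio.c:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed i386 page size; used only for the plausibility check below. */
#define ASSUMED_PAGE_SIZE 4096u

/*
 * Hypothetical helper: for a fault on an operand of the form disp(%reg),
 * the register must have held fault_addr - disp.
 */
uintptr_t base_from_fault(uintptr_t fault_addr, uintptr_t disp)
{
	assert(fault_addr >= disp);	/* otherwise the premise is wrong */
	return fault_addr - disp;
}

/*
 * Hypothetical heuristic: a pointer that is not NULL but still lies
 * inside the first page cannot be a valid kernel buffer address.
 */
int looks_bogus(uintptr_t p)
{
	return p != 0 && p < ASSUMED_PAGE_SIZE;
}
```

With the quoted values, base_from_fault(0x60, 0x48) gives 0x18. That is not
NULL, so a "while (bp)" test would let it through, but it is also not a
plausible buffer address. This would fit a damaged hash chain link at least
as well as a CPU executing instructions incorrectly, though it is only an
inference from the two quoted lines.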

Also, the situation described above is completely reproducible here (someone
from the US with the same device has had exactly the same problems, and
I'm 99.5% sure that all my hardware and setup are ok):

	- mount and unmount some CDs (I found one has to mount at least
		3 - 4 CDs and unmount anywhere from 0 to all of them)
	- write (!) to a mounted filesystem (I did it with iozone because
		then I knew where I had a corrupt FS after the panic)

As long as I did NOT do a write (!), the panic did not happen!

All this is happening under a plain 2.0.5 compiled with SCSI_NEWCONF enabled.

hellmuth
-- 
Hellmuth Michaelis    HCS Hanseatischer Computerservice GmbH    Hamburg, Europe
                       "There are lies, damn lies, and open systems." (unknown)