From owner-freebsd-bugs Tue Dec 19 01:06:55 1995
Return-Path: owner-bugs
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id BAA02002 for bugs-outgoing; Tue, 19 Dec 1995 01:06:55 -0800 (PST)
Received: from hcshh.hcs.de (hcshh.hcs.de [194.49.17.1]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id BAA01988 for ; Tue, 19 Dec 1995 01:06:14 -0800 (PST)
Received: from hcswork.hcs.de by hcshh.hcs.de with smtp (Smail3.1.28.1 #9) id m0tRxyw-000TIrC; Tue, 19 Dec 95 10:05 MET
Received: by hcswork.hcs.de (Smail3.1.28.1 #9) id m0tRxyw-000UTcC; Tue, 19 Dec 95 10:05 MET
Message-Id:
From: hm@hcs.de (Hellmuth Michaelis)
Subject: Re: Problem with FreeBSD 2.1.0-RELEASE
To: davidg@Root.COM
Date: Tue, 19 Dec 1995 10:05:10 +0100 (MET)
Cc: dufault@hda.com, gibbs@freefall.freebsd.org, m.sapsed@bangor.ac.uk, hm@hcs.de, freebsd-bugs@freebsd.org
In-Reply-To: <199512190342.TAA02913@corbin.Root.COM> from "David Greenman" at Dec 18, 95 07:42:33 pm
Reply-To: hm@hcs.de
Organization: HCS Hanseatischer Computerservice GmbH
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-bugs@freebsd.org
Precedence: bulk

From the keyboard of David Greenman:

> >This is certainly the same problem that Hellmuth and others
> >have been having with CD-ROM changers. Hellmuth sent me the changer so I
> >want to find some time to track it down soon. I saved some mail from
> >him that says it takes place here:
> >
> >> Fatal trap 12: page fault while in kernel mode
> >> Fault virtual address = 0x60
> >> Fault code = supervisor read, page not present
> ...
> >> Stopped at _incore+0x48: cmpl %esi, 0x48(%ebx)
> >
> >I believe this is here in kern/vfs_bio.c:
> >
> >> int s = splbio();
> >>
> >> bh = BUFHASH(vp, blkno);
> >> bp = bh->lh_first;
> >>
> >> /* Search hash chain */
> >> while (bp) {
> >
> >where we go indirect on that bp.
>
> This is a "can't happen" panic. It can only happen if the CPU executes the
> instructions incorrectly.
> The assembly code is:

What do you mean by "incorrectly"?

[assembly deleted]

> The only other possibility is that incore() isn't actually being called at
> all and the CPU ended up here because of some weird stack corruption or
> bogus function pointer or something. ...This seems unlikely.

Yes, it is very unlikely in my opinion. I have put masses of printf's at the
above-mentioned place in vfs_bio.c to detect anything which might go wrong, but
surprisingly nothing goes wrong (that I am able to detect ...), yet the panic
happens at the very same instruction every time. (I tried hard and long to
debug this; I could reproduce the panic with the same values at the same
instruction 50..70 times. Eventually I gave up because all looks sane to me.)

Also, the situation described above is completely reproducible here (someone
from the US with the same device has had exactly the same problems, and I'm
99.5% sure that all my hardware and setup is ok):

 - mount and unmount some CDs (I found one has to mount at least 3 - 4 CDs
   and unmount 0 - all CDs)

 - write (!) to a mounted filesystem (I did it with iozone because then I
   knew where I had a corrupt FS after the panic)

As long as I did NOT do a write (!), the panic did not happen!

All this is happening under a plain 2.0.5 compiled with SCSI_NEWCONF enabled.

hellmuth
-- 
Hellmuth Michaelis HCS Hanseatischer Computerservice GmbH Hamburg, Europe
"There are lies, damn lies, and open systems." (unknown)