From owner-freebsd-bugs Thu May 16 21:09:44 1996
Return-Path: owner-bugs
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id VAA07893 for bugs-outgoing; Thu, 16 May 1996 21:09:44 -0700 (PDT)
Received: from uruk.org (uruk.org [198.145.95.253]) by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id VAA07884 for ; Thu, 16 May 1996 21:09:38 -0700 (PDT)
From: erich@uruk.org
Received: from loopback (loopback [127.0.0.1]) by uruk.org (8.7.4/8.7.3) with SMTP id VAA23438; Thu, 16 May 1996 21:11:41 -0700 (PDT)
Message-Id: <199605170411.VAA23438@uruk.org>
X-Authentication-Warning: uruk.org: Host loopback [127.0.0.1] didn't use HELO protocol
To: freebsd-bugs@freebsd.org
cc: se@zpr.uni-koeln.de
Subject: Re: Post 2.1.0 FreeBSD bug in PCI code ??
In-reply-to: Your message of "Thu, 16 May 1996 06:53:45 PDT." <199605161353.GAA20179@uruk.org>
Date: Thu, 16 May 1996 21:11:40 -0700
Sender: owner-bugs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

[ Please e-mail responses directly to me, as I'm not on the list ]

I decided to bite the bullet and simply debug the problem. As is
typical of such situations, I found it in about 20 minutes, and had it
fixed and tested in another 10.

Please see that this fix is included in the main source tree.

I wrote in my message this morning:

> When running on a PC with multiple PCI buses (and the EISA bus being
> bridged off of a PCI bus), during the boot sequence, the machine crashes
> with a page fault that always looks very similar (I hadn't carefully
> written down the dump message from the earlier versions).
>
> I've tried this on both an Intel Xtended Xpress (Pentium CPUs) and Intel
> Alder (Pentium Pro CPUs). They have different chipsets, and are known
> to work with other OSes just fine. They even work with 2.1.0 correctly.
> A Pentium-Pro machine with the same chipset, but one PCI bus (and no
> EISA bridge) works fine.
>
> I wrote down the error message from 2.2-960501-SNAP. This occurs when
> trying "boot.flp". ...
> ep0: on eisa0 slot 3
> ep0: aui/utp[*AUI*] address 00:20:af:0b:7e:e0
> Probing for devices on PCI bus 0:
> chip0 rev 5 on pci0:14:0
> pci0:15:0: Intel Corporation, device 0x0008, class=0xff
>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address = 0x16d0d16c
> fault code = supervisor read, page not present
> instruction pointer = 0x8:0xf016d1aa
> stack pointer = 0x10:0xefbfff0c
> frame pointer = 0x10:0xefbfff1c
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 0 ()
> interrupt mask = net tty bio
> panic: page fault

The "class=0xff" was the key here.

My diff is against the kernel source from the 2.2-960224-SNAP snapshot
(I think that was it... it was from the end of February).

The bounds check on an array indexed by "class" was made against
"subclass" instead. The code then indexes by class anyway, and with
class=0xff it can go off following a random pointer (well, not random,
but not what you want, either).

--------------------------(pci.c.diff)--------------------------
--- pci.c.old Thu May 16 19:46:48 1996
+++ pci.c Thu May 16 21:04:08 1996
@@ -1700,7 +1700,7 @@
 		printf(", class=0x%02x", class);
 	}
 
-	if (subclass < sizeof(subclasses) / sizeof(subclasses[0])) {
+	if (class < sizeof(subclasses) / sizeof(subclasses[0])) {
 		const subclass_name *p = subclasses[class];
 		while (p->name && (p->subclass != subclass))
 			p++;
--------------------------(pci.c.diff)--------------------------

--
Erich Stefan Boleyn                 \_ E-mail (preferred):
Mad Genius wanna-be, CyberMuffin    \__ (finger me for other stats)
Web: http://www.uruk.org/~erich/
Motto: "I'll live forever or die trying"

This is my home system, so I'm speaking only for myself, not for Intel.