Date: Tue, 13 Mar 2001 00:39:50 -0600
From: Richard Todd <rmtodd@ichotolot.servalan.com>
To: current@freebsd.org
Subject: Tracking down problem with booting large kernels (bug in locore.s)
Message-ID: <m14ciTH-004MkiC@servalan.servalan.com>
On my system (a dual PII/400 running -current), I've noticed for some time that if I build a kernel with too many device drivers in it (where "too many" seems to correspond to a text size of more than 3 MB for the resulting kernel), the system reboots itself immediately when booting the new kernel. Other people have noticed this before (see the thread "Recent kernels won't boot" in the mailing list archives at http://www.freebsd.org/mail/archive/2000/freebsd-current/20001015.freebsd-current.html ). However, neither a cause nor a fix was ever identified, and the problem still exists in -current as of today's cvsup.

I spent some time tonight trying to localize the crash, and had some luck. The problem is annoyingly hard to track down: even booting with DDB and "boot -d" doesn't catch it, because the kernel reboots before DDB starts. I had to resort to sticking "hlt" instructions (or calls to cpu_halt()) in various places and checking whether the kernel hung; a hang told me the kernel had gotten at least as far as the halt.

I narrowed the crash down to this area of locore.s (note the arrows):

-----------------------------------
/* Now enable paging */
        movl    R(IdlePTD), %eax
        movl    %eax,%cr3               /* load ptd addr into mmu */
        movl    %cr0,%eax               /* get control word */
        orl     $CR0_PE|CR0_PG,%eax     /* enable paging */
        movl    %eax,%cr0               /* and let's page NOW! */

#ifdef BDE_DEBUGGER
/*
 * Complete the adjustments for paging so that we can keep tracing through
 * initi386() after the low (physical) addresses for the gdt and idt become
 * invalid.
 */
        call    bdb_commit_paging
#endif
                                        <---- No crashes as of here
        pushl   $begin                  /* jump to high virtualized address */
        ret

/* now running relocated at KERNBASE where the system is linked to run */
begin:                                  <==== crashes before it gets here!!!
        /* set up bootstrap stack */
        movl    proc0paddr,%eax         /* location of in-kernel pages */
----------------------------------------------------------

The pushl/ret pair is where the boot code jumps to "begin:" at its proper virtual address once the page tables are set up. My guess is that create_pagetables is somehow going wrong and building bogus page tables, so that the jump into kernel virtual address space ends up in deep space somewhere, but frankly the details of i386 page tables are beyond my expertise. So I'm posting this in the hope that someone here *does* know enough to figure out what goes wrong when the kernel is sufficiently large. Any takers?