From: Frank McConnell
To: freebsd-stable@freebsd.org
Date: Thu, 28 Jul 2005 17:34:21 -0700
Subject: RELENG_5 PAE panic
Message-Id: <200507290034.j6T0YLdZ014411@lots.reanimators.org>

Intel SE7320VP2 motherboard with a single 2.8GHz Xeon, 4GB RAM, and an
80GB disk and a CD-ROM drive connected to motherboard ATA. 1GB of the
RAM appears above 4GB, which suggested building a PAE kernel.

So, imagine 5.4-RELEASE with a kernel config file that goes like this:

    include PAE
    options MAXDSIZ="(2000UL*1024*1024)"
    options IPFIREWALL
    options IPFIREWALL_DEFAULT_TO_ACCEPT
    options IPDIVERT
    options DUMMYNET

It boots, but often panics while starting a rather modified named
(based on BIND 8, and running just fine on other 4.x and 5.x systems
with similar kernel configuration but no PAE). Sometimes it doesn't
panic right away, but it usually does.

Boot from /boot/kernel.old/kernel (no PAE), add KDB/DDB options to the
PAE kernel configuration, build and install the kernel, and try some
more. See that it is panicking in propagate_priority(). No crash dumps:
it reliably dumps 3552MB and then loses with an NMI.

s/PAE/GENERIC/ and it runs, but ignores 1GB of RAM.

That was last night. This morning I found which could be describing a
related problem (though I have no ips-type hardware in my picture), and
Scott Long seemed to be interested. And I looked through the commit
logs and saw a commit to sys/kern/kern_switch.c that looked like it
could perhaps have some bearing. (Rev 1.112, MFCd as 1.78.2.19; I am
basing this on the commit message for 1.112.)

So, hmm. cvsup using stable-supfile, buildworld, buildkernel, &c.

It didn't help. A kernel with PAE still got me a panic during named
startup (this time started by hand, after having logged in as root
following multi-user startup).

--- begin paste ---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x24
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc03db1cf
stack pointer           = 0x10:0xeb328c64
frame pointer           = 0x10:0xeb328c78
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 70 (pagedaemon)
[thread pid 70 tid 100080 ]
Stopped at      0xc03db1cf = propagate_priority+0x7f:   movl    0x24(%eax),%eax
db> trace
Tracing pid 70 tid 100080 td 0xc6a89000
propagate_priority(c6a89000,c0628280,c0636c60,c6a89000,c6cdaa82) at 0xc03db1cf = propagate_priority+0x7f
turnstile_wait(c6a6f240,c0636c60,c6cdaa80) at 0xc03db84a = turnstile_wait+0x266
_mtx_lock_sleep(c0636c60,c6a89000,0,0,0) at 0xc03b4c25 = _mtx_lock_sleep+0xad
msleep(c0637104,c0636c60,44,c059aa74,1f4) at 0xc03c37ea = msleep+0x39a
vm_pageout(0,eb328d38) at 0xc04fb0e4 = vm_pageout+0x280
fork_exit(c04fae64,0,eb328d38) at 0xc03a8680 = fork_exit+0x74
fork_trampoline() at 0xc0539d9c = fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xeb328d6c, ebp = 0 ---
db>
--- end paste ---

"panic" to force a dump continues to lose with an NMI after 3552MB.

Um, help? Pretty please? My clues about this part of the kernel are a
bit stale. I don't know how long I have to play, and don't think I can
give remote access, but I'm willing to try stuff and be remote eyes,
hands, and as much of a brain as I can while I can.

-Frank McConnell