From: Frank McConnell
To: Robert Watson
Cc: freebsd-stable@FreeBSD.org
Date: Fri, 29 Jul 2005 11:09:50 -0700
Subject: Re: RELENG_5 PAE panic

Robert Watson wrote:
> This appears to be a NULL pointer dereference in
> propagate_priority().
> Often a panic in propagate_priority is actually
> a symptom of a slightly earlier problem which is discovered by
> propagate_priority when it trips over, for example, a bad mutex. If
> you're set up with a serial port to copy and paste debugging output,
> the output of 'ps' and 'show pcpu' for each of the cpus (as well as
> 'show pcpu' without a cpu argument) would be helpful. It wouldn't hurt
> also to use gdb on a copy of the kernel with debugging symbols to map
> 'vm_pageout+0x280' into a line number. Details on these various
> activities can be found in the handbook.

Thanks, that's helpful. It's been a while since I've needed to debug a FreeBSD kernel (good work, y'all!). While I have worked on RTOSs, TCP/IP stacks, and drivers, I've looked at the code enough to figure that I probably don't have the right clues to make sense of the entrails as seen through the debugger. It'd be interesting and probably fun, but I have other stuff that needs doing too.

It's a single-CPU system with hyperthreading disabled in the firmware setup, so I'm thinking 'show pcpu 1' will not be meaningful. I'll do it anyway.
--- begin crash ---
splat# /usr/sbin/named -c /etc/namedb/named.conf
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x24
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc03db1cf
stack pointer           = 0x10:0xeb328c64
frame pointer           = 0x10:0xeb328c78
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 70 (pagedaemon)
[thread pid 70 tid 100080 ]
Stopped at      0xc03db1cf = propagate_priority+0x7f:   movl    0x24(%eax),%eax
db> ps
  pid  proc      uid  ppid  pgrp  flag     stat  wmesg  wchan  cmd
  597  c6a66e20    0   596   596  0000002  new [INACTIVE] named
  596  c6cd61c4    0   585   596  0004002  [SLPQ user map 0xc7080b24][SLP] named
  585  c717d8d4    0   555   585  0004002  [SLPQ pause 0xc717d90c][SLP] csh
  563  c6cd954c    0     1   563  0004002  [SLPQ ttyin 0xc6a61810][SLP] getty
  562  c6cd9c5c    0     1   562  0004002  [SLPQ ttyin 0xc6a61a10][SLP] getty
  561  c6cd9710    0     1   561  0004002  [SLPQ ttyin 0xc6a61c10][SLP] getty
  560  c6cd91c4    0     1   560  0004002  [SLPQ ttyin 0xc6a61e10][SLP] getty
  559  c6cd9a98    0     1   559  0004002  [SLPQ ttyin 0xc6adb010][SLP] getty
  558  c7067000    0     1   558  0004002  [SLPQ ttyin 0xc6adb210][SLP] getty
  557  c6cd9388    0     1   557  0004002  [SLPQ ttyin 0xc6adb410][SLP] getty
  556  c6cd6710    0     1   556  0004002  [SLPQ ttyin 0xc6a60a10][SLP] getty
  555  c7067c5c    0     1   555  0004102  [SLPQ wait 0xc7067c5c][SLP] login
  541  c7067388    0     1   541  0000000  [SLPQ select 0xc0628624][SLP] inetd
  534  c70671c4  125   522   522  0004100  [SLPQ select 0xc0628624][SLP] qmgr
  533  c6cd9e20  125   522   522  0004100  [SLPQ select 0xc0628624][SLP] pickup
  522  c6cd98d4    0     1   522  0004100  [SLPQ select 0xc0628624][SLP] master
  421  c6a66710    0     1   421  0000000  [SLPQ nanslp 0xc0624fec][SLP] cron
  408  c6cd654c    0     1   408  0000100  [SLPQ select 0xc0628624][SLP] sshd
  391  c6cd6000    0     1   391  0000000  [SLPQ select 0xc0628624][SLP] ntpd
  289  c6cd6388    0     1   289  0000000  [SLPQ select 0xc0628624][SLP] syslogd
  271  c6a668d4    0     1   271  0000000  [SLPQ select 0xc0628624][SLP] devd
  214  c6a6654c    0     1   214  0000000  [SLPQ pause 0xc6a66584][SLP] adjkerntz
   80  c6cd68d4    0     0     0  0000204  [RUNQ] schedcpu
   79  c6cd6a98    0     0     0  0000204  [SLPQ - 0xc063084c][SLP] nfsiod 3
   78  c6cd6c5c    0     0     0  0000204  [SLPQ - 0xc0630848][SLP] nfsiod 2
   77  c6cd6e20    0     0     0  0000204  [SLPQ - 0xc0630844][SLP] nfsiod 1
   76  c6cd9000    0     0     0  0000204  [SLPQ - 0xc0630840][SLP] nfsiod 0
   75  c69f3a98    0     0     0  0000204  [RUNQ] vnlru
   74  c69f3c5c    0     0     0  0000204  [RUNQ] syncer
   73  c69f3e20    0     0     0  0000204  [RUNQ] bufdaemon
   72  c6a63000    0     0     0  000020c  [SLPQ pgzero 0xc06370f4][SLP] pagezero
   71  c6a631c4    0     0     0  0000204  [SLPQ psleep 0xc0637148][SLP] vmdaemon
   70  c6a63388    0     0     0  0000204  [LOCK vm page queue mutex c6a57240] pagedaemon
   69  c6a6354c    0     0     0  0000204  [IWAIT] swi0: sio
    9  c6a63710    0     0     0  0000204  [SLPQ actask 0xc061c3cc][SLP] acpi_task2
    8  c6a638d4    0     0     0  0000204  [SLPQ actask 0xc061c3cc][SLP] acpi_task1
    7  c6a63a98    0     0     0  0000204  [SLPQ actask 0xc061c3cc][SLP] acpi_task0
   68  c6a63c5c    0     0     0  0000204  [IWAIT] swi6:+
    6  c6a63e20    0     0     0  0000204  [SLPQ - 0xc6a3fd80][SLP] thread taskq
   67  c6a66000    0     0     0  0000204  [IWAIT] swi6:+
   66  c6a661c4    0     0     0  0000204  [IWAIT] swi6: task queue
    5  c6a66388    0     0     0  0000204  [SLPQ - 0xc6a57500][SLP] kqueue taskq
   65  c69e41c4    0     0     0  0000204  [IWAIT] swi6: acpitaskq
   64  c69e4388    0     0     0  0000204  [IWAIT] swi3: cambio
   63  c69e454c    0     0     0  0000204  [IWAIT] swi2: camnet
   62  c69e4710    0     0     0  0000204  [SLPQ - 0xc061ca20][SLP] yarrow
    4  c69e48d4    0     0     0  0000204  [SLPQ - 0xc061f648][SLP] g_down
    3  c69e4a98    0     0     0  0000204  [SLPQ - 0xc061f644][SLP] g_up
    2  c69e4c5c    0     0     0  0000204  [SLPQ - 0xc061f63c][SLP] g_event
   61  c69e4e20    0     0     0  0000204  [IWAIT] swi4: vm
   60  c69f3000    0     0     0  000020c  [IWAIT] swi5: clock sio
   59  c69f31c4    0     0     0  0000204  [IWAIT] swi1: net
   58  c69f3388    0     0     0  0000204  [IWAIT] irq0: clk
   57  c69f354c    0     0     0  0000204  [IWAIT] irq47:
   56  c69f3710    0     0     0  0000204  [IWAIT] irq46:
   55  c69f38d4    0     0     0  0000204  [IWAIT] irq45:
   54  c69cca98    0     0     0  0000204  [IWAIT] irq44:
   53  c69ccc5c    0     0     0  0000204  [IWAIT] irq43:
   52  c69cce20    0     0     0  0000204  [IWAIT] irq42:
   51  c69e0000    0     0     0  0000204  [IWAIT] irq41:
   50  c69e01c4    0     0     0  0000204  [IWAIT] irq40:
   49  c69e0388    0     0     0  0000204  [IWAIT] irq39:
   48  c69e054c    0     0     0  0000204  [IWAIT] irq38:
   47  c69e0710    0     0     0  0000204  [IWAIT] irq37:
   46  c69e08d4    0     0     0  0000204  [IWAIT] irq36:
   45  c69e0a98    0     0     0  0000204  [IWAIT] irq35:
   44  c69e0c5c    0     0     0  0000204  [IWAIT] irq34:
   43  c69e0e20    0     0     0  0000204  [IWAIT] irq33:
   42  c69e4000    0     0     0  0000204  [IWAIT] irq32:
   41  c69bc54c    0     0     0  0000204  [IWAIT] irq31:
   40  c69bc710    0     0     0  0000204  [IWAIT] irq30:
   39  c69bc8d4    0     0     0  0000204  [IWAIT] irq29:
   38  c69bca98    0     0     0  0000204  [IWAIT] irq28:
   37  c69bcc5c    0     0     0  0000204  [IWAIT] irq27:
   36  c69bce20    0     0     0  0000204  [IWAIT] irq26:
   35  c69cc000    0     0     0  0000204  [IWAIT] irq25:
   34  c69cc1c4    0     0     0  0000204  [IWAIT] irq24:
   33  c69cc388    0     0     0  0000204  [IWAIT] irq23:
   32  c69cc54c    0     0     0  0000204  [IWAIT] irq22:
   31  c69cc710    0     0     0  0000204  [IWAIT] irq21:
   30  c69cc8d4    0     0     0  0000204  [IWAIT] irq20:
   29  c696c1c4    0     0     0  0000204  [IWAIT] irq19:
   28  c696c388    0     0     0  0000204  [IWAIT] irq18:
   27  c696c54c    0     0     0  0000204  [IWAIT] irq17: em0
   26  c696c710    0     0     0  0000204  [IWAIT] irq16:
   25  c696c8d4    0     0     0  0000204  [IWAIT] irq15: ata1
   24  c696ca98    0     0     0  0000204  [IWAIT] irq14: ata0
   23  c696cc5c    0     0     0  0000204  [IWAIT] irq13:
   22  c696ce20    0     0     0  0000204  [IWAIT] irq12:
   21  c69bc000    0     0     0  0000204  [IWAIT] irq11:
   20  c69bc1c4    0     0     0  0000204  [IWAIT] irq10:
   19  c69bc388    0     0     0  0000204  [IWAIT] irq9: acpi0
   18  c6964000    0     0     0  0000204  [IWAIT] irq8: rtc
   17  c69641c4    0     0     0  0000204  [IWAIT] irq7:
   16  c6964388    0     0     0  0000204  [IWAIT] irq6:
   15  c696454c    0     0     0  0000204  [IWAIT] irq5:
   14  c6964710    0     0     0  0000204  [IWAIT] irq4: sio0
   13  c69648d4    0     0     0  0000204  [IWAIT] irq3: sio1
   12  c6964a98    0     0     0  0000204  [IWAIT] irq1: atkbd0
   11  c6964c5c    0     0     0  000020c  [Can run] idle
    1  c6964e20    0     0     1  0004200  [SLPQ wait 0xc6964e20][SLP] init
   10  c696c000    0     0     0  0000204  [SLPQ ktrace 0xc0622f98][SLP] ktrace
    0  c061f740    0     0     0  0000200  [SLPQ sched 0xc061f740][SLP] swapper
db> show pcpu
cpuid        = 0
curthread    = 0xc6a65000: pid 70 "pagedaemon"
curpcb       = 0xeb328d90
fpcurthread  = none
idlethread   = 0xc6965480: pid 11 "idle"
APIC ID      = 0
currentldt   = 0x28
db> show pcpu 0
cpuid        = 0
curthread    = 0xc6a65000: pid 70 "pagedaemon"
curpcb       = 0xeb328d90
fpcurthread  = none
idlethread   = 0xc6965480: pid 11 "idle"
APIC ID      = 0
currentldt   = 0x28
db> trace
Tracing pid 70 tid 100080 td 0xc6a65000
propagate_priority(c6a65000,c0628280,c0636c60,c6a65000,c6cd7782) at 0xc03db1cf = propagate_priority+0x7f
turnstile_wait(c6a57240,c0636c60,c6cd7780) at 0xc03db84a = turnstile_wait+0x266
_mtx_lock_sleep(c0636c60,c6a65000,0,0,0) at 0xc03b4c25 = _mtx_lock_sleep+0xad
msleep(c0637104,c0636c60,44,c059aa74,1f4) at 0xc03c37ea = msleep+0x39a
vm_pageout(0,eb328d38) at 0xc04fb0e4 = vm_pageout+0x280
fork_exit(c04fae64,0,eb328d38) at 0xc03a8680 = fork_exit+0x74
fork_trampoline() at 0xc0539d9c = fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xeb328d6c, ebp = 0 ---
db> show pcpu 1
cpuid        = 2130835587
curthread    =
db> reset
--- end crash ---

--- begin gdb ---
splat# gdb /usr/obj/usr/src/sys/EAST1-PAE/kernel.debug
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
(gdb) list *vm_pageout+0x280
0xc04fb0e4 is in vm_pageout (/usr/src/sys/vm/vm_pageout.c:1466).
1461                            pass = 1;
1462                    else
1463                            pass = 0;
1464                    error = msleep(&vm_pages_needed, &vm_page_queue_mtx, PVM,
1465                                "psleep", vm_pageout_stats_interval * hz);
1466                    if (error && !vm_pages_needed) {
1467                            pass = 0;
1468                            vm_pageout_page_stats();
1469                            vm_page_unlock_queues();
1470                            continue;
(gdb)
--- end gdb ---

Based on the 'ps' output, I'm thinking the foreground named has loaded up the zone files and is forking a copy of itself to run as a daemon.

There's more output (dmesg, gdb list for other trace frames, &c).
I've left it out mostly because I don't want to flood you or the list.

When it was running 5.4-RELEASE, I had at one point added options INVARIANTS and INVARIANT_SUPPORT to the PAE-based configuration, and it panicked during startup: either during fsck of the root filesystem (on multi-user startup) or immediately after I pressed return at the prompt for a single-user shell (on single-user startup). I didn't take good notes, though. I'm willing to try that again if it would be helpful.

-Frank McConnell