From owner-freebsd-stable@FreeBSD.ORG Tue Aug  2 22:21:00 2005
X-Original-To: freebsd-stable@FreeBSD.org
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 3F34216A420;
    Tue, 2 Aug 2005 22:21:00 +0000 (GMT)
    (envelope-from fmc@reanimators.org)
Received: from lots.reanimators.org (lots.reanimators.org [64.142.28.221])
    by mx1.FreeBSD.org (Postfix) with ESMTP id CD28C43D55;
    Tue, 2 Aug 2005 22:20:58 +0000 (GMT)
    (envelope-from fmc@reanimators.org)
Received: from lots.reanimators.org (localhost.reanimators.org [127.0.0.1])
    by lots.reanimators.org (8.13.3/8.13.3) with ESMTP id j72MKwse056655
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT);
    Tue, 2 Aug 2005 15:20:58 -0700 (PDT)
    (envelope-from fmc@lots.reanimators.org)
Received: (from fmc@localhost)
    by lots.reanimators.org (8.13.3/8.13.3/Submit) id j72MKvUt056654;
    Tue, 2 Aug 2005 15:20:57 -0700 (PDT)
    (envelope-from fmc)
Message-Id: <200508022220.j72MKvUt056654@lots.reanimators.org>
To: Robert Watson
References: <200507290034.j6T0YLdZ014411@lots.reanimators.org>
    <20050729091624.R74149@fledge.watson.org>
    <200507291809.j6TI9p37035628@lots.reanimators.org>
    <200508021726.j72HQPQG051111@lots.reanimators.org>
From: Frank McConnell
Date: Tue, 02 Aug 2005 15:20:57 -0700
In-Reply-To: <200508021726.j72HQPQG051111@lots.reanimators.org> (Frank
    McConnell's message of "Tue, 02 Aug 2005 10:26:23 -0700")
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: freebsd-stable@FreeBSD.org
Subject: Re: RELENG_5 PAE panic
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Tue, 02 Aug 2005 22:21:00 -0000

Earlier I wrote:

> It looks to me like the pagedaemon is running and trying to acquire
> the vm page queue mutex, which appears to be
> owned on behalf of named,
> which isn't running but also isn't blocked on a turnstile.

And looking at the same crash (haven't rebooted yet), it occurred to
me that I should go find named's thread and see if I could figure out
why it lost the CPU.  propagate_priority() was called with a thread
argument of 0xc69e6c00, so I went looking for that.

(gdb) info threads
  102 Thread 100001  0xc03ce333 in sched_switch (td=0xc6965180,
    newtd=0xc69e6300, flags=1) at /usr/src/sys/kern/sched_4bsd.c:881
[...]
  4 Thread 100107  0xc03ce333 in sched_switch (td=0xc6cdaa80,
    newtd=0xc6965480, flags=1) at /usr/src/sys/kern/sched_4bsd.c:881
  3 Thread 100068  0xc03ce333 in sched_switch (td=0xc69e6c00,
    newtd=0xc69e6000, flags=1) at /usr/src/sys/kern/sched_4bsd.c:881
  2 Thread 100085  0xc03ce333 in sched_switch (td=0xc6a65780,
    newtd=0xc6cda600, flags=-1063014400)
    at /usr/src/sys/kern/sched_4bsd.c:881
* 1 Thread 100080  propagate_priority (td=0xc69e6c00)
    at /usr/src/sys/kern/subr_turnstile.c:245
(gdb) thread 3
[Switching to thread 3 (Thread 100068)]#0  0xc03ce333 in sched_switch (
    td=0xc69e6c00, newtd=0xc69e6000, flags=1)
    at /usr/src/sys/kern/sched_4bsd.c:881
881             cpu_switch(td, newtd);
(gdb) backtrace
#0  0xc03ce333 in sched_switch (td=0xc69e6c00, newtd=0xc69e6000, flags=1)
    at /usr/src/sys/kern/sched_4bsd.c:881
#1  0xc03c4c2a in mi_switch (flags=1, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:355
#2  0xc03dacf2 in sleepq_switch (wchan=???)
    at /usr/src/sys/kern/subr_sleepqueue.c:406
#3  0xc03daedb in sleepq_wait (wchan=0xc6970534)
    at /usr/src/sys/kern/subr_sleepqueue.c:518
#4  0xc03960eb in cv_wait (cvp=0xc6970534, mp=0xc06258d0)
    at /usr/src/sys/kern/kern_condvar.c:128
#5  0xc03c42c8 in _sx_xlock (sx=0xc6970504, file=0x0, line=0)
    at /usr/src/sys/kern/kern_sx.c:175
#6  0xc04ef26f in _vm_map_lock_read (map=???, file=???, line=???)
    at /usr/src/sys/vm/vm_map.c:380
#7  0xc04f23ae in vm_map_lookup (var_map=0xeb304970, vaddr=0,
    fault_typea=2 '\002', out_entry=0xeb304974, object=???, pindex=???,
    out_prot=???, wired=0xeb30494c) at /usr/src/sys/vm/vm_map.c:2998
#8  0xc04eacfd in vm_fault (map=0xc69704c0, vaddr=0, fault_type=2 '\002',
    fault_flags=8) at /usr/src/sys/vm/vm_fault.c:229
#9  0xc054c7b5 in trap_pfault (frame=0xeb304a34, usermode=0, eva=28)
    at /usr/src/sys/i386/i386/trap.c:712
#10 0xc054c481 in trap (frame=
      {tf_fs = -1062993896, tf_es = -349175792, tf_ds = 34013200,
       tf_edi = -1063014400, tf_esi = 2, tf_ebp = -349156544,
       tf_isp = -349156768, tf_ebx = 0, tf_edx = 0, tf_ecx = -962696192,
       tf_eax = 4, tf_trapno = 12, tf_err = 2, tf_eip = -1068585392,
       tf_cs = 8, tf_eflags = 66050, tf_esp = -962699264,
       tf_ss = -1067618760})
    at /usr/src/sys/i386/i386/trap.c:425
#11 0xc053b15a in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#12 0xc0a40018 in ?? ()
#13 0xeb300010 in ?? ()
#14 0x02070010 in ?? ()
#15 0xc0a3b000 in ?? ()
#16 0x00000002 in ?? ()
#17 0xeb304b40 in ?? ()
#18 0xeb304a60 in ?? ()
#19 0x00000000 in ?? ()
#20 0x00000000 in ?? ()
#21 0xc69e6c00 in ?? ()
#22 0x00000004 in ?? ()
#23 0x0000000c in ?? ()
#24 0x00000002 in ?? ()
#25 0xc04eae50 in vm_fault (map=0xc0a3b000, vaddr=3232002048,
    fault_type=2 '\002', fault_flags=0) at atomic.h:365
#26 0xc054c833 in trap_pfault (frame=0xeb304ba8, usermode=0, eva=3232006126)
    at /usr/src/sys/i386/i386/trap.c:724
#27 0xc054c481 in trap (frame=
      {tf_fs = 24, tf_es = -349175792, tf_ds = -1068302320,
       tf_edi = -1081504208, tf_esi = 1127, tf_ebp = -349156376,
       tf_isp = -349156396, tf_ebx = 1127, tf_edx = -1062961228,
       tf_ecx = -1, tf_eax = 128, tf_trapno = 12, tf_err = 2,
       tf_eip = -1068534339, tf_cs = 8, tf_eflags = 66199,
       tf_esp = -349156336, tf_ss = -1068204127})
    at /usr/src/sys/i386/i386/trap.c:425
#28 0xc053b15a in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#29 0x00000018 in ?? ()
#30 0xeb300010 in ?? ()
#31 0xc0530010 in init_scp (sc=0xbf898e30, vty=???, scp=0x467)
    at /usr/src/sys/dev/syscons/syscons.c:2958
#32 0xc0547fa1 in pmap_protect (pmap=0xc6970580, sva=320626688,
    eva=1120960512, prot=???) at /usr/src/sys/i386/i386/pmap.c:1860
#33 0xc04f1769 in vm_map_copy_entry (src_map=0xc69704c0, dst_map=0xc696fd10,
    src_entry=0xc718b264, dst_entry=0xc708750c)
    at /usr/src/sys/vm/vm_map.c:2394
#34 0xc04f1b5a in vmspace_fork (vm1=0xc69704c0)
    at /usr/src/sys/vm/vm_map.c:2581
#35 0xc04ed8df in vm_forkproc (td=0xc69e6c00, p2=0xc6cd9388, td2=0xc6a65780,
    flags=20) at /usr/src/sys/vm/vm_glue.c:464
#36 0xc03a93bc in fork1 (td=0xc69e6c00, flags=20, pages=0, procp=0xeb304cd4)
    at /usr/src/sys/kern/kern_fork.c:644
#37 0xc03a82fc in fork (td=0xc69e6c00, uap=0xeb304d04)
    at /usr/src/sys/kern/kern_fork.c:97
#38 0xc054ce3b in syscall (frame=
      {tf_fs = -1082195921, tf_es = -2061565905, tf_ds = -1082195921,
       tf_edi = 0, tf_esi = -1082135760, tf_ebp = -1082135672,
       tf_isp = -349155996, tf_ebx = -2061606804, tf_edx = -1082135752,
       tf_ecx = 65535, tf_eax = 2, tf_trapno = 12, tf_err = 2,
       tf_eip = -2062103505, tf_cs = 31, tf_eflags = 642,
       tf_esp = -1082135780, tf_ss = 47})
    at /usr/src/sys/i386/i386/trap.c:1009
#39 0xc053b1af in Xint0x80_syscall ()
    at /usr/src/sys/i386/i386/exception.s:201
#40 0xbf7f002f in ?? ()
#41 0x851f002f in ?? ()
#42 0xbf7f002f in ?? ()
#43 0x00000000 in ?? ()
#44 0xbf7feb30 in ?? ()
#45 0xbf7feb88 in ?? ()
#46 0xeb304d64 in ?? ()
#47 0x851e606c in ?? ()
#48 0xbf7feb38 in ?? ()
#49 0x0000ffff in ?? ()
#50 0x00000002 in ?? ()
#51 0x0000000c in ?? ()
#52 0x00000002 in ?? ()
#53 0x8516cc2f in ?? ()
#54 0x0000001f in ?? ()
#55 0x00000282 in ?? ()
#56 0xbf7feb1c in ?? ()
#57 0x0000002f in ?? ()
#58 0xfbbab4bb in ?? ()
#59 0x3ba15b43 in ?? ()
#60 0x9ec046cc in ?? ()
#61 0x67b74ff9 in ?? ()
#62 0xba0a4c00 in ?? ()
#63 0xc6a668d4 in ?? ()
#64 0xc69e6c00 in ?? ()
#65 0xeb3047e0 in ?? ()
#66 0xeb3047c8 in ?? ()
#67 0xc69e6000 in ?? ()
#68 0xc03ce333 in sched_switch (td=0xbf7feb30, newtd=0x851e606c, flags=0)
    at /usr/src/sys/kern/sched_4bsd.c:881
Previous frame inner to this frame (corrupt stack?)
(gdb)

Reading vmspace_fork() I note that it locks old_map with
vm_map_lock(old_map).

(gdb) frame 34
#34 0xc04f1b5a in vmspace_fork (vm1=0xc69704c0)
    at /usr/src/sys/vm/vm_map.c:2581
2581                            vm_map_copy_entry(old_map, new_map, old_entry,
(gdb) print old_map
$25 = 0xc69704c0
(gdb) print old_map->system_map
$26 = 0 '\0'
(gdb)

And I note that the backtrace indicates that vm_map_lookup() is trying
to lock something with _vm_map_lock_read().

(gdb) frame 7
#7  0xc04f23ae in vm_map_lookup (var_map=0xeb304970, vaddr=0,
    fault_typea=2 '\002', out_entry=0xeb304974, object=???, pindex=???,
    out_prot=???, wired=0xeb30494c) at /usr/src/sys/vm/vm_map.c:2998
2998            vm_map_lock_read(map);
(gdb) print map
$27 = 0xc69704c0
(gdb)

Both of the locks would appear to boil down to
_sx_xlock(&map->lock, file, line) on the same map (0xc69704c0), which
I think means that this thread is deadlocked against itself: it took
the lock in vmspace_fork(), then page-faulted in the kernel, and is
now sleeping in vm_map_lookup() waiting for a lock it already holds.

-Frank McConnell
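
P.S. To make the suspected sequence concrete, here is a toy model of it in userland C. This is only a sketch under my own assumptions, not the kernel's sx(9) code: struct toy_lock, toy_xlock(), toy_xunlock() and replay_fork_then_fault() are all hypothetical names made up for illustration. It assumes a non-recursive exclusive lock that records its owner, and it reports the second acquisition by the same thread instead of sleeping the way the real thread does in cv_wait().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a non-recursive exclusive lock (hypothetical, not sx(9)). */
struct toy_lock {
    const void *owner;          /* thread holding the lock, NULL if free */
};

enum toy_result {
    TOY_ACQUIRED,               /* lock was free and is now ours */
    TOY_WOULD_BLOCK,            /* held by another thread; we would sleep */
    TOY_SELF_DEADLOCK           /* held by us already; we would sleep forever */
};

/* Try to take the lock exclusively on behalf of thread td. */
enum toy_result
toy_xlock(struct toy_lock *lk, const void *td)
{
    if (lk->owner == NULL) {
        lk->owner = td;
        return TOY_ACQUIRED;
    }
    if (lk->owner == td)
        return TOY_SELF_DEADLOCK;
    return TOY_WOULD_BLOCK;
}

/* Release the lock; only the owner may do so. */
void
toy_xunlock(struct toy_lock *lk, const void *td)
{
    assert(lk->owner == td);
    lk->owner = NULL;
}

/*
 * Replay the order of events suggested by the backtrace: the fork path
 * locks the map, then the same thread faults and the fault handler
 * asks for the same lock again.
 */
enum toy_result
replay_fork_then_fault(void)
{
    struct toy_lock map_lock = { NULL };
    int td_named = 0;           /* address used as a fake thread id */
    enum toy_result first;

    /* Models the exclusive map lock taken in the fork path. */
    first = toy_xlock(&map_lock, &td_named);
    if (first != TOY_ACQUIRED)
        return first;

    /* Models the fault path requesting the same lock while it is held. */
    return toy_xlock(&map_lock, &td_named);
}
```

Calling replay_fork_then_fault() returns TOY_SELF_DEADLOCK, which is the situation I believe thread 100068 is in. A different thread asking for the same lock would get TOY_WOULD_BLOCK and could proceed once the owner unlocks, but here the owner is the one waiting.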