From owner-freebsd-stable@FreeBSD.ORG Tue Aug 2 17:26:26 2005 Return-Path: X-Original-To: freebsd-stable@FreeBSD.org Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6C47A16A41F; Tue, 2 Aug 2005 17:26:26 +0000 (GMT) (envelope-from fmc@reanimators.org) Received: from lots.reanimators.org (lots.reanimators.org [64.142.28.221]) by mx1.FreeBSD.org (Postfix) with ESMTP id CE75C43D45; Tue, 2 Aug 2005 17:26:25 +0000 (GMT) (envelope-from fmc@reanimators.org) Received: from lots.reanimators.org (localhost.reanimators.org [127.0.0.1]) by lots.reanimators.org (8.13.3/8.13.3) with ESMTP id j72HQPDO051112 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Tue, 2 Aug 2005 10:26:25 -0700 (PDT) (envelope-from fmc@lots.reanimators.org) Received: (from fmc@localhost) by lots.reanimators.org (8.13.3/8.13.3/Submit) id j72HQPQG051111; Tue, 2 Aug 2005 10:26:25 -0700 (PDT) (envelope-from fmc) Message-Id: <200508021726.j72HQPQG051111@lots.reanimators.org> To: Robert Watson References: <200507290034.j6T0YLdZ014411@lots.reanimators.org> <20050729091624.R74149@fledge.watson.org> <200507291809.j6TI9p37035628@lots.reanimators.org> From: Frank McConnell Date: Tue, 02 Aug 2005 10:26:23 -0700 In-Reply-To: <200507291809.j6TI9p37035628@lots.reanimators.org> (Frank McConnell's message of "Fri, 29 Jul 2005 11:09:50 -0700") MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: freebsd-stable@FreeBSD.org Subject: Re: RELENG_5 PAE panic X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Aug 2005 17:26:26 -0000 Poking at this some more.... Yesterday I put together another PC, installed 5.4-RELEASE, cvsup'd to RELENG_5 using stable-supfile, built world and kernel, installed kernel and world, then built a new kernel for the target with kernel config file as follows: --- begin kernel config --- include PAE options MAXDSIZ="(2000UL*1024*1024)" options IPFIREWALL options IPFIREWALL_DEFAULT_TO_ACCEPT options DUMMYNET options IPDIVERT makeoptions DEBUG="-g" options KDB options KDB_TRACE #options KDB_UNATTENDED options DDB options DB_NUMSYM options GDB --- end kernel config --- Built a kernel from that, pushed it over to the target, installed it there, twiddled /boot/device.hints a bit, cabled both PCs together (serial cable), rebooted, and got gdb on the new PC talking to the target. --- begin gdb transcript 1 --- $ gdb /usr/obj/usr/src/sys/EAST1-PAE/kernel.debug GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-marcel-freebsd"... (gdb) target remote /dev/cuaa1 Remote debugging using /dev/cuaa1 kdb_enter (msg=0x2a "") at /usr/src/sys/kern/subr_kdb.c:268 268 } warning: Unable to find dynamic linker breakpoint function. GDB will be unable to debug shared library initializers and track explicitly loaded dynamic code. warning: shared library handler failed to enable breakpoint (gdb) c Continuing. --- end gdb transcript 1 --- Note, gdb, not kgdb: kgdb connects and immediately segfaults. Grumble. I log in as root on the (serial) console and start named. It runs for a bit and then crashes. --- begin console transcript --- splat# /usr/sbin/named -c /etc/namedb/named.conf kernel trap 12 with interrupts disabled Fatal trap 12: page fault while in kernel mode fault virtual address = 0x24 fault code = supervisor read, page not present instruction pointer = 0x8:0xc03dc587 stack pointer = 0x10:0xeb328c64 frame pointer = 0x10:0xeb328c78 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = resumt, IOPL 0 current process = 70 (pagedaemon) --- end console transcript --- And gdb awakes: --- begin gdb transcript 2 --- [New Thread 100080] Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 100080] propagate_priority (td=0xc69e6c00) at /usr/src/sys/kern/subr_turnstile.c:245 245 tc = TC_LOOKUP(ts->ts_lockobj); (gdb) print ts $1 = (struct turnstile *) 0x0 (gdb) print *td $2 = {td_proc = 0xc6a668d4, td_ksegrp = 0xc6a67070, td_plist = { tqe_next = 0x0, tqe_prev = 0xc6a668e4}, td_kglist = {tqe_next = 0x0, tqe_prev = 0xc6a6707c}, td_slpq = {tqe_next = 0x0, tqe_prev = 0xc69ad460}, td_lockq = {tqe_next = 0x0, tqe_prev = 0x0}, td_runq = {tqe_next = 0x0, tqe_prev = 0x0}, td_selq = {tqh_first = 0x0, tqh_last = 0x0}, td_sleepqueue = 0x0, td_turnstile = 0xc69b3e40, td_tid = 100068, td_flags = 0, td_inhibitors = 2, td_pflags = 0, td_dupfd = 0, td_wchan = 0xc6970534, td_wmesg = 0xc05a93e5 "user map", td_lastcpu = 0 '\0', td_oncpu = 255 'y', td_locks = -26881, td_blocked = 0x0, td_ithd = 0x0, td_lockname = 0x0, td_contested = { lh_first = 0xc6a57240}, td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 1, td_mailbox = 0x0, td_ucred = 0xc6d17180, td_standin = 0x0, td_prticks = 0, td_upcall = 0x0, td_sticks = 249, td_uuticks = 0, td_usticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_sigmask = {__bits = {0, 0, 0, 0}}, td_siglist = {__bits = {0, 0, 0, 0}}, td_umtx = {tqe_next = 0x0, tqe_prev = 0x0}, td_generation = 20300, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, td_kflags = 0, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0, td_base_pri = 214 'O', td_priority = 68 'D', td_pcb = 0xeb304d90, td_state = TDS_INHIBITED, td_retval = {0, -1082135752}, td_slpcallout = {c_links = {sle = { sle_next = 0xc0637c80}, tqe = {tqe_next = 0xc0637c80, tqe_prev = 0xda9c2728}}, c_time = 6657, c_arg = 0xc69e6c00, c_func = 0xc03db1f4 , c_flags = 10}, td_frame = 0xeb304d38, td_kstack_obj = 0xc0a3edec, td_kstack = 3945803776, td_kstack_pages = 2, td_altkstack_obj = 0x0, td_altkstack = 0, td_altkstack_pages = 0, td_critnest = 1, td_md = {md_savecrit = 582}, td_sched = 0xc69e6d50} (gdb) print *(td->td_proc) $3 = {p_list = {le_next = 0xc706754c, le_prev = 0xc6cd9388}, p_ksegrps = { tqh_first = 0xc6a67070, tqh_last = 0xc6a67074}, p_threads = { tqh_first = 0xc69e6c00, tqh_last = 0xc69e6c08}, p_suspended = { tqh_first = 0x0, tqh_last = 0xc6a668ec}, p_ucred = 0xc6d17180, p_fd = 0xc7149a00, p_fdtol = 0x0, p_stats = 0xc69f4800, p_limit = 0xc7149d00, p_unused1 = 0x0, p_sigacts = 0xc714c000, p_flag = 16386, p_sflag = 1, p_state = PRS_NORMAL, p_pid = 574, p_hash = { le_next = 0x0, le_prev = 0xc695a8f8}, p_pglist = {le_next = 0xc6cd9388, le_prev = 0xc6ac2c08}, p_pptr = 0xc706754c, p_sibling = {le_next = 0x0, le_prev = 0xc70675b4}, p_children = {lh_first = 0xc6cd9388}, p_mtx = { mtx_object = {lo_class = 0xc05d54dc, lo_name = 0xc0597f53 "process lock", lo_type = 0xc0597f53 "process lock", lo_flags = 4390912, lo_list = { tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0}, mtx_lock = 4, mtx_recurse = 0}, p_oppid = 0, p_vmspace = 0xc69704c0, p_swtime = 118, p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = { tv_sec = 0, tv_usec = 0}}, p_runtime = {sec = 111, frac = 18404916408694389440}, p_uu = 0, p_su = 0, p_iu = 0, p_uticks = 14108, p_sticks = 249, p_iticks = 0, p_profthreads = 0, p_maxthrwaits = 0, p_traceflag = 0, p_tracevp = 0x0, p_tracecred = 0x0, p_textvp = 0xc719dc60, p_siglist = {__bits = {0, 0, 0, 0}}, p_lock = 2 '\002', p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, p_xthread = 0x0, p_boundary_count = 0, p_magic = 3203398350, p_comm = "named\000r", '\0' , p_pgrp = 0xc6ac2c00, p_sysent = 0xc0614ea0, p_args = 0xc6cdbd00, p_cpulimit = 9223372036854775807, p_nice = 0 '\0', p_xstat = 0, p_klist = { kl_lock = 0xc6a66940, kl_list = {slh_first = 0x0}}, p_numthreads = 1, p_numksegrps = 1, p_md = {md_ldt = 0xc6ac0d60}, p_itcallout = {c_links = { sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_flags = 8}, p_unused2 = 0x0, p_acflag = 0, p_ru = 0x0, p_peers = 0x0, p_leader = 0xc6a668d4, p_emuldata = 0x0, p_label = 0x0, p_sched = 0xc6a66a98} (gdb) backtrace #0 propagate_priority (td=0xc69e6c00) at /usr/src/sys/kern/subr_turnstile.c:245 #1 0xc03dcc02 in turnstile_wait (ts=0xc6a57240, lock=0xc06387a0, owner=0xc69e6c00) at /usr/src/sys/kern/subr_turnstile.c:556 #2 0xc03b5e71 in _mtx_lock_sleep (m=0xc06387a0, td=0xc6a65000, opts=0, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:552 #3 0xc03c4a36 in msleep (ident=0xc0638c44, mtx=0xc06387a0, priority=68, wmesg=0xc059c1c7 "psleep", timo=500) at /usr/src/sys/kern/kern_synch.c:239 #4 0xc04fc49c in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:1464 #5 0xc03a98cc in fork_exit (callout=0xc04fc21c , arg=0x0, frame=0xeb328d38) at /usr/src/sys/kern/kern_fork.c:791 #6 0xc053b1bc in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:209 (gdb) frame 0 #0 propagate_priority (td=0xc69e6c00) at /usr/src/sys/kern/subr_turnstile.c:245 245 tc = TC_LOOKUP(ts->ts_lockobj); (gdb) print td $4 = (struct thread *) 0xc69e6c00 (gdb) frame 1 #1 0xc03dcc02 in turnstile_wait (ts=0xc6a57240, lock=0xc06387a0, owner=0xc69e6c00) at /usr/src/sys/kern/subr_turnstile.c:556 556 propagate_priority(td); (gdb) print td $5 = (struct thread *) 0xc6a65000 (gdb) print *td $6 = {td_proc = 0xc6a63388, td_ksegrp = 0xc696b380, td_plist = { tqe_next = 0x0, tqe_prev = 0xc6a63398}, td_kglist = {tqe_next = 0x0, tqe_prev = 0xc696b38c}, td_slpq = {tqe_next = 0x0, tqe_prev = 0xc69adba0}, td_lockq = {tqe_next = 0x0, tqe_prev = 0xc6a57240}, td_runq = { tqe_next = 0x0, tqe_prev = 0x0}, td_selq = {tqh_first = 0x0, tqh_last = 0x0}, td_sleepqueue = 0xc69adba0, td_turnstile = 0x0, td_tid = 100080, td_flags = 16777216, td_inhibitors = 8, td_pflags = 0, td_dupfd = 0, td_wchan = 0x0, td_wmesg = 0x0, td_lastcpu = 0 '\0', td_oncpu = 0 '\0', td_locks = 0, td_blocked = 0xc6a57240, td_ithd = 0x0, td_lockname = 0xc05a9db6 "vm page queue mutex", td_contested = { lh_first = 0x0}, td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 0, td_mailbox = 0x0, td_ucred = 0xc694fd80, td_standin = 0x0, td_prticks = 0, td_upcall = 0x0, td_sticks = 0, td_uuticks = 0, td_usticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_sigmask = {__bits = {0, 0, 0, 0}}, td_siglist = {__bits = {0, 0, 0, 0}}, td_umtx = {tqe_next = 0x0, tqe_prev = 0x0}, td_generation = 58, td_sigstk = { ss_sp = 0x0, ss_size = 0, ss_flags = 0}, td_kflags = 0, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0, td_base_pri = 68 'D', td_priority = 68 'D', td_pcb = 0xeb328d90, td_state = TDS_INHIBITED, td_retval = {0, 0}, td_slpcallout = {c_links = {sle = { sle_next = 0xc69e6710}, tqe = {tqe_next = 0xc69e6710, tqe_prev = 0xda9ee2b8}}, c_time = 29043, c_arg = 0xc6a65000, c_func = 0xc03db1f4 , c_flags = 10}, td_frame = 0xeb328d38, td_kstack_obj = 0xc0a3e7bc, td_kstack = 3945951232, td_kstack_pages = 2, td_altkstack_obj = 0x0, td_altkstack = 0, td_altkstack_pages = 0, td_critnest = 2, td_md = {md_savecrit = 582}, td_sched = 0xc6a65150} (gdb) print *(td->td_proc) $7 = {p_list = {le_next = 0xc6a6354c, le_prev = 0xc6a631c4}, p_ksegrps = { tqh_first = 0xc696b380, tqh_last = 0xc696b384}, p_threads = { tqh_first = 0xc6a65000, tqh_last = 0xc6a65008}, p_suspended = { tqh_first = 0x0, tqh_last = 0xc6a633a0}, p_ucred = 0xc694fd80, p_fd = 0xc6cbca00, p_fdtol = 0x0, p_stats = 0xc6a62500, p_limit = 0xc6953d00, p_unused1 = 0x0, p_sigacts = 0xc6cd3000, p_flag = 516, p_sflag = 1, p_state = PRS_NORMAL, p_pid = 70, p_hash = {le_next = 0x0, le_prev = 0xc695a118}, p_pglist = {le_next = 0xc6a6354c, le_prev = 0xc6a63218}, p_pptr = 0xc0621280, p_sibling = { le_next = 0xc6a6354c, le_prev = 0xc6a63224}, p_children = { lh_first = 0x0}, p_mtx = {mtx_object = {lo_class = 0xc05d54dc, lo_name = 0xc0597f53 "process lock", lo_type = 0xc0597f53 "process lock", lo_flags = 4390912, lo_list = { tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0}, mtx_lock = 4, mtx_recurse = 0}, p_oppid = 0, p_vmspace = 0xc0621620, p_swtime = 290, p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = { tv_sec = 0, tv_usec = 0}}, p_runtime = {sec = 0, frac = 5060676077206272}, p_uu = 0, p_su = 104, p_iu = 0, p_uticks = 0, p_sticks = 0, p_iticks = 0, p_profthreads = 0, p_maxthrwaits = 0, p_traceflag = 0, p_tracevp = 0x0, p_tracecred = 0x0, p_textvp = 0x0, p_siglist = {__bits = {0, 0, 0, 0}}, p_lock = 1 '\001', p_sigiolst = { slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, p_xthread = 0x0, p_boundary_count = 0, p_magic = 3203398350, p_comm = "pagedaemon\000\000\000\000\000\000\000\000\000", p_pgrp = 0xc06217e0, p_sysent = 0xc05d0e80, p_args = 0x0, p_cpulimit = 9223372036854775807, p_nice = 0 '\0', p_xstat = 0, p_klist = { kl_lock = 0xc6a633f4, kl_list = {slh_first = 0x0}}, p_numthreads = 1, p_numksegrps = 1, p_md = {md_ldt = 0x0}, p_itcallout = {c_links = {sle = { sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_flags = 8}, p_unused2 = 0x0, p_acflag = 1, p_ru = 0x0, p_peers = 0x0, p_leader = 0xc6a63388, p_emuldata = 0x0, p_label = 0x0, p_sched = 0xc6a6354c} (gdb) print ts $8 = (struct turnstile *) 0xc6a57240 (gdb) print *ts $9 = {ts_blocked = {tqh_first = 0xc6a65000, tqh_last = 0xc6a65020}, ts_pending = {tqh_first = 0x0, tqh_last = 0xc6a57248}, ts_hash = { le_next = 0x0, le_prev = 0xc0628df8}, ts_link = {le_next = 0x0, le_prev = 0xc69e6c6c}, ts_free = {lh_first = 0x0}, ts_lockobj = 0xc06387a0, ts_owner = 0xc69e6c00} (gdb) print *(ts->ts_lockobj) $10 = {lo_class = 0xc05d54dc, lo_name = 0xc05a9db6 "vm page queue mutex", lo_type = 0xc05a9db6 "vm page queue mutex", lo_flags = 720896, lo_list = { tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0} (gdb) frame 2 #2 0xc03b5e71 in _mtx_lock_sleep (m=0xc06387a0, td=0xc6a65000, opts=0, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:552 552 turnstile_wait(ts, &m->mtx_object, mtx_owner(m)); (gdb) print ts $11 = (struct turnstile *) 0xc06387a0 (gdb) print *ts $12 = {ts_blocked = {tqh_first = 0xc05d54dc, tqh_last = 0xc05a9db6}, ts_pending = {tqh_first = 0xc05a9db6, tqh_last = 0xb0000}, ts_hash = { le_next = 0x0, le_prev = 0x0}, ts_link = {le_next = 0x0, le_prev = 0xc69e6c02}, ts_free = {lh_first = 0x0}, ts_lockobj = 0x0, ts_owner = 0x0} (gdb) frame 3 #3 0xc03c4a36 in msleep (ident=0xc0638c44, mtx=0xc06387a0, priority=68, wmesg=0xc059c1c7 "psleep", timo=500) at /usr/src/sys/kern/kern_synch.c:239 239 mtx_lock(mtx); (gdb) frame 4 #4 0xc04fc49c in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:1464 1464 error = msleep(&vm_pages_needed, &vm_page_queue_mtx, PVM, (gdb) frame 5 #5 0xc03a98cc in fork_exit (callout=0xc04fc21c , arg=0x0, frame=0xeb328d38) at /usr/src/sys/kern/kern_fork.c:791 791 callout(arg, frame); (gdb) frame 6 #6 0xc053b1bc in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:209 209 call fork_exit Current language: auto; currently asm (gdb) --- end gdb transcript 2 --- So, hmm. I'm a bit bewildered but will keep trying to wrap my head around it. Clues and/or direction would be welcome. It looks to me like the pagedaemon is running and trying to acquire the vm page queue mutex, which appears to be owned on behalf of named, which isn't running but also isn't blocked on a turnstile. -Frank McConnell