From owner-freebsd-current@FreeBSD.ORG Thu Jul 1 15:33:16 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A0DDC16A4CE; Thu, 1 Jul 2004 15:33:16 +0000 (GMT) Received: from mailout1.informatik.tu-muenchen.de (mailout1.informatik.tu-muenchen.de [131.159.0.18]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5B9EF43D31; Thu, 1 Jul 2004 15:33:15 +0000 (GMT) (envelope-from langd@informatik.tu-muenchen.de) Date: Thu, 1 Jul 2004 17:32:21 +0200 From: Daniel Lang To: John Baldwin , freebsd-gnats-submit@FreeBSD.org Message-ID: <20040701153221.GC84986@atrbg11.informatik.tu-muenchen.de> References: <20040628202434.GA73213@atrbg11.informatik.tu-muenchen.de> <20040629170014.GC1144@green.homeunix.org> <20040629183557.GA77135@atrbg11.informatik.tu-muenchen.de> <200406291453.34291.jhb@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <200406291453.34291.jhb@FreeBSD.org> X-Geek: GCS/CC d-- s: a- C++$ UBS++++$ P+++$ L- E-(---) W+++(--) N++ o K w--- O? M? V? PS+(++) PE--(+) Y+ PGP+ t++ 5+++ X R+(-) tv+ b+ DI++ D++ G++ e+++ h---(-) r+++ y+ User-Agent: Mutt/1.5.6i X-Virus-Scanned: by amavisd-new at informatik.tu-muenchen.de cc: freebsd-current@FreeBSD.org cc: Colin Percival Subject: Re: kern/68442: panic - acquiring duplicate lock of same type: "sleepq chain" X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jul 2004 15:33:16 -0000 Hi, John Baldwin wrote on Tue, Jun 29, 2004 at 02:53:34PM -0400: [..] > Well, it may be a hint sadly enough. The fact that it thinks Giant is a spin > lock means that witness is confused, and this panic may be further such > confusion. One possibility is that somehow the sleep queue chain mutexes > have been corrupted. [..] First: the system locked up also with a UP kernel, so maybe this is not entirely related to SMP locking? Or maybe locks are now used in a non-smp environment as well. I guess so since the advent of KSE? Anyway... I have applied Bruce Evans' sio.c patch and now I did get a 'ddb' prompt after the machine paniced. I could call doadump() and this time there is enough space in /var. The following applies again to the SMP kernel but no PAE stuff. Here is the panic message and ddb trace: [..] Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0x34 fault code = supervisor read, page not present instruction pointer = 0x8:0xc0539096 stack pointer = 0x10:0xe5502ab0 frame pointer = 0x10:0xe5502ad4 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 9176 (cvs) kernel: type 12 trap, code=0 Stopped at 0xc0539096 = witness_checkorder+0x176: movl 0x34(%eax),%edx db> trace witness_checkorder(c07100a4,9,c069f530,19b,500) at 0xc0539096 = witness_checkor6 _mtx_lock_spin_flags(c07100a4,0,c069f530,19b,c070a0e0) at 0xc0516e9e = _mtx_loce turnstile_lookup(c070a0e0,c070a0e0,3bc,c06a390d,e5502b54) at 0xc05382ae = turnse _mtx_lock_sleep(c070a0e0,0,c06a390d,3bc) at 0xc051701e = _mtx_lock_sleep+0x66 _mtx_lock_flags(c070a0e0,0,c06a390d,3bc,c0711c30) at 0xc0516e0f = _mtx_lock_fla7 kern_open(c403bd20,81ef440,0,1,1b6) at 0xc056873f = kern_open+0xc7 open(c403bd20,e5502d14,3,ab,296) at 0xc0568674 = open+0x18 syscall(2f,2835002f,bfbf002f,4,283502e0) at 0xc0664703 = syscall+0x217 Xint0x80_syscall() at 0xc06539ef = Xint0x80_syscall+0x1f --- syscall (5, FreeBSD ELF32, open), eip = 0x282c9e1b, esp = 0xbfbfe0fc, ebp =- [..] Please note, this time, there was no LOR happening! Here is a log of my sort of fruitless gdb session. Since there was no LOR, I am not sure if the sleepq_chain is related to the panic? However, the panic is obviously triggered inside the witness code, because *lock_list = 0x0 in line 749. Although a few lines above, the list is checked for beeing empty (line 707), just Colin has already pointed out from the first trace I could get. But between line 707 and 749 there is no obvious modification to this list. I am not sure what 'find_instance()' does? So maybe another thread on another CPU has modified the locklist meanwhile? Is this possible? Anyway, here is my gdb session. I poked around in the mutexes in the 'turnstile_chain' without finding anything obviously wrong. [..] -rw-r--r-- 1 root wheel 5 Feb 23 20:42 minfree -rw------- 1 root wheel 2147352576 Jul 1 16:58 vmcore.1 atleo6:/var/crash#gdb6 -k kernel.1 vmcore.1 GNU gdb 20040615 [GDB v6.x for FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-portbld-freebsd5.2"... panic messages: --- Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0x34 fault code = supervisor read, page not present instruction pointer = 0x8:0xc0539096 stack pointer = 0x10:0xe5502ab0 frame pointer = 0x10:0xe5502ad4 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 9176 (cvs) kernel: type 12 trap, code=0 Dumping 2047 MB 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496 512 528 544 560 576 592 608 624 640 656 672 688 704 720 736 752 768 784 800 816 832 848 864 880 896 912 928 944 960 976 992 1008 1024 1040 1056 1072 1088 1104 1120 1136 1152 1168 1184 1200 1216 1232 1248 1264 1280 1296 1312 1328 1344 1360 1376 1392 1408 1424 1440 1456 1472 1488 1504 1520 1536 1552 1568 1584 1600 1616 1632 1648 1664 1680 1696 1712 1728 1744 1760 1776 1792 1808 1824 1840 1856 1872 1888 1904 1920 1936 1952 1968 1984 2000 2016 2032 --- #0 doadump () at /usr/src/sys/kern/kern_shutdown.c:236 236 dumping++; doadump () at /usr/src/sys/kern/kern_shutdown.c:236 236 dumping++; (kgdb) bt #0 doadump () at /usr/src/sys/kern/kern_shutdown.c:236 #1 0xc0451f2e in db_fncall (dummy1=0, dummy2=0, dummy3=-1066115808, dummy4=0xe55028ec "\b)På\032oQÀ ]tÀF") at /usr/src/sys/ddb/db_command.c:551 #2 0xc0451d3c in db_command (last_cmdp=0xc0702150, cmd_table=0x0, aux_cmd_tablep=0xc06ba454, aux_cmd_tablep_end=0xc06ba46c) at /usr/src/sys/ddb/db_command.c:348 #3 0xc0451e14 in db_command_loop () at /usr/src/sys/ddb/db_command.c:475 #4 0xc0454551 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_trap.c:73 #5 0xc06523fd in kdb_trap (type=12, code=0, regs=0xe5502a70) at /usr/src/sys/i386/i386/db_interface.c:159 #6 0xc0664433 in trap_fatal (frame=0xe5502a70, eva=52) at /usr/src/sys/i386/i386/trap.c:810 #7 0xc0664177 in trap_pfault (frame=0xe5502a70, usermode=0, eva=52) at /usr/src/sys/i386/i386/trap.c:733 #8 0xc0663e19 in trap (frame= {tf_fs = -1066336232, tf_es = -931856368, tf_ds = 16, tf_edi = 9, tf_esi = -1066336092, tf_ebp = -447730988, tf_isp = -447731044, tf_ebx = -4194252, tf_edx = -1, tf_ecx = 0, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1068265322, tf_cs = 8, tf_eflags = 66118, tf_esp = -447731012, tf_ss = -1068404931}) at /usr/src/sys/i386/i386/trap.c:420 #9 0xc065399a in calltrap () at /usr/src/sys/i386/i386/exception.s:140 #10 0xc0710018 in turnstile_chains () #11 0xc8750010 in ?? () #12 0x00000010 in ?? () #13 0x00000009 in ?? () #14 0xc07100a4 in turnstile_chains () #15 0xe5502ad4 in ?? () #16 0xe5502a9c in ?? () #17 0xffc00034 in ?? () #18 0xffffffff in ?? () #19 0x00000000 in ?? () #20 0x00000000 in ?? () #21 0x0000000c in ?? () #22 0x00000000 in ?? () #23 0xc0539096 in witness_checkorder (lock=0xc07100a4, flags=9, file=0xc069f530 "/usr/src/sys/kern/subr_turnstile.c", line=411) at /usr/src/sys/kern/subr_witness.c:749 #24 0xc0516e9e in _mtx_lock_spin_flags (m=0xc07100a4, opts=0, file=0xc069f530 "/usr/src/sys/kern/subr_turnstile.c", line=411) at /usr/src/sys/kern/kern_mutex.c:354 #25 0xc05382ae in turnstile_lookup (lock=0xc070a0e0) at /usr/src/sys/kern/subr_turnstile.c:411 #26 0xc051701e in _mtx_lock_sleep (m=0xc070a0e0, opts=0, file=0xc06a390d "/usr/src/sys/kern/vfs_syscalls.c", line=956) at /usr/src/sys/kern/kern_mutex.c:458 #27 0xc0516e0f in _mtx_lock_flags (m=0xc070a0e0, opts=0, file=0xc06a390d "/usr/src/sys/kern/vfs_syscalls.c", line=956) at /usr/src/sys/kern/kern_mutex.c:252 #28 0xc056873f in kern_open (td=0xc403bd20, path=0x0, pathseg=UIO_USERSPACE, flags=1, mode=438) at /usr/src/sys/kern/vfs_syscalls.c:956 #29 0xc0568674 in open (td=0xc403bd20, uap=0x0) at /usr/src/sys/kern/vfs_syscalls.c:926 #30 0xc0664703 in syscall (frame= {tf_fs = 47, tf_es = 674562095, tf_ds = -1078001617, tf_edi = 4, tf_esi = 674562784, tf_ebp = -1077944024, tf_isp = -447730316, tf_ebx = 674484940, tf_edx = 0, tf_ecx = 0, tf_eax = 5, tf_trapno = 12, tf_err = 2, tf_eip = 674012699, tf_cs = 31, tf_eflags = 662, tf_esp = -1077944068, tf_ss = 47}) at /usr/src/sys/i386/i386/trap.c:1004 #31 0xc06539ef in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:201 #32 0x0000002f in ?? () #33 0x2835002f in ?? () #34 0xbfbf002f in ?? () #35 0x00000004 in ?? () #36 0x283502e0 in ?? () #37 0xbfbfe128 in ?? () #38 0xe5502d74 in ?? () #39 0x2833d2cc in ?? () #40 0x00000000 in ?? () #41 0x00000000 in ?? () #42 0x00000005 in ?? () #43 0x0000000c in ?? () #44 0x00000002 in ?? () ---Type to continue, or q to quit---up 23 #45 0x282c9e1b in ?? () #46 0x0000001f in ?? () #47 0x00000296 in ?? () #48 0xbfbfe0fc in ?? () #49 0x0000002f in ?? () #50 0x00000000 in ?? () #51 0x00000000 in ?? () #52 0x0804ad90 in ?? () #53 0x00000000 in ?? () #54 0x67ff8000 in ?? () #55 0xc8755dc0 in ?? () #56 0xc8755e6c in ?? () #57 0xe55029c4 in ?? () #58 0xe55029b4 in ?? () #59 0xc403bd20 in ?? () #60 0xc052bd30 in sched_switch (td=0x2833d2cc) at /usr/src/sys/kern/sched_4bsd.c:676 Previous frame inner to this frame (corrupt stack?) (kgdb) up 23 #23 0xc0539096 in witness_checkorder (lock=0xc07100a4, flags=9, file=0xc069f530 "/usr/src/sys/kern/subr_turnstile.c", line=411) at /usr/src/sys/kern/subr_witness.c:749 749 lock1 = &(*lock_list)->ll_children[(*lock_list)->ll_count - 1]; (kgdb) l 744 /* 745 * Check for duplicate locks of the same type. Note that we only 746 * have to check for this on the last lock we just acquired. Any 747 * other cases will be caught as lock order violations. 748 */ 749 lock1 = &(*lock_list)->ll_children[(*lock_list)->ll_count - 1]; 750 w1 = lock1->li_lock->lo_witness; 751 if (w1 == w) { 752 if (w->w_same_squawked || (lock->lo_flags & LO_DUPOK)) 753 return; (kgdb) p lock_list $2 = (struct lock_list_entry *) 0x0 (kgdb) up #24 0xc0516e9e in _mtx_lock_spin_flags (m=0xc07100a4, opts=0, file=0xc069f530 "/usr/src/sys/kern/subr_turnstile.c", line=411) at /usr/src/sys/kern/kern_mutex.c:354 354 WITNESS_CHECKORDER(&m->mtx_object, opts | LOP_NEWORDER | LOP_EXCLUSIVE, (kgdb) l 349 350 MPASS(curthread != NULL); 351 KASSERT(m->mtx_object.lo_class == &lock_class_mtx_spin, 352 ("mtx_lock_spin() of sleep mutex %s @ %s:%d", 353 m->mtx_object.lo_name, file, line)); 354 WITNESS_CHECKORDER(&m->mtx_object, opts | LOP_NEWORDER | LOP_EXCLUSIVE, 355 file, line); 356 #if defined(SMP) || LOCK_DEBUG > 0 || 1 357 _get_spin_lock(m, curthread, opts, file, line); 358 #else (kgdb) p m $3 = (struct mtx *) 0xc07100a4 (kgdb) p *m $4 = {mtx_object = {lo_class = 0xc06e25a4, lo_name = 0xc069f553 "turnstile chain", lo_type = 0xc069f553 "turnstile chain", lo_flags = 196608, lo_list = {tqe_next = 0xc07100cc, tqe_prev = 0xc071008c}, lo_witness = 0xc0712900}, mtx_lock = 4, mtx_recurse = 0} (kgdb) p m->mtx_object $5 = {lo_class = 0xc06e25a4, lo_name = 0xc069f553 "turnstile chain", lo_type = 0xc069f553 "turnstile chain", lo_flags = 196608, lo_list = {tqe_next = 0xc07100cc, tqe_prev = 0xc071008c}, lo_witness = 0xc0712900} (kgdb) p m->mtx_object->lo_class $6 = (struct lock_class *) 0xc06e25a4 (kgdb) p m->mtx_object->lo_class $7 = {lc_name = 0xc069cce2 "spin mutex", lc_flags = 10} (kgdb) p m->mtx_object->lo_list $12 = {tqe_next = 0xc07100cc, tqe_prev = 0xc071008c} (kgdb) p m->mtx_object->lo_list->tqe_next $13 = (struct lock_object *) 0xc07100cc (kgdb) p m->mtx_object->lo_list->tqe_next $14 = {lo_class = 0xc06e25a4, lo_name = 0xc069f553 "turnstile chain", lo_type = 0xc069f553 "turnstile chain", lo_flags = 196608, lo_list = {tqe_next = 0xc07100f4, tqe_prev = 0xc07100b4}, lo_witness = 0xc0712900} (kgdb) p *m->mtx_object->lo_list->tqe_next->lo_list->tqe_next $15 = {lo_class = 0xc06e25a4, lo_name = 0xc069f553 "turnstile chain", lo_type = 0xc069f553 "turnstile chain", lo_flags = 196608, lo_list = {tqe_next = 0xc071011c, tqe_prev = 0xc07100dc}, lo_witness = 0xc0712900} (kgdb) p opts $16 = 0 (kgdb) p file $17 = 0xc069f530 "/usr/src/sys/kern/subr_turnstile.c" (kgdb) p line $18 = 411 (kgdb) p *m->mtx_object->lo_list->tqe_next->lo_list->tqe_next->lo_list->tqe_next $22 = {lo_class = 0xc06e25a4, lo_name = 0xc069f553 "turnstile chain", lo_type = 0xc069f553 "turnstile chain", lo_flags = 196608, lo_list = {tqe_next = 0xc0710144, tqe_prev = 0xc0710104}, lo_witness = 0xc0712900} (kgdb) p *m->mtx_object->lo_list->tqe_next->lo_list->tqe_next $23 = {lo_class = 0xc06e25a4, lo_name = 0xc069f553 "turnstile chain", lo_type = 0xc069f553 "turnstile chain", lo_flags = 196608, lo_list = {tqe_next = 0xc071011c, tqe_prev = 0xc07100dc}, lo_witness = 0xc0712900} (kgdb) p *m->mtx_object->lo_list->tqe_next->lo_list->tqe_next->lo_list->tqe_next->lo_list->tqe_next $24 = {lo_class = 0xc06e25a4, lo_name = 0xc069f553 "turnstile chain", lo_type = 0xc069f553 "turnstile chain", lo_flags = 196608, lo_list = {tqe_next = 0xc071016c, tqe_prev = 0xc071012c}, lo_witness = 0xc0712900} (kgdb) p *m->mtx_object->lo_list->tqe_next->lo_list->tqe_next->lo_list->tqe_next->lo_list->tqe_next->lo_cl ass $25 = {lc_name = 0xc069cce2 "spin mutex", lc_flags = 10} (kgdb) p m->mtx_object->lo_list->tqe_next->lo_list->tqe_next->lo_list->tqe_next->lo_list->tqe_next->lo_listt->tqe_next->lo_list $27 = {tqe_next = 0xc0710194, tqe_prev = 0xc0710154} (kgdb) quit Script done on Thu Jul 1 17:19:36 2004 [..] Cheers, Daniel -- IRCnet: Mr-Spock - ceterum censeo Microsoftinem esse delendam - Daniel Lang * dl@leo.org * +49 89 289 18532 * http://www.leo.org/~dl/