From owner-freebsd-current@FreeBSD.ORG Tue Oct 19 21:18:42 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1C0DC16A4D0 for ; Tue, 19 Oct 2004 21:18:42 +0000 (GMT) Received: from mail1.speakeasy.net (mail1.speakeasy.net [216.254.0.201]) by mx1.FreeBSD.org (Postfix) with ESMTP id A4E1E43D5C for ; Tue, 19 Oct 2004 21:18:41 +0000 (GMT) (envelope-from john@baldwin.cx) Received: (qmail 30071 invoked from network); 19 Oct 2004 21:18:41 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 19 Oct 2004 21:18:40 -0000 Received: from zion.baldwin.cx (zion.baldwin.cx [192.168.0.7]) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i9JLIbeM050406; Tue, 19 Oct 2004 17:18:37 -0400 (EDT) (envelope-from john@baldwin.cx) Received: from zion.baldwin.cx (localhost [127.0.0.1]) by zion.baldwin.cx (8.12.10/8.12.10) with ESMTP id i9JLIb8h005730; Tue, 19 Oct 2004 17:18:37 -0400 (EDT) (envelope-from john@zion.baldwin.cx) Received: from localhost (localhost [[UNIX: localhost]]) by zion.baldwin.cx (8.12.10/8.12.10/Submit) id i9JLIbuR005729; Tue, 19 Oct 2004 17:18:37 -0400 (EDT) (envelope-from john) From: John Baldwin To: freebsd-current@freebsd.org Date: Tue, 19 Oct 2004 17:18:37 -0400 User-Agent: KMail/1.6 References: <200410191703.52737.jhb@FreeBSD.org> In-Reply-To: <200410191703.52737.jhb@FreeBSD.org> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200410191718.37092.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: Robert Watson Subject: Re: Preemption-related bug in propagate_priority(): it's OK to hold Giant but not be running X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Oct 2004 21:18:42 -0000 On Tuesday 19 October 2004 05:03 pm, John Baldwin wrote: > On Sunday 17 October 2004 03:17 pm, Robert Watson wrote: > > Ran into an interesting problem relating to WITNESS and a > > multiply-destroyed mutex today, but I didn't get the real panic message > > because of what looks like one or two other bugs. Here's the high-level: > > > > sodealloc() tries to destroy the mutex on a socket more than once, > > causing WITNESS to panic. The soft clock ithread preempts the serial > > output in panic() when a critical section is released, resulting in a > > context switch to that ithread. That ithread then attempts to lock a > > mutex, and is horrified to find out that Giant is held by the thread it > > preempted, but that that thread isn't blocked: > > > > tiger-2# ppanic: process 15955(tcpconnect):2 holds Giant but isn't > > blocked on a lock > > > > % rwatson -- note that double p is from the initiating panic > > > > cpuid = 0 > > KDB: enter: panic > > [thread 100071] > > Stopped at kdb_enter+0x2b: nop > > db> trace > > kdb_enter(c07fc101) at kdb_enter+0x2b > > panic(c07ff1ec,3e53,c5f90588,2,c080c957) at panic+0x127 > > propagate_priority(c547b180,c08c47b8,c5f91780,c08bdbc0,0) at > > propagate_priority+ > > 0x139 > > turnstile_wait(0,c08bdbc0,c5f91780,c08bdbc0,2,c07fb4a5,217) at > > turnstile_wait+0x > > 2de > > _mtx_lock_sleep(c08bdbc0,c547b180,0,c07fd124,f7) at _mtx_lock_sleep+0x15f > > _mtx_lock_flags(c08bdbc0,0,c07fd124,f7,0) at _mtx_lock_flags+0x85 > > softclock(0) at softclock+0x16a > > ithread_loop(c5436e80,e942fd48,c5436e80,c05f7c80,0) at ithread_loop+0x124 > > fork_exit(c05f7c80,c5436e80,e942fd48) at fork_exit+0xa4 > > fork_trampoline() at fork_trampoline+0x8 > > --- trap 0x1, eip = 0, esp = 0xe942fd7c, ebp = 0 --- > > db> show locks > > db> show pcpu > > cpuid = 0 > > curthread = 0xc547b180: pid 87 "swi4: clock sio" > > curpcb = 0xe942fda0 > > fpcurthread = none > > idlethread = 0xc53e4a80: pid 14 "idle: cpu0" > > APIC ID = 0 > > currentldt = 0x28 > > spin locks held: > > db> ps > > pid proc uarea uid ppid pgrp flag stat wmesg wchan > > cmd 15955 c5f90400 ebbb1000 0 525 15955 0004002 [Can run] tcpconnect > > ... > > 87 c548b400 e946e000 0 0 0 000020c [LOCK Giant c5432d00] > > swi4: clock sio > > db> trace 15955 > > sched_switch(c5f91780,c5451d80,6) at sched_switch+0x16f > > mi_switch(6,c5451d80,c5451ed0,c5451d80,ebba492c) at mi_switch+0x264 > > maybe_preempt(c5451d80) at maybe_preempt+0x156 > > sched_add(c5451d80,4,c53e2d80,c5451d80,c5450800) at sched_add+0x153 > > setrunqueue(c5451d80,4) at setrunqueue+0xab > > ithread_schedule(c53e2d80,1f,c5f91780,c5f91780,2580) at > > ithread_schedule+0xb3 > > intr_execute_handlers(c53dc0d8,ebba49a8,1f,ebba49f0,c0780ad3) at > > intr_execute_ha > > ndlers+0xf5 > > lapic_handle_intr(4f) at lapic_handle_intr+0x2e > > Xapic_isr2() at Xapic_isr2+0x33 > > --- interrupt, eip = 0xc0619446, esp = 0xebba49ec, ebp = 0xebba49f0 --- > > critical_exit(3f8,ebba4a30,c076f731,c08fa000,0) at critical_exit+0x92 > > _mtx_unlock_spin_flags(c08fa000,0,c08174fc,be4) at > > _mtx_unlock_spin_flags+0x8d > > siocnputc(c08987c0,70) at siocnputc+0xb9 > > cnputc(70) at cnputc+0x4d > > putchar(70,ebba4b44) at putchar+0x52 > > kvprintf(c07fc2ed,c06219b8,ebba4b44,a,ebba4b64) at kvprintf+0x77 > > printf(c07fc2ed,c08bf060,c08bf060,100,c07ff91e,ebba4b98) at printf+0x43 > > panic(c07ff91e,c07b69b1,c0813aa8,c07ff6af,c5a23ebc) at panic+0xcb > > witness_destroy(c5a23ebc,c5a23dec,ebba4bd0,c063f15f,c5a23ebc) at > > witness_destroy > > +0x40 > > mtx_destroy(c5a23ebc,c5e94e00,c5a23ebc,ebba4be8,c063f4fd) at > > mtx_destroy+0x7a > > sodealloc(c5a23dec,c5a23eb0,c5832bb0,c5a23dec,ebba4c00) at sodealloc+0xf3 > > sofree(c5a23dec) at sofree+0x281 > > soclose(c5a23dec,c5832bb0,0,ebba4c2c,c05efad0) at soclose+0x24d > > soo_close(c5832bb0,c5f91780) at soo_close+0x4b > > fdrop_locked(c5832bb0,c5f91780,c53ccf40,0,c07f90b1) at fdrop_locked+0x84 > > fdrop(c5832bb0,c5f91780,ebba4c78,c0628c5c,c07f90ba) at fdrop+0x24 > > closef(c5832bb0,c5f91780,c5ebc628,0,c07f90b1) at closef+0x1db > > close(c5f91780,ebba4d14,1,14b,292) at close+0xe6 > > syscall(2f,bfbf002f,bfbf002f,3,bfbfedf7) at syscall+0x213 > > Xint0x80_syscall() at Xint0x80_syscall+0x1f > > --- syscall (6, FreeBSD ELF32, close), eip = 0x280c154b, esp = > > 0xbfbfec0c, ebp = > > 0xbfbfec48 --- > > db> show locks 15955 > > exclusive sleep mutex Giant r = 0 (0xc08bdbc0) locked @ > > kern/kern_descrip.c:967 > > > > So on face value, it looks like propagate_priority() is unhappy about the > > tcpconnect thread (that was preempted) still holding Giant. However, > > given that preemption is supposed to preempt but leave a thread runnable, > > this would seem not to be a panic-worthy condition. > > Yeah, pp doesn't like a thread in TD_CAN_RUN(). That state can only happen > if you have already panic'd though. I can fix it to allow TD_CAN_RUN if > the kernel has already panic'd though. Actually, this won't work. Here's the problem: the reason you got to pp() is that you blocked on Giant, right. So your ithread is going to block after pp() returns. Since the thread at the end of the chain is TD_CAN_RUN and due to the weird TDF_INPANIC stuff we have in choosethread(), the thread holding the lock that the ithread needs is _never_ going to run, so your ithread will never resume. The end result is that if you made pp() just return rather than panic'ing in this case, you would get a single 'p' character on your output, and then the machine would deadlock. At least this way you get into ddb. :-P -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/